Closed · Cogito2012 closed this issue 1 year ago
Hi, thanks for sharing the great repo! Could you explain the differences between focal scaling and the original focal loss? From the implementation here: ./detectron2/modeling/roi_heads/fast_rcnn.py#L630, it seems to be a special case of focal loss with `alpha = 1` and `gamma = 0.5`. Besides, why is focal scaling effective in mitigating the forgetting of concepts learned in the pretraining stage, when fine-tuning uses only a small number of categories?

Looking forward to your reply! Thank you!

---

Hi @Cogito2012, the focal scaling in our paper extends the key idea of the original focal loss, namely suppressing confident predictions. In our transfer learning setting (using only base categories), the model tends to bias toward those base categories, making their confidence scores high. Focal scaling makes the model less biased toward these training samples and thus reduces the forgetting.
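To make the relationship concrete, here is a minimal sketch (not the repo's actual code; the function names are my own) showing that focal scaling with `gamma = 0.5` is the original focal loss with `alpha = 1`, and how it down-weights confident predictions:

```python
import math

def cross_entropy(p):
    """Standard cross-entropy for true-class probability p."""
    return -math.log(p)

def focal_loss(p, alpha=0.25, gamma=2.0):
    """Original focal loss (Lin et al.): alpha * (1 - p)^gamma * CE."""
    return alpha * (1.0 - p) ** gamma * cross_entropy(p)

def focal_scaled_ce(p, gamma=0.5):
    """Focal scaling: the alpha = 1, gamma = 0.5 special case."""
    return (1.0 - p) ** gamma * cross_entropy(p)

# A confident prediction (p = 0.99) gets its loss scaled by
# (1 - 0.99)**0.5 = 0.1, i.e. a 10x down-weighting, so easy
# base-category samples contribute much smaller gradients.
print(focal_scaled_ce(0.99) / cross_entropy(0.99))  # 0.1
```

Because fine-tuning sees only base categories, those samples quickly become "easy" (high confidence); shrinking their loss this way limits how far the model drifts from its pretrained weights, which is the forgetting-mitigation effect described above.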