xinyu1205 / recognize-anything

Open-source and strong foundation image recognition models.
https://recognize-anything.github.io/
Apache License 2.0

A question about the loss in t2t #72

Open SKBL5694 opened 1 year ago

SKBL5694 commented 1 year ago

https://github.com/xinyu1205/recognize-anything/blob/fd2ab877e245e8e571af7b2d6048d1a9d40a6408/ram/models/tag2text.py#L226

I think the value of this loss is always equal to 2*loss_t2t, which makes the result independent of loss_tag. What am I missing? I understand that since `(loss_tag/loss_t2t).detach()` carries no gradient in backward, it can be regarded as a constant; but that constant changes from step to step, and the end result still seems to depend essentially only on loss_t2t. Is this what we want? It looks like the intent is to dynamically rebalance the scales of the two losses, but can that really be achieved this way? I hope to get your reply; this is very confusing to me.
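A quick numeric sketch of the observation above, in plain Python. The loss values are hypothetical, and the weight is written as the detached ratio applied to the other loss in the form for which the forward value is exactly 2*loss_t2t; the exact expression in the repository may differ:

```python
def combined_loss_value(loss_t2t, loss_tag):
    # The detached ratio is a plain number in the forward pass
    # (and a constant in the backward pass).
    weight = loss_t2t / loss_tag
    # weight * loss_tag collapses to loss_t2t, so the total is 2 * loss_t2t.
    return loss_t2t + weight * loss_tag

print(combined_loss_value(0.8, 3.0))  # ~1.6
print(combined_loss_value(0.8, 0.1))  # ~1.6, regardless of loss_tag
```

This confirms the *value* is independent of loss_tag; whether the *gradient* is too is the real question.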

xinyu1205 commented 1 year ago

Actually, this is a training trick I personally often adopt to balance two task losses. In my very early comparative experiments, this trick showed a slight improvement. For more details, I recommend this answer: "How to balance multiple losses in deep learning?" — hzwer's answer on Zhihu: https://www.zhihu.com/question/375794498/answer/2292320194
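The key point of the trick in the linked answer is that the detached ratio drops out of the forward value but not the backward pass. A minimal plain-Python sketch with hypothetical scalar losses and their gradients with respect to a shared parameter (assuming the generic form `loss_a + (loss_a / loss_b).detach() * loss_b`):

```python
def balanced_total_and_grad(loss_a, loss_b, grad_a, grad_b):
    """Emulate loss_a + (loss_a / loss_b).detach() * loss_b on scalars.

    grad_a and grad_b stand for d(loss_a)/d(theta) and d(loss_b)/d(theta)
    for some shared parameter theta.
    """
    w = loss_a / loss_b          # detached: a per-step constant in backward
    total = loss_a + w * loss_b  # numerically always 2 * loss_a
    grad = grad_a + w * grad_b   # loss_b's gradient rescaled by loss_a / loss_b
    return total, grad

total, grad = balanced_total_and_grad(2.0, 4.0, 1.0, 1.0)
print(total, grad)  # 4.0 1.5
```

So even though the total's value tells you nothing about loss_b, the gradient still flows through grad_b, just rescaled so that both tasks contribute at the same loss scale at each step. Training is therefore not driven by loss_a alone.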