xinyu1205 / recognize-anything

Open-source and strong foundation image recognition models.
https://recognize-anything.github.io/
Apache License 2.0

A question about the loss in t2t #72

Open SKBL5694 opened 1 year ago

SKBL5694 commented 1 year ago

https://github.com/xinyu1205/recognize-anything/blob/fd2ab877e245e8e571af7b2d6048d1a9d40a6408/ram/models/tag2text.py#L226

I think the value of this loss is always equal to 2*loss_t2t, which makes the result independent of loss_tag. What am I missing? I understand that since `(loss_tag/loss_t2t).detach()` carries no gradient in backward, it can be regarded as a constant; but that constant changes from step to step, and the end result still seems to depend essentially only on loss_t2t. Is this what we want? It looks like the intent is to dynamically rebalance the scales of the two losses, but can that really be achieved this way? I hope to get your reply; this is very confusing to me.
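A quick numeric sketch of the observation above, in plain Python. The loss values are hypothetical, and the weight is written as the detached ratio applied to the other loss in the form for which the forward value is exactly 2*loss_t2t; the exact expression in the repository may differ:

```python
def combined_loss_value(loss_t2t, loss_tag):
    # The detached ratio is a plain number in the forward pass
    # (and a constant in the backward pass).
    weight = loss_t2t / loss_tag
    # weight * loss_tag collapses to loss_t2t, so the total is 2 * loss_t2t.
    return loss_t2t + weight * loss_tag

print(combined_loss_value(0.8, 3.0))  # ~1.6
print(combined_loss_value(0.8, 0.1))  # ~1.6, regardless of loss_tag
```

This confirms the *value* is independent of loss_tag; whether the *gradient* is too is the real question.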

xinyu1205 commented 1 year ago

Actually, this is a training trick I personally often adopt to balance two task losses. In my very early comparative experiments, this trick showed a slight improvement. For more details, I recommend this answer: "How to balance multiple losses in deep learning?" — hzwer's answer on Zhihu: https://www.zhihu.com/question/375794498/answer/2292320194
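The key point of the trick in the linked answer is that the detached ratio drops out of the forward value but not the backward pass. A minimal plain-Python sketch with hypothetical scalar losses and their gradients with respect to a shared parameter (assuming the generic form `loss_a + (loss_a / loss_b).detach() * loss_b`):

```python
def balanced_total_and_grad(loss_a, loss_b, grad_a, grad_b):
    """Emulate loss_a + (loss_a / loss_b).detach() * loss_b on scalars.

    grad_a and grad_b stand for d(loss_a)/d(theta) and d(loss_b)/d(theta)
    for some shared parameter theta.
    """
    w = loss_a / loss_b          # detached: a per-step constant in backward
    total = loss_a + w * loss_b  # numerically always 2 * loss_a
    grad = grad_a + w * grad_b   # loss_b's gradient rescaled by loss_a / loss_b
    return total, grad

total, grad = balanced_total_and_grad(2.0, 4.0, 1.0, 1.0)
print(total, grad)  # 4.0 1.5
```

So even though the total's value tells you nothing about loss_b, the gradient still flows through grad_b, just rescaled so that both tasks contribute at the same loss scale at each step. Training is therefore not driven by loss_a alone.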