VSE_INF model cannot train

AOPALUNPA commented 2 months ago

Hello, I am trying to reimplement the VSE_INF method in my own code structure. I copied the model, loss, and dataloader parts, but I cannot get the training procedure to run the way it does in your original code. When I unfreeze the backbone of img_enc, the loss gets stuck at some point (e.g., 51.706543, 51.586914, 51.485352, 51.493164, ...) and the model collapses. I cannot figure out what is wrong with my reimplementation. Do you know what could cause this issue? Thanks a lot!

SanghyukChun commented 2 months ago

Hi, this code repository is for PCME++, not VSE infinity. Also, since you are using your own code structure, I have no idea what is causing your problem. My gut feeling is that it comes from hardest negative mining (HNM). I have often observed that the HNM triplet loss does not decrease unless the batch size, the backbone freeze/unfreeze strategy, the learning rate warmup schedule, and the learning rate are all carefully chosen.

Since the problem is unrelated to PCME++, I closed the issue.
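For reference, "hardest negative mining" here means the max-of-hinges triplet loss popularized by VSE++: for each anchor, only the most-violating negative in the batch contributes to the loss. The sketch below is a minimal NumPy version written for illustration, not code taken from VSE∞ or PCME++; the function names, the `hardest` flag, and the default margin of 0.2 are assumptions. Setting `hardest=False` sums the hinges over all negatives instead, which is one common way to warm up training before switching to hardest negatives.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (nearly) unit L2 norm so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def triplet_loss(img, txt, margin=0.2, hardest=True):
    """Bidirectional hinge-based triplet loss over a batch of matched pairs.

    img, txt: arrays of shape (B, D); row i of img matches row i of txt.
    hardest=True keeps only the hardest negative per anchor (max of hinges).
    hardest=False sums the hinges over all negatives (sum of hinges).
    Margin and flag names are illustrative assumptions, not a library API.
    """
    scores = l2_normalize(img) @ l2_normalize(txt).T  # (B, B) similarity matrix
    pos = np.diag(scores)  # similarities of the matched (positive) pairs

    # Image anchor i vs. negative caption j: margin + s(i, j) - s(i, i)
    cost_txt = np.maximum(0.0, margin + scores - pos[:, None])
    # Caption anchor j vs. negative image i: margin + s(i, j) - s(j, j)
    cost_img = np.maximum(0.0, margin + scores - pos[None, :])

    # The diagonal holds the positives, not negatives, so zero it out
    off_diag = 1.0 - np.eye(scores.shape[0])
    cost_txt = cost_txt * off_diag
    cost_img = cost_img * off_diag

    if hardest:
        # Hardest negative caption per image (row max), hardest image per caption (column max)
        return cost_txt.max(axis=1).sum() + cost_img.max(axis=0).sum()
    return cost_txt.sum() + cost_img.sum()


if __name__ == "__main__":
    B, D = 128, 64
    collapsed = np.ones((B, D))  # every embedding identical: a fully collapsed model
    print(triplet_loss(collapsed, collapsed))  # ~51.2, i.e. 2 * B * margin
```

One sanity check that may be relevant to the plateau reported above: if the image and text embeddings collapse to a single vector, every hinge equals the margin, so the hardest-negative loss becomes 2 * B * margin. With an assumed batch size of B = 128 and margin 0.2, that is 51.2, close to the reported values around 51.5. A loss pinned near 2 * B * margin is therefore a hint (not proof) that the embeddings have collapsed.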

AOPALUNPA commented 2 months ago

Thanks a lot for your reply! I'll check the code carefully! Many thanks!

Best, Joe

In reply to Sanghyuk Chun <@.***>, who wrote on Fri, Jul 5, 2024 at 12:22


naver-ai / pcmepp, issue #7: VSE_INF model cannot train