knightyxp opened this issue 3 years ago
Hi, I can't tell what the problem is from that alone. Could you briefly describe the dataset and what changes you made?
The dataset belongs to the company's business and cannot be shared at the moment. It has about 15 million images in 200 categories. I have trained SimSiam on a V100 and on several 1080 Ti machines, and every run failed on the downstream task. I suspect the learning rate is the problem (MoCo trained on the same dataset works fine downstream). Following https://docs.lightly.ai/tutorials/package/tutorial_simsiam_esa.html#setup-data-augmentations-and-loaders, I added the collapse-level metric to check whether the model has collapsed. The collapse still happens: with lr = 0.5 = 0.1 * 5 (5 machines, 8x 1080 Ti each), the collapse level is 0.67/1 and the loss is -0.68. I really need a strategy for finding a suitable lr for our dataset. My WeChat is knightyxp and my QQ is 377525381. Looking forward to Dr. Tao's suggestions.
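For reference, the lightly tutorial linked above estimates collapse from the per-dimension spread of the L2-normalized projections: for non-collapsed unit vectors in D dimensions the per-dimension std is roughly 1/sqrt(D), so a level near 1 means the outputs have shrunk to a point. A minimal sketch of that idea (the function name and batch shapes here are my own, not from the tutorial):

```python
import torch
import torch.nn.functional as F

def collapse_level(z: torch.Tensor) -> float:
    """Estimate collapse from a batch of projections z with shape (B, D).

    L2-normalize each embedding, then compare the average per-dimension
    std against the ~1/sqrt(D) expected for well-spread unit vectors.
    Returns 0.0 for no collapse, up to 1.0 for full collapse.
    """
    z = F.normalize(z, dim=1)          # project each row onto the unit sphere
    std = z.std(dim=0).mean()          # average per-dimension standard deviation
    d = z.shape[1]
    return max(0.0, 1.0 - (d ** 0.5) * std.item())

# Well-spread random embeddings give a level near 0;
# identical (collapsed) embeddings give a level near 1.
z_random = torch.randn(512, 128)
z_collapsed = torch.randn(1, 128).repeat(512, 1)
print(collapse_level(z_random), collapse_level(z_collapsed))
```

A level like the reported 0.67 sits in between: the embeddings have lost much of their variance but are not a single point yet.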
Hi, this problem may be out of my scope. If you think the lr is the problem, you could start from a large lr (0.5 in your case) and gradually decrease it. Another thing: loss = -0.68 doesn't look like collapse; it rather suggests the model didn't learn good representations.
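One systematic starting point for the sweep is the linear scaling rule used in the SimSiam paper (lr = base_lr × batch/256, with base_lr = 0.05 for pretraining), then search downward from there. A tiny sketch; the per-GPU batch size of 32 below is an assumption, since the thread doesn't state it:

```python
def scaled_lr(base_lr: float, batch_size: int, base_batch: int = 256) -> float:
    """Linear scaling rule: scale the reference lr with the global batch size."""
    return base_lr * batch_size / base_batch

# Assumed setup: 5 machines x 8 GPUs x 32 images per GPU = global batch 1280.
global_batch = 5 * 8 * 32
print(scaled_lr(0.05, global_batch))  # -> 0.25
```

If 0.25 still collapses, halving repeatedly (0.125, 0.0625, ...) while watching the collapse level gives a cheap bisection over the lr range.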
First of all, it is great work doing DDP (distributed data parallel) training on a huge dataset (other implementations only cover CIFAR-10), and I could reproduce your published results. However, when I train SimSiam on another (unpublished) dataset, the model tends to collapse. I wonder whether this is caused by an unsuitable lr or by the lack of a warmup procedure. I really hope Dr. Tao can share some suggestions.
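On the warmup question: a common recipe for large-batch self-supervised pretraining is a short linear warmup followed by cosine decay. A self-contained sketch using PyTorch's `LambdaLR` (the warmup/total step counts are placeholders, not values from this thread):

```python
import math
import torch

def warmup_cosine(step: int, warmup_steps: int, total_steps: int) -> float:
    """Lr multiplier: linear ramp 0 -> 1 over warmup_steps, then cosine decay to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

model = torch.nn.Linear(8, 8)                      # stand-in for the SimSiam model
opt = torch.optim.SGD(model.parameters(), lr=0.25, momentum=0.9)
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lambda s: warmup_cosine(s, warmup_steps=500, total_steps=10000))
# call sched.step() once per training step after opt.step()
```

Warmup mainly protects the first few hundred steps, where a large scaled lr can push the encoder into a collapsed solution it never escapes.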