taoyang1122 / pytorch-SimSiam

A PyTorch re-implementation of the paper 'Exploring Simple Siamese Representation Learning'. Reproduces the 67.8% Top-1 accuracy on ImageNet.
Apache License 2.0

collapse on other dataset #6

Open knightyxp opened 3 years ago

knightyxp commented 3 years ago

First of all, great job doing DDP (DistributedDataParallel) training on a large dataset (most other code only covers CIFAR-10); I was able to reproduce the result you published. However, when I train SimSiam on another dataset (not public), the model tends to collapse. I wonder whether the problem is an unsuitable learning rate or the lack of a warmup stage. I really hope Dr. Tao can share some suggestions.
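
For reference, a minimal sketch of the linear-warmup + cosine-decay schedule that SimSiam-style pretraining typically uses; the helper name, warmup length, and base lr here are illustrative assumptions, not code from this repository:

```python
import math

def adjust_learning_rate(optimizer, base_lr, epoch, total_epochs, warmup_epochs=10):
    """Hypothetical helper (not from this repo): linear warmup to base_lr,
    then cosine decay to zero over the remaining epochs. Call once per epoch."""
    if epoch < warmup_epochs:
        # Ramp the lr linearly from ~0 up to base_lr during warmup.
        lr = base_lr * (epoch + 1) / warmup_epochs
    else:
        # Cosine decay from base_lr down to 0 over the remaining epochs.
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        lr = base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr
    return lr
```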

taoyang1122 commented 3 years ago

Hi, I may not be able to tell what the problem is from that alone. Could you briefly describe the dataset and the changes you made?

knightyxp commented 3 years ago

The dataset belongs to the company's business and is temporarily unavailable. It contains about 15 million images in 200 categories. I have trained SimSiam on V100 machines and on several 1080 Ti machines, and every run failed on the downstream task. I think the learning rate is the problem (MoCo trained on the same dataset works on the downstream task), so following the code at https://docs.lightly.ai/tutorials/package/tutorial_simsiam_esa.html#setup-data-augmentations-and-loaders, I added a collapse-level metric to check whether the model has collapsed. The collapse still happens with lr = 0.5 = 0.1 * 5 (5 machines, 8x 1080 Ti each): the collapse level is 0.67/1 and the loss is -0.68. I really need a strategy for finding a suitable lr for our dataset. My WeChat is knightyxp and my QQ is 377525381. Looking forward to Dr. Tao's suggestions.
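
For context, a collapse-level check along the lines of the linked Lightly tutorial can be sketched as below; the function name and exact formulation are an approximation of the tutorial's idea, not copied from it:

```python
import math
import torch
import torch.nn.functional as F

def collapse_level(z: torch.Tensor) -> float:
    """Collapse check on a batch of projections z of shape (batch, dim):
    normalize the outputs, then compare the per-dimension std with the
    ~1/sqrt(dim) expected for non-collapsed, roughly isotropic embeddings.
    If all outputs point the same way, the std goes to 0 and the returned
    level approaches 1."""
    z = F.normalize(z.detach(), dim=1)
    std = z.std(dim=0).mean().item()
    return max(0.0, 1.0 - math.sqrt(z.shape[1]) * std)
```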

taoyang1122 commented 3 years ago

Hi, this problem may be out of my scope. If you think the lr is the problem, you could start from a large lr (0.5 in your case) and gradually decrease it. Also, loss = -0.68 doesn't look like collapse; it looks more like the model hasn't learned good representations.
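
As a reference point for the sweep, the SimSiam paper sets the pretraining lr with the linear scaling rule lr = base_lr × BatchSize / 256 (base_lr = 0.05). One way to pick candidates is to compute that reference value for the actual total batch size and try a few multiples of it; the per-GPU batch size below is purely illustrative:

```python
def scaled_lr(base_lr: float, total_batch_size: int) -> float:
    """Linear scaling rule from the SimSiam paper: lr = base_lr * BatchSize / 256."""
    return base_lr * total_batch_size / 256

# Illustrative numbers only: 5 machines x 8 GPUs x 32 images per GPU = 1280 total.
total_batch = 5 * 8 * 32
for factor in (2.0, 1.0, 0.5, 0.25):
    # Keep the largest setting that trains stably without collapsing.
    print(f"candidate lr: {scaled_lr(0.05, total_batch) * factor:.3f}")
```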