Something went wrong when I "Training with multiple seen domains"

7749546 commented 3 years ago

The loss is NaN. I tried reducing the learning rate, but it didn't help. Could you please give me some advice?

hytseng0509 commented 3 years ago

Which pytorch version do you use?

7749546 commented 3 years ago

Thank you for your reply. My torch version is 1.2.0. However, I can run this command: python3 train_baseline.py --method relationnet_softmax --dataset multi --testset cars --name multi_cars_ori_relationnet_softmax --warmup baseline --train_aug

The result seems to be OK. It is shown below:

7749546 commented 3 years ago

So, given the result I got above, should I upgrade torch to a version above 1.3.0? And if I run into the original problem again after changing the selected model, should I consider upgrading the torch version then?

ContestantsD commented 2 years ago

Did you resolve this problem? I am facing it too.

04556338896 commented 2 years ago

Hi, did you solve the problem? I trained the model on my own dataset, but my loss is NaN too.

ContestantsD commented 2 years ago

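I solved this problem. In my case it appeared because I deliberately used CUDA 10.2, but my 30-series GPU does not support CUDA 10, so the loss became NaN. As far as I remember, the actual fix was to change the order in which the model parameters are loaded; the code is not written very carefully.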
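
For anyone who wants to check for this kind of mismatch before training, here is a minimal sketch. It is not part of this repository. The helper names `arch_supported` and `report` are made up for illustration. It assumes that 30-series cards report compute capability 8.6 and that a build can run on a GPU when it contains an `sm_` entry with the same major version and an equal or lower minor version. PTX entries such as `compute_80` are ignored, so the check is conservative.

```python
def arch_supported(capability, arch_list):
    """Return True if arch_list contains binary kernels usable on the GPU.

    capability: (major, minor) tuple, e.g. (8, 6) for an RTX 30-series card.
    arch_list:  strings such as "sm_75", in the style returned by
                torch.cuda.get_arch_list().
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue  # skip PTX entries such as "compute_80"
        digits = arch[len("sm_"):]
        if len(digits) < 2 or not digits.isdigit():
            continue  # skip anything that is not of the form sm_XY
        arch_major, arch_minor = int(digits[:-1]), int(digits[-1])
        # Same major version and an equal or lower minor version is accepted.
        if arch_major == major and arch_minor <= minor:
            return True
    return False


def report():
    """Print the versions relevant to this issue. Requires PyTorch."""
    import torch  # imported here so arch_supported() works without PyTorch

    print("torch:", torch.__version__)
    print("CUDA version torch was built with:", torch.version.cuda)
    if not torch.cuda.is_available():
        print("No usable CUDA device found")
        return
    capability = torch.cuda.get_device_capability(0)
    print("GPU:", torch.cuda.get_device_name(0), "capability:", capability)
    # get_arch_list() is missing in old releases, so fall back to an empty list.
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    arch_list = get_arch_list() if get_arch_list else []
    print("Architectures in this build:", arch_list)
    if arch_list and not arch_supported(capability, arch_list):
        print("This torch build has no binary kernels for this GPU")
```

Calling `report()` in the training environment prints the torch version, the CUDA version it was built with, and whether the build covers the GPU. If it reports no binary kernels, that matches the situation described above.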

04556338896 commented 2 years ago

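I am also using a 30-series GPU with CUDA 10.2. The first few epochs produce a loss, and after that it is all NaN. This happens with both protonet and relationnet. Do you remember which model parameters you changed? Thanks!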
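
To find the exact epoch and step where the loss first turns NaN, a small guard like the one below may help. This is only a sketch and not code from this repository. `check_loss` and `NonFiniteLossError` are hypothetical names, and it assumes the loss is passed in as a plain Python float.

```python
import math


class NonFiniteLossError(RuntimeError):
    """Raised when the loss is NaN or infinite."""


def check_loss(loss_value, epoch, step):
    """Return loss_value unchanged, or raise if it is NaN or infinite.

    loss_value: a plain Python float, e.g. loss.item() for a PyTorch tensor.
    epoch, step: only used to build the error message.
    """
    if not math.isfinite(loss_value):
        raise NonFiniteLossError(
            "loss became {} at epoch {}, step {}".format(loss_value, epoch, step)
        )
    return loss_value
```

In a PyTorch training loop this could be called as `check_loss(loss.item(), epoch, i)` before `loss.backward()`, so training stops at the first bad step instead of continuing with NaN weights. Turning on `torch.autograd.set_detect_anomaly(True)` for a short run can additionally point at the operation that produced the NaN, at the cost of slower training.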

ContestantsD commented 2 years ago

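Switch to a torch build for CUDA 11, then run it and fix things based on the error messages. I forget exactly where, but in the end it gets stuck at a place where the model is loaded; adjusting the order there is enough.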

04556338896 commented 2 years ago

Thanks!

sx1999 commented 1 year ago

Hello, the dataset links in the code are invalid. Could you please share your datasets with me? Thanks!

hytseng0509 / CrossDomainFewShot, issue #21