hytseng0509 / CrossDomainFewShot

Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation (ICLR 2020 spotlight)

Something went wrong when I "Training with multiple seen domains" #21

Open 7749546 opened 3 years ago

7749546 commented 3 years ago

[image] The loss is NaN. I tried reducing the learning rate, but it didn't help. Could you please give me some advice? [image]
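A minimal sketch for narrowing down where the loss first becomes NaN, assuming the per-iteration loss can be converted to a Python float (for a PyTorch tensor, `loss.item()`). The helper name `first_non_finite` and the sample values are hypothetical and not part of this repository.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN/inf loss, or None if all are finite."""
    for step, loss in enumerate(losses):
        if not math.isfinite(float(loss)):
            return step
    return None


# Hypothetical loss history: finite at first, NaN afterwards.
history = [2.31, 1.87, float("nan"), float("nan")]
print(first_non_finite(history))  # prints 2
```

Inside a training loop, the same `math.isfinite` check can be applied each iteration so the run stops at the first bad step instead of continuing with NaN weights. PyTorch's `torch.autograd.set_detect_anomaly(True)` may also help locate the operation that produced the NaN during the backward pass, at the cost of slower training.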

hytseng0509 commented 3 years ago

Which PyTorch version are you using?

7749546 commented 3 years ago

Thank you for your reply. My torch version is 1.2.0. However, I can run this command:

python3 train_baseline.py --method relationnet_softmax --dataset multi --testset cars --name multi_cars_ori_relationnet_softmax --warmup baseline --train_aug

The result seems OK, as shown below: [image]

7749546 commented 3 years ago

So, given the result above, should I change to a torch version above 1.3.0? If I ran into the original problem after changing the selected model, should I consider upgrading the torch version?

ContestantsD commented 2 years ago

Did you resolve this problem? I am facing it too.

04556338896 commented 2 years ago

Hi, did you solve the problem? I trained the model on my own dataset, but my loss is NaN too.

ContestantsD commented 2 years ago

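I solved this problem. In my case it happened because I deliberately used CUDA 10.2, but my 30-series GPU does not support CUDA 10, so the loss became NaN. As far as I remember, the actual fix was to change the order in which the model parameters are loaded; the author's code is not written very rigorously.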
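A rough way to check for this kind of mismatch is to compare the GPU's compute capability with the architectures the installed PyTorch build was compiled for. The sketch below assumes a PyTorch version recent enough to provide `torch.cuda.get_arch_list()` (older releases such as 1.2.0 do not have it, so the code falls back gracefully). The helper names `build_supports_gpu` and `report` are hypothetical. It only looks for an exact `sm_XY` match and ignores PTX forward compatibility, so treat the answer as a hint rather than a verdict.

```python
def build_supports_gpu(capability, arch_list):
    """True if the (major, minor) capability has a matching sm_XY entry."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def report():
    """Describe the local torch/CUDA/GPU combination as a string."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "CUDA is not available to this torch build"
    cap = torch.cuda.get_device_capability(0)
    # get_arch_list is missing in old PyTorch releases; handle that case.
    get_archs = getattr(torch.cuda, "get_arch_list", None)
    if get_archs is None:
        return f"torch {torch.__version__}: cannot list compiled architectures"
    ok = build_supports_gpu(cap, get_archs())
    return (f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
            f"GPU sm_{cap[0]}{cap[1]}, supported={ok}")


if __name__ == "__main__":
    print(report())
```

For example, 30-series cards report compute capability 8.6, so a build whose architecture list stops at `sm_75` would come back as `supported=False`, which matches the CUDA 10.2 situation described above.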

04556338896 commented 2 years ago

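I am also using a 30-series GPU with CUDA 10.2. The first few epochs show a loss, but after that it is all NaN. This happens with both protonet and relationnet. Do you remember which model parameters you changed? Thanks!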

ContestantsD commented 2 years ago

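Switch to a torch build for CUDA 11, run it, and fix things based on the errors. I forget exactly where it was, but in the end it will get stuck at a place where the model is loaded; adjusting the position there fixes it.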

04556338896 commented 2 years ago

Thanks!

sx1999 commented 1 year ago

Hello, the dataset links in the code are no longer valid. Could you please share your datasets with me? Thanks!