Open 7749546 opened 3 years ago
The loss is NaN. I tried reducing the learning rate, but it didn't help. Could you please give me some advice?
Which pytorch version do you use?
Thank you for your reply. My torch version is 1.2.0, but I can run this command: python3 train_baseline.py --method relationnet_softmax --dataset multi --testset cars --name multi_cars_ori_relationnet_softmax --warmup baseline --train_aug
The result seems to be ok. The result is shown below:
So, based on the result above, should I upgrade torch to a version above 1.3.0? If I hit the original problem again after switching the selected model, should I consider upgrading torch then?
Did you resolve this problem? I am facing it too.
Hi, did you solve the problem? I trained the model on my own dataset, but my loss is NaN too.
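I solved this problem. In my case it happened because I deliberately used CUDA 10.2, but my 30-series GPU does not support CUDA 10, so the loss became NaN. As far as I remember, the actual fix was to change the order in which the model parameters are loaded; the author's code is not written very carefully.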
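I am also using a 30-series GPU with CUDA 10.2. The first few epochs produce a loss, and after that it is all NaN. This happens with both protonet and relationnet. Do you remember where you changed the model parameters? Thanks!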
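Switch torch to a CUDA 11 build, run it, and fix things based on the errors. I forget exactly where, but it will eventually get stuck at a place where a model is loaded; adjusting the order there is enough.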
Thanks!
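For anyone hitting the same NaN on a 30-series card, a quick way to confirm the CUDA mismatch described in this thread might look like the sketch below. It is not part of the repository: the table and the helper names (`MIN_CUDA_FOR_CAPABILITY`, `cuda_build_supports`, `report`) are made up for illustration, and the table only lists a few common architectures. A CUDA version that is new enough is a necessary condition, not a guarantee that a particular PyTorch binary ships kernels for your GPU.

```python
# Minimum CUDA toolkit version able to target each GPU compute capability.
# Illustrative subset only; extend it for other architectures.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing (RTX 20 series)
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series)
}


def parse_cuda_version(text):
    """Turn a version string such as '10.2' into a comparable tuple (10, 2)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def cuda_build_supports(capability, cuda_version):
    """Return True if a CUDA toolkit of this version can target the capability."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        raise ValueError(f"unknown compute capability: {capability}")
    return parse_cuda_version(cuda_version) >= required


def report():
    """Print the local PyTorch / CUDA / GPU combination (requires torch)."""
    import torch  # imported lazily so the helpers above work without it

    print("torch:", torch.__version__, "built for CUDA:", torch.version.cuda)
    if torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print("GPU:", torch.cuda.get_device_name(0), "capability:", cap)
        print("supported by this build:",
              cuda_build_supports(cap, torch.version.cuda))
    else:
        print("CUDA is not available in this environment")
```

Calling `report()` on the training machine prints the installed combination. A 30-series GPU (capability 8.6) paired with a CUDA 10.2 build would be reported as unsupported, which matches the advice above to move to a CUDA 11 build.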
Hello, the dataset links in the code are invalid. Could you please share your datasets with me? Thanks!