Open lucasjinreal opened 5 years ago
This happens to me too; my PyTorch version is 0.4.1.

```
100%|█████████████████████████████████████████████████████████████████████████████████| 423/423 [09:39<00:00, 1.34s/it]
[train] Epoch: 100/100 Loss: nan Acc: 0.010874704491725768 Execution time: 579.1260393778794
100%|█████████████████████████████████████████████████████████████████████████████████| 108/108 [01:02<00:00, 2.30it/s]
[val] Epoch: 100/100 Loss: nan Acc: 0.0111162575266327 Execution time: 62.677289011888206
Save model at /media/ext/lizhongguo/ActionRecognition/pytorch-video-recognition/run/run_1/models/C3D-ucf101_epoch-99.pth.tar
100%|█████████████████████████████████████████████████████████████████████████████████| 136/136 [01:16<00:00, 3.15it/s]
[test] Epoch: 100/100 Loss: nan Acc: 0.010736764161421697 Execution time: 76.43733210070059
```
Hi, you may reduce the learning rate.
I also suffered from Loss: NaN. I changed the learning rate from 1e-3 to 1e-1, but the result is the same (Loss: nan).
If the loss is NaN, no usable weights can be stored, so the model's accuracy cannot improve... Has anybody solved this problem?
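One way to keep a NaN loss from corrupting the saved weights is to check the loss before stepping the optimizer. A minimal sketch (the model, criterion, and data are placeholders, not from the repo):

```python
import torch

def safe_step(model, optimizer, criterion, inputs, labels):
    """Run one training step, skipping the update if the loss is NaN/Inf."""
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    if not torch.isfinite(loss):
        # Skip this batch entirely so NaN gradients never reach the weights.
        return None
    loss.backward()
    # Clipping also helps against the exploding gradients that lead to NaN.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()
```

With a guard like this, the checkpoints written at the end of each epoch stay finite even if a few batches blow up.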
I checked the code from https://github.com/facebookresearch/VMZ/blob/master/lib/models/c3d_model.py and added a BatchNorm layer between each Conv layer and ReLU layer. Now it seems to work on the UCF-101 dataset.
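For reference, one block of that fix could look like the sketch below: a `BatchNorm3d` inserted between the 3D convolution and the ReLU. The channel counts and pooling sizes are illustrative, not copied from the repo:

```python
import torch
import torch.nn as nn

class ConvBNBlock(nn.Module):
    """One C3D-style block: Conv3d -> BatchNorm3d -> ReLU -> MaxPool3d.

    The BatchNorm3d between the convolution and the ReLU is the change
    that stabilized training here.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm3d(out_ch)  # added between Conv and ReLU
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool3d(kernel_size=(1, 2, 2))

    def forward(self, x):
        return self.pool(self.relu(self.bn(self.conv(x))))
```

For a clip tensor of shape `(batch, 3, 16, 112, 112)`, the block keeps the temporal dimension and halves the spatial ones, matching the usual first C3D stage.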
@lizhongguo let me have a look
> I also suffered from Loss: NaN. I changed the learning rate from 1e-3 to 1e-1, but the result is the same (Loss: nan).
> If the loss is NaN, no usable weights can be stored, so the model's accuracy cannot improve... Has anybody solved this problem?
Reducing the learning rate means selecting a rate lower than 1e-3, such as 1e-5 or 0.5e-3. Personally, I trained the model from scratch on UCF101 with a learning rate of 1e-3 without having any NaN issues.
@wave-transmitter Thank you for the comment! I solved this problem by tuning the learning rate: I reduced it to 1e-5, and then it worked correctly!
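In code, the fix above amounts to constructing the optimizer with the smaller rate; a scheduler can then decay it further instead of hand-tuning. A sketch (the linear model is a stand-in for C3D, and the SGD/StepLR settings are illustrative):

```python
import torch

model = torch.nn.Linear(10, 101)  # placeholder for the C3D model

# 1e-3 diverged to NaN in this thread; 1e-5 trained stably.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5, momentum=0.9)

# Optionally decay by 10x every 10 epochs rather than picking one fixed rate.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
```

Calling `scheduler.step()` once per epoch after `optimizer.step()` applies the decay.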
However, when I reduce the learning rate, the accuracy is only 0.20. What should I do?
@ilovekj I recommend finding the proper learning rate for your setup; I tried several values before finding a good one. How about augmenting your dataset?
@makeastir But there is another question: it seems this code splits the dataset randomly, which is not allowed. There are three official splits, and when I use this code it performs poorly.
@ilovekj I also used this code and got good performance. The code includes an augmentation module, which should make the dataset more useful. How about increasing your dataset size? In my case, the negative class has 400 samples and the positive class 150. Or you could reduce the number of dataset features?
@makeastir But you didn't use the official splits.
@ilovekj Hi, I used the official split with the corresponding dataloader and only got 1% accuracy, but the same code on a random split reaches 98%. Did you figure out the problem?
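For anyone switching to the official splits: they ship as plain text files (e.g. `trainlist01.txt`, `testlist01.txt`), with one video per line as `ClassName/video.avi`, followed by a numeric label in the train lists. A minimal parser sketch, assuming that standard layout (file names and paths are placeholders):

```python
from pathlib import Path

def read_ucf101_split(split_file):
    """Parse an official UCF-101 split file into a list of video paths.

    Train lists look like 'ApplyEyeMakeup/v_..._g08_c01.avi 1' (path + label);
    test lists contain only the path, so the label column is optional.
    """
    videos = []
    for line in Path(split_file).read_text().splitlines():
        if line.strip():
            videos.append(line.split()[0])
    return videos
```

Feeding this list to the dataloader, instead of a random train/val/test split, keeps results comparable with published UCF-101 numbers.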
Maybe it's because we didn't use a pretrained model, but I'm not sure.
I got losses like this: they are all NaN. What could the reason be?