VideoDataset has a bug about data normalization

jfzhang95 / pytorch-video-recognition

PyTorch implemented C3D, R3D, R2Plus1D models for video activity recognition.

MIT License


VideoDataset has a bug about data normalization #23

Open leftthomas opened 5 years ago

leftthomas commented 5 years ago

There is a bug in the normalize(self, buffer) function in dataset.py: it does not normalize the data to [0, 1], which is what we usually do when training deep learning models with PyTorch. I tested this with the official train/test split of UCF101. Without normalization, training failed completely: after 54 epochs, the testing accuracy was only around 5%. With normalization, training was fine: after 5 epochs, it reached 8.2% testing accuracy. https://github.com/jfzhang95/pytorch-video-recognition/blob/ca37de9f69a961f22a821c157e9ccf47a601904d/dataloaders/dataset.py#L204

wave-transmitter commented 5 years ago

Why do you think this is a bug? Normalizing data to [0, 1] is not the only option. Subtracting the mean RGB values of the dataset used for pre-training the backbone (usually ImageNet) is also common. The normalize() function follows this approach.

If you want to show that normalizing data to [0, 1] leads to higher performance, you have to elaborate on this. The results you provided are not comparable to each other. You could validate this by training your model with each of the two normalization approaches in turn and reporting the results for the same number of epochs.

leftthomas commented 5 years ago

@wave-transmitter The common approach is to apply normalization after the data have been scaled to [0, 1]. In PyTorch we usually call ToTensor() and follow it with some normalization ops, and ToTensor() scales the data to [0, 1]. This repo defines its own totensor() and normalize() functions, which do not scale the data to [0, 1], whereas the PyTorch example does.
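For readers following along, here is a minimal NumPy sketch of the scale-then-normalize order described above. It is not code from this repo or from torchvision: the function name and the MEAN/STD values are placeholders chosen for illustration, and it skips the channel reordering that ToTensor() also performs.

```python
import numpy as np

# Hypothetical per-channel statistics expressed in the [0, 1] range.
# These are placeholders for illustration, not values from the repo.
MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def scale_then_normalize(clip):
    """Scale a uint8 clip of shape (frames, height, width, channels) to [0, 1],
    then standardize each channel with MEAN and STD."""
    scaled = clip.astype(np.float32) / 255.0  # the scaling step ToTensor() applies to uint8 input
    return (scaled - MEAN) / STD              # the per-channel step Normalize(mean, std) applies


# Random stand-in for a decoded video clip.
clip = np.random.randint(0, 256, size=(16, 112, 112, 3), dtype=np.uint8)
out = scale_then_normalize(clip)
print(out.dtype, out.min(), out.max())
```

With these placeholder statistics the output lies in [-1, 1], compared with values in the hundreds when only a 0-255 mean is subtracted.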

I tested with UCF101 split1. Without normalizing the data to [0, 1], the test accuracy is around 5% at epoch 15; with normalization, it is around 25% at epoch 15. If you don't believe me, you can try it yourself with UCF101 split1 (not the sklearn random split provided by this repo), and you will see the same result.

wave-transmitter commented 5 years ago

It's not that I don't believe you; I am just trying to understand whether you are making a fair comparison between the two normalization methods. You should give more details about your setup. You haven't even mentioned which model you are trying to train...

In my opinion, if you want to evaluate both methods, you should compare the results after a number of epochs at which both models have converged. E.g. you can apply early stopping once 99.9% accuracy is reached on the training set, or just train for more epochs. I have also trained the C3D model (without any changes) on the official split1 of UCF101 and posted the results in #14. The 5% accuracy at 15 epochs that you reported is not consistent with those results.
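A rough sketch of the stopping rule suggested above, so that both normalization variants are compared at the same convergence point. The names train_until_converged, run_epoch and fake_epoch are hypothetical; run_epoch stands in for whatever function trains one epoch and returns the training accuracy.

```python
def train_until_converged(run_epoch, target_acc=0.999, max_epochs=200):
    """Call run_epoch(epoch) until it returns a training accuracy of at least
    target_acc, or until max_epochs is reached.
    Returns (epochs_run, last_training_accuracy)."""
    acc = 0.0
    for epoch in range(1, max_epochs + 1):
        acc = run_epoch(epoch)
        if acc >= target_acc:
            return epoch, acc
    return max_epochs, acc


def fake_epoch(epoch):
    """Stand-in for a real training epoch: accuracy rises by 10% per epoch."""
    return min(1.0, epoch / 10)


epochs_run, final_acc = train_until_converged(fake_epoch)
print(epochs_run, final_acc)
```

Running the same loop once per normalization method and reporting test accuracy at the stopping epoch would give numbers that can be compared directly.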

leftthomas commented 5 years ago

@wave-transmitter I trained C3D on the official split1 from scratch, without the pre-trained model. You can test C3D from scratch yourself by changing just one line of code in the normalize function to frame = frame / 255.0, and you will see the result. In this repo, the input tensor values are large, such as 233.7 or -45.2. That is not common in deep learning training and easily causes value overflow, because convolution ops are essentially matrix multiplications. This is why people have opened issues about NaN loss values, as mentioned in #17. If you normalize the data to [0, 1], the NaN problem goes away.
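To make the difference in input range concrete, here is a small NumPy sketch that contrasts the two variants. It is an illustration, not the repo's actual normalize(): both function names and HYPOTHETICAL_MEAN are assumptions, and only the division by 255.0 comes from the one-line change described above.

```python
import numpy as np

# Placeholder per-channel mean on the 0-255 scale (an assumption, not the repo's values).
HYPOTHETICAL_MEAN = np.array([100.0, 100.0, 100.0], dtype=np.float32)


def normalize_mean_only(buffer):
    """Subtract a per-channel mean from 0-255 frames; magnitudes stay in the hundreds."""
    return buffer.astype(np.float32) - HYPOTHETICAL_MEAN


def normalize_unit_range(buffer):
    """Scale 0-255 frames to [0, 1], mirroring frame = frame / 255.0."""
    return buffer.astype(np.float32) / 255.0


# Random stand-in for a clip of shape (frames, height, width, channels).
buffer = np.random.randint(0, 256, size=(16, 112, 112, 3), dtype=np.uint8)
mean_only = normalize_mean_only(buffer)
unit_range = normalize_unit_range(buffer)

print("mean-only range:", mean_only.min(), mean_only.max())
print("unit range:     ", unit_range.min(), unit_range.max())
```

Printing the two ranges side by side is a quick way to check which preprocessing a given training run actually used.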

jamshaidwarraich commented 5 years ago

Could you share the paper link?

shanchao0906 commented 4 years ago

How should the code be modified? The training loss is always NaN.