Open leftthomas opened 5 years ago
Why do you think this is a bug? Normalizing data to [0, 1] is not always the case: subtracting the mean RGB values of the dataset used for the backbone's pre-training (usually ImageNet) is also common, and the `normalize()` function follows this approach.
If you want to show that normalizing data to [0, 1] leads to higher performance, you have to elaborate more on this. The results that you provided are not comparable to each other. You could validate this by training your model with each of the two normalization approaches in turn and reporting the results for the same number of epochs.
@wave-transmitter The common practice is to normalize after the data have been scaled to [0, 1]: in PyTorch we usually call `ToTensor()`, which scales the data to [0, 1], and then apply normalization ops. But this repo defines its own `totensor()` and `normalize()` functions, which never scale the data to [0, 1], while the official PyTorch examples do.
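For reference, the standard torchvision-style order of operations looks like the sketch below, written here with NumPy since the repo's loaders work on NumPy buffers. The function names and the ImageNet mean/std values are illustrative, not taken from this repo:

```python
import numpy as np

def to_tensor_like(frame):
    """Mimic torchvision's ToTensor(): uint8 HWC in [0, 255] -> float32 CHW in [0, 1]."""
    return frame.astype(np.float32).transpose(2, 0, 1) / 255.0

def normalize_chw(chw, mean, std):
    """Channel-wise (x - mean) / std, applied AFTER scaling to [0, 1]."""
    mean = np.asarray(mean, dtype=np.float32).reshape(3, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(3, 1, 1)
    return (chw - mean) / std

# A fake 112x112 RGB frame, as fed to C3D-style models.
frame = np.random.randint(0, 256, size=(112, 112, 3), dtype=np.uint8)
x = normalize_chw(to_tensor_like(frame),
                  mean=[0.485, 0.456, 0.406],   # common ImageNet statistics
                  std=[0.229, 0.224, 0.225])
```

Note that the mean/std here are fractions of 1.0 precisely because the scaling to [0, 1] happens first.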
I have tested with UCF101 split1: if we don't normalize the data to [0, 1], the test accuracy is around 5% at epoch 15, but if we do normalize, the accuracy is around 25% at epoch 15. If you don't believe me, try it yourself with UCF101 split1 (not with the sklearn random split provided by this repo); you will see the same result.
It's not that I don't believe you, I am just trying to understand whether you are making a fair comparison between the two normalization methods. You should give more details about your setup; you haven't even mentioned which model you are trying to train...
In my opinion, if you want to evaluate both methods, you should compare the results after a number of epochs at which both models have converged. E.g. you can apply early stopping once 99.9% accuracy is reached on the training set, or just train for a larger number of epochs. I have also trained the C3D model (without any changes) on the official split1 of UCF101 and posted the results in #14. The 5% accuracy at 15 epochs that you reported does not match those results in #14.
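The early-stopping criterion suggested above could be sketched like this; the function name, the 99.9% threshold, and the `patience` parameter are illustrative, not part of this repo:

```python
def should_stop(train_acc_history, threshold=0.999, patience=1):
    """Stop once training accuracy has stayed at or above `threshold`
    for the last `patience` consecutive epochs."""
    recent = train_acc_history[-patience:]
    return len(recent) == patience and all(acc >= threshold for acc in recent)

# e.g. inside the training loop, after appending this epoch's accuracy:
# if should_stop(history): break
```

Comparing the two normalization schemes only after this condition fires (or after a fixed, large epoch budget) makes the accuracy numbers comparable.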
@wave-transmitter I trained C3D on the official split1 from scratch, not with the pre-trained model. You can test the C3D model from scratch by changing one line of code in the `normalize` function to `frame = frame / 255.0`; you will see the result.
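The one-line change amounts to something like the sketch below. This is a hypothetical reconstruction of the repo's `normalize(self, buffer)` (written as a free function, with placeholder channel means), not the actual code from `dataset.py`:

```python
import numpy as np

def normalize(buffer, mean=(90.0, 98.0, 102.0)):
    """Normalize a video buffer of shape (T, H, W, C).

    Original behaviour: subtract per-channel means only, leaving values
    on a pixel scale of roughly [-100, 165]. The proposed fix additionally
    divides by 255 so the inputs end up on a [0, 1]-like scale.
    The `mean` values here are placeholders, not the repo's constants.
    """
    out = np.empty(buffer.shape, dtype=np.float32)
    for i, frame in enumerate(buffer):
        frame = frame - np.array(mean, dtype=np.float32)
        frame = frame / 255.0  # the proposed one-line fix
        out[i] = frame
    return out
```

With the division in place, the values feeding the first conv layer sit in a small, bounded range instead of spanning hundreds.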
In this repo, the input tensor values are large, such as 233.7, -45.2, etc. That is not common in deep learning training and easily causes value-overflow problems, because the underlying ops are essentially matrix multiplications. This is why issues like the NaN loss mentioned in #17 have been reported. If you normalize the data to [0, 1], you will see the NaN problem gone.
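The scale gap can be illustrated without the repo's code. A ReLU network is positively homogeneous, so pixel-scale inputs produce activations exactly 255x larger at every layer than the same network sees with [0, 1] inputs; that is the magnitude gap behind overflow and NaN losses. A toy NumPy sketch (not the actual C3D forward pass):

```python
import numpy as np

# Toy stand-in for a deep network: 10 linear layers with ReLU.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((64, 64)) * 0.1 for _ in range(10)]

def forward(x):
    for w in weights:
        x = np.maximum(x @ w, 0.0)  # linear layer followed by ReLU
    return x

raw = rng.uniform(0, 255, size=(1, 64))   # pixel-scale inputs, as in this repo
scaled = raw / 255.0                      # the same inputs scaled to [0, 1]

# Every activation in the "raw" pass is 255x the corresponding activation
# in the "scaled" pass, and gradients inherit the same factor.
print(np.abs(forward(raw)).max(), np.abs(forward(scaled)).max())
```

The 255x factor compounds with large loss values and large gradients, which is why training with unscaled inputs drifts toward overflow in fp32 far more easily.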
Could you share the paper link?
How should the code be modified? The training loss is always NaN.
There is a bug in the `normalize(self, buffer)` function in `dataset.py`: it does not normalize the data to [0, 1], which we usually do in a deep learning training process with PyTorch. I also tested it: without this normalization, training failed completely when I used the official train/test split of UCF101; after 54 epochs, the test accuracy was only around 5%. With the normalization, training worked fine; after just 5 epochs it already reached 8.2% test accuracy. https://github.com/jfzhang95/pytorch-video-recognition/blob/ca37de9f69a961f22a821c157e9ccf47a601904d/dataloaders/dataset.py#L204