Thanks for spotting this @datumbox!
To answer your questions: the conversion to seconds (sec) introduces rounding errors, and in some cases we end up with one less frame.
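To illustrate the failure mode, here is a minimal sketch of how a frames-to-seconds round trip can drop a frame. The values and function are made up for illustration; this is not the actual torchvision decoding code:

```python
import math

# Hypothetical illustration of the rounding issue described above: clip
# boundaries are converted from a frame count to seconds and back, and the
# float round-trip followed by a floor can land one frame short.
def frames_via_seconds(num_frames, fps):
    duration_sec = num_frames / fps        # exact only for "nice" frame rates
    return math.floor(duration_sec * fps)  # float error can push this below num_frames

# For many (num_frames, fps) pairs this returns num_frames, but with
# non-integer frame rates the product can come back as e.g. 299.999999...
# and the floor then yields num_frames - 1.
for fps in (24.0, 23.976, 29.97, 30000 / 1001):
    print(fps, frames_via_seconds(300, fps))
```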
I've flagged this in https://github.com/pytorch/vision/issues/4112#issuecomment-867747637 but it hasn't been addressed yet. cc @prabhat00155

Thanks for the reply Francisco. Given that our reference is basically broken, I bumped the priority.
An alternative that could buy us time is to submit a temporary patch that removes the assertion until this is properly investigated and fixed. I'm not particularly fond of this, but it might be worth considering if the actual fix is complex and requires time. I'll leave it to @prabhat00155 and @bjuncek to comment on that.
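Something along these lines is what I have in mind; a hypothetical sketch only, with illustrative names rather than the actual torchvision code:

```python
import warnings

# Sketch of the temporary patch discussed above: downgrade the failing
# frame-count assertion to a warning so the reference script can keep
# running while the rounding bug is investigated.
def check_clip_length(decoded_frames, expected_frames):
    # Previously: assert len(decoded_frames) == expected_frames
    if len(decoded_frames) != expected_frames:
        warnings.warn(
            f"expected {expected_frames} frames but decoded "
            f"{len(decoded_frames)}; see the rounding issue above",
            RuntimeWarning,
        )
```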
Concerning the accuracies, I feel the frame issue shouldn't affect them too much. It's unclear how many records are affected, but as you can see from the log, we had to parse 75% of the validation data before finding one record with issues. While working on the multi-weights project, I ran tests on multiple existing models and noticed quite a bit of variation compared to our documentation (though not as bad as this one). Definitely worth investigating more.
In terms of accuracy, this is weird and not expected for a single frame difference.
Having said that, many things have changed in the setup, potentially even the files used: IIRC we trained on a resampled 480p version of Kinetics from the FAIR cluster, which was also hosted on /datasets/kinetics/07062018 (not under 400, but in another subfolder there).
@datumbox If possible, one should a) apply the patch mentioned and b) run it on the Kinetics version that is publicly available (downloading through the torchvision dataset should work fine; see the sketch below) and re-run the reference scripts. For a long time there were many different dataset versions, depending on resampling, the region the dataset was downloaded from, and general dataset degradation due to YouTube's TOS. Now that we have a publicly available version of the videos for the first time, let's use this opportunity to update our references.
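A minimal sketch of the torchvision download route; the root path and frames_per_clip are placeholder values to adapt to your setup:

```python
from torchvision.datasets import Kinetics

# Download the publicly hosted Kinetics-400 validation split via torchvision.
val_set = Kinetics(
    root="/data/kinetics400",  # hypothetical local path
    frames_per_clip=16,        # placeholder clip length
    num_classes="400",
    split="val",
    download=True,             # fetches the videos from the official source
)
print(f"{len(val_set)} clips in the validation split")
```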
@bjuncek Thanks for confirming that we were not using the right dataset. Indeed, I can see a 480p version called val_avi-480p. Unfortunately I don't have the bandwidth to run complex investigations for you, but I will run the model on top of the 480p version and let you know if the accuracy matches.
I ran the following and got:
torchrun --nproc_per_node=8 train.py --data-path /datasets01/kinetics/070618/ --train-dir=val_avi-480p --val-dir=val_avi-480p --batch-size=64 --sync-bn --test-only --pretrained --cache-dataset
Clip Acc@1 57.029 Clip Acc@5 78.352
As you can see, the results are closer but not identical to the reported numbers. I think this requires additional investigation, potentially trying all available datasets on DevFAIR to see which one is the right one.
@bjuncek Do you have any other information you could share on how the models were trained? Logs? Training paths? Anything would help.
@prabhat00155 I see that you self-assigned the ticket so I assume you plan to investigate. Let me know if you need anything from me.
@datumbox I've been able to confirm that I no longer run into the error. Could you double-check on your end? (I just used the kinetics400 val set, as in the example above.)
@bjuncek I confirm that this is solved on the latest main. Thanks!
🐛 Describe the bug
Running on main:
throws the following error:
If we apply the following patch:
We get an accuracy which is far from the expected one:
Questions:
Are these the expected accuracies for the r2plus1d_18 model? cc @pmeier @fmassa @bjuncek
Versions
Latest main 0817f7f