moabitcoin / ig65m-pytorch

PyTorch 3D video classification models pre-trained on 65 million Instagram videos
MIT License
265 stars 30 forks

Run on full Kinetics-400 dataset to verify accuracy claims #2

Open daniel-j-h opened 4 years ago

daniel-j-h commented 4 years ago

We validated the ported weights and model only on a subset of Kinetics-400 we had at hand.

We should run over the full Kinetics-400 dataset and verify what the folks claim in:

https://github.com/facebookresearch/vmz

daniel-j-h commented 4 years ago

This is blocked by the Kinetics dataset being distributed as YouTube video ids only: you have to scrape the full videos yourself to extract the labeled clips, which is a bit of a pain for 600k videos.

Tracking: https://github.com/activitynet/ActivityNet/issues/28#issuecomment-535287732
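To make the scraping pain concrete, here is a minimal sketch of what reconstructing the dataset entails. The column layout matches the published Kinetics annotation CSVs (label, youtube_id, time_start, time_end, split), but the video ids below are made up, and `yt-dlp` with section downloads is just one possible tool, not what anyone in this thread used:

```python
import csv
import io

# A couple of rows in the Kinetics-400 annotation format
# (label, youtube_id, time_start, time_end, split). Ids are made up.
ANNOTATIONS = """\
label,youtube_id,time_start,time_end,split
abseiling,xxxxxxxxxxx,20,30,train
zumba,yyyyyyyyyyy,5,15,val
"""

def download_commands(csv_text):
    """Turn annotation rows into yt-dlp commands that fetch only the
    labeled 10-second section of each video, organized by split/label."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = f"https://www.youtube.com/watch?v={row['youtube_id']}"
        section = f"*{row['time_start']}-{row['time_end']}"
        out = f"{row['split']}/{row['label']}/{row['youtube_id']}.mp4"
        commands.append(["yt-dlp", "--download-sections", section, "-o", out, url])
    return commands

for cmd in download_commands(ANNOTATIONS):
    print(" ".join(cmd))
```

Multiply this by 600k rows, plus retries and the videos that have since been taken down, and the scale of the problem is clear.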

bjuncek commented 4 years ago

Also note that a LOT of the original Kinetics-400 videos no longer exist. I can try to run them for you on their snapshot in a few days (maybe weeks, depending on my workload) :)

daniel-j-h commented 4 years ago

I can't believe how hard it is to work with the Kinetics dataset :man_facepalming:

If you have a snapshot with the extracted labeled clips, could you shoot me a mail (check my GitHub profile), please? It would be great to get it e.g. on a requester-pays AWS S3 bucket :hugs:

I don't think asking you to run our models here every now and then is a good long-term solution for us. At the same time, the Kinetics situation is not a good place for video research to be in, in the first place.

daniel-j-h commented 4 years ago

Regarding the evaluation strategy: reading

https://research.fb.com/wp-content/uploads/2019/05/Large-scale-weakly-supervised-pre-training-for-video-action-recognition.pdf

  1. Experiments

The fc-only experiments should be good enough for a first step here: extract features for a fixed model (from our PyTorch port; see the extract tool), then train a logistic regressor on top.
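As a self-contained sketch of that fc-only protocol, the snippet below trains a multinomial logistic regressor with plain gradient descent. The synthetic Gaussian "features" stand in for the real clip features you would dump with the extract tool from the frozen backbone; the shapes and hyperparameters here are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for backbone features: in practice these would come from the
# frozen R(2+1)D model via the repo's extract tool. Here, 3 synthetic
# classes with Gaussian clip features keep the sketch self-contained.
n_per_class, dim, n_classes = 100, 32, 3
centers = rng.normal(scale=3.0, size=(n_classes, dim))
X = np.concatenate([centers[c] + rng.normal(size=(n_per_class, dim))
                    for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

# "fc-only": a single linear layer trained with softmax cross-entropy,
# i.e. multinomial logistic regression on top of fixed features.
W = np.zeros((dim, n_classes))
b = np.zeros(n_classes)
for _ in range(200):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0                # dL/dlogits for cross-entropy
    W -= 0.1 * X.T @ p / len(y)
    b -= 0.1 * p.mean(axis=0)

acc = (np.argmax(X @ W + b, axis=1) == y).mean()
print(f"train accuracy: {acc:.3f}")
```

Swapping the synthetic features for real extracted ones (and the loop for e.g. scikit-learn's `LogisticRegression`) would give the first-step numbers to compare against the paper.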

bjuncek commented 4 years ago

> Regarding evaluation strategy. Reading
>
> https://research.fb.com/wp-content/uploads/2019/05/Large-scale-weakly-supervised-pre-training-for-video-action-recognition.pdf
>
> 1. Experiments
>
> The fc-only experiments should be good enough for a first step here: extract features for a fixed model (from our PyTorch port; see the extract tool), then train a logistic regressor on top.

Ah - my bad - I was looking at Du's CSN paper :)

FesianXu commented 4 years ago

I want to report my own evaluation on Kinetics-400. I used your ported R(2+1)D models, pre-trained on IG65M and fine-tuned on Kinetics, with clip lengths of 8 and 32 respectively. My Kinetics-400 database is not complete yet: about 10k training samples and 240 validation samples are missing.

I wrote the evaluation framework myself, so there may be some differences, but the results look normal compared to what the paper claims. It's a pity that the code may not be released, since I did this during my internship.
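One place where self-written evaluation frameworks commonly diverge is clip-to-video aggregation. A common protocol (assumed here, not necessarily what was used above) averages the per-clip class scores over all clips sampled from a video and takes the argmax; a minimal numpy sketch:

```python
import numpy as np

def video_top1(clip_scores, clip_video_ids, video_labels):
    """Video-level top-1 accuracy: average per-clip class scores over
    all clips of a video, then take the argmax.

    clip_scores:    (n_clips, n_classes) array of per-clip scores
    clip_video_ids: (n_clips,) video id for each clip
    video_labels:   dict mapping video id -> ground-truth class
    """
    correct = 0
    for vid in video_labels:
        scores = clip_scores[clip_video_ids == vid].mean(axis=0)
        correct += int(np.argmax(scores) == video_labels[vid])
    return correct / len(video_labels)

# Toy example: two videos, two clips each, three classes.
scores = np.array([[0.7, 0.2, 0.1],
                   [0.2, 0.5, 0.3],   # video 0: clips disagree, mean wins
                   [0.1, 0.1, 0.8],
                   [0.2, 0.1, 0.7]])  # video 1: clips agree on class 2
vids = np.array([0, 0, 1, 1])
labels = {0: 0, 1: 2}
print(video_top1(scores, vids, labels))  # -> 1.0
```

Differences in how many clips are sampled per video and whether scores are averaged before or after the argmax are exactly the kind of detail that can shift reported numbers by a fraction of a percent.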