moabitcoin / ig65m-pytorch

PyTorch 3D video classification models pre-trained on 65 million Instagram videos
MIT License
265 stars 30 forks

Run on full Kinetics-400 dataset to verify accuracy claims #2

Open daniel-j-h opened 4 years ago

daniel-j-h commented 4 years ago

We validated the ported weights and model only on a subset of Kinetics-400 we had at hand.

We should run over the full Kinetics-400 dataset and verify what the folks claim in:

https://github.com/facebookresearch/vmz

daniel-j-h commented 4 years ago

This is blocked by the Kinetics dataset being distributed as YouTube video ids only: you have to scrape the full videos yourself to extract the labeled clips, which is a bit of a pain for 600k videos.

Tracking: https://github.com/activitynet/ActivityNet/issues/28#issuecomment-535287732
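To make the scraping pain concrete, here is a minimal sketch of what reconstructing the dataset entails. The column layout matches the published Kinetics annotation CSVs (label, youtube_id, time_start, time_end, split), but the video ids below are made up, and `yt-dlp` with section downloads is just one possible tool, not what anyone in this thread used:

```python
import csv
import io

# A couple of rows in the Kinetics-400 annotation format
# (label, youtube_id, time_start, time_end, split). Ids are made up.
ANNOTATIONS = """\
label,youtube_id,time_start,time_end,split
abseiling,xxxxxxxxxxx,20,30,train
zumba,yyyyyyyyyyy,5,15,val
"""

def download_commands(csv_text):
    """Turn annotation rows into yt-dlp commands that fetch only the
    labeled 10-second section of each video, organized by split/label."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = f"https://www.youtube.com/watch?v={row['youtube_id']}"
        section = f"*{row['time_start']}-{row['time_end']}"
        out = f"{row['split']}/{row['label']}/{row['youtube_id']}.mp4"
        commands.append(["yt-dlp", "--download-sections", section, "-o", out, url])
    return commands

for cmd in download_commands(ANNOTATIONS):
    print(" ".join(cmd))
```

Multiply this by 600k rows, plus retries and the videos that have since been taken down, and the scale of the problem is clear.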

bjuncek commented 4 years ago

Also note that a LOT of the original Kinetics-400 videos no longer exist. I can try to run them for you on their snapshot in a few days (maybe weeks, depending on my workload) :)

daniel-j-h commented 4 years ago

I can't believe how hard it is to work with the Kinetics dataset :man_facepalming:

If you have a snapshot with the extracted labeled clips, could you shoot me a mail (check my GitHub profile), please? It would be great to get it e.g. on a requester-pays AWS S3 bucket :hugs:

I don't think asking you to run our models here every now and then is a good long-term solution for us. At the same time, the Kinetics situation is not a good place for video research to be in, in the first place.

daniel-j-h commented 4 years ago

Regarding the evaluation strategy: reading

https://research.fb.com/wp-content/uploads/2019/05/Large-scale-weakly-supervised-pre-training-for-video-action-recognition.pdf

  1. Experiments

The fc-only experiments should be good enough for a first step here: extract features for a fixed model (from our PyTorch port; see the extract tool), then train a logistic regressor on top.
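As a self-contained sketch of that fc-only protocol, the snippet below trains a multinomial logistic regressor with plain gradient descent. The synthetic Gaussian "features" stand in for the real clip features you would dump with the extract tool from the frozen backbone; the shapes and hyperparameters here are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for backbone features: in practice these would come from the
# frozen R(2+1)D model via the repo's extract tool. Here, 3 synthetic
# classes with Gaussian clip features keep the sketch self-contained.
n_per_class, dim, n_classes = 100, 32, 3
centers = rng.normal(scale=3.0, size=(n_classes, dim))
X = np.concatenate([centers[c] + rng.normal(size=(n_per_class, dim))
                    for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

# "fc-only": a single linear layer trained with softmax cross-entropy,
# i.e. multinomial logistic regression on top of fixed features.
W = np.zeros((dim, n_classes))
b = np.zeros(n_classes)
for _ in range(200):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0                # dL/dlogits for cross-entropy
    W -= 0.1 * X.T @ p / len(y)
    b -= 0.1 * p.mean(axis=0)

acc = (np.argmax(X @ W + b, axis=1) == y).mean()
print(f"train accuracy: {acc:.3f}")
```

Swapping the synthetic features for real extracted ones (and the loop for e.g. scikit-learn's `LogisticRegression`) would give the first-step numbers to compare against the paper.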

bjuncek commented 4 years ago

> Regarding evaluation strategy. Reading
>
> https://research.fb.com/wp-content/uploads/2019/05/Large-scale-weakly-supervised-pre-training-for-video-action-recognition.pdf
>
> 1. Experiments
>
> The fc-only experiments should be good enough for a first step here: extract features for a fixed model (from our PyTorch port; see the extract tool), then train a logistic regressor on top.

Ah - my bad - I was looking at Du's CSN paper :)

FesianXu commented 4 years ago

I want to report my own evaluation on Kinetics-400. I used your ported R(2+1)D models, pre-trained on IG65M and fine-tuned on Kinetics, with clip lengths of 8 and 32 respectively. My Kinetics-400 database is not complete yet: about 10k training samples and 240 validation samples are missing.

I wrote the evaluation framework myself, so there may be some differences, but the results look normal compared to what the paper claims. It's a pity that the code may not be released, since I did this during my internship.
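One place where self-written evaluation frameworks commonly diverge is clip-to-video aggregation. A common protocol (assumed here, not necessarily what was used above) averages the per-clip class scores over all clips sampled from a video and takes the argmax; a minimal numpy sketch:

```python
import numpy as np

def video_top1(clip_scores, clip_video_ids, video_labels):
    """Video-level top-1 accuracy: average per-clip class scores over
    all clips of a video, then take the argmax.

    clip_scores:    (n_clips, n_classes) array of per-clip scores
    clip_video_ids: (n_clips,) video id for each clip
    video_labels:   dict mapping video id -> ground-truth class
    """
    correct = 0
    for vid in video_labels:
        scores = clip_scores[clip_video_ids == vid].mean(axis=0)
        correct += int(np.argmax(scores) == video_labels[vid])
    return correct / len(video_labels)

# Toy example: two videos, two clips each, three classes.
scores = np.array([[0.7, 0.2, 0.1],
                   [0.2, 0.5, 0.3],   # video 0: clips disagree, mean wins
                   [0.1, 0.1, 0.8],
                   [0.2, 0.1, 0.7]])  # video 1: clips agree on class 2
vids = np.array([0, 0, 1, 1])
labels = {0: 0, 1: 2}
print(video_top1(scores, vids, labels))  # -> 1.0
```

Differences in how many clips are sampled per video and whether scores are averaged before or after the argmax are exactly the kind of detail that can shift reported numbers by a fraction of a percent.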