v-iashin / video_features

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.

https://v-iashin.github.io/video_features

MIT License

Use bigger R(2+1)d pre-trained on IG65M #7

Closed: daniel-j-h closed this issue 2 years ago

daniel-j-h commented 3 years ago

Hey there, awesome project you have here 🎉

I checked your code and saw this line:

https://github.com/v-iashin/video_features/blob/71a9a08d6a350a589fd275ff7f94803757573bca/models/r21d/extract_r21d.py#L9

I was wondering if you have thought about dropping in this one instead:

https://github.com/moabitcoin/ig65m-pytorch

since, in my experience, the R(2+1)D-18 features are not the best ones to work with in the R(2+1)D family.

✌️

v-iashin commented 3 years ago

Hi 👋 !

Thanks a lot for the pointer!

I am not sure I want to drop R(2+1)D-18 RGB completely because, for now, I think it is better to have broader coverage than a SotA-only selection, which is not sustainable in the long term.

However, I would be very happy to support IG65M through this API.

Would you like to open a PR with an IG65M implementation?

daniel-j-h commented 3 years ago

Hey, realistically I won't have time to work on this in the foreseeable future. I just wanted to point you to the IG65M PyTorch model, since it should be a drop-in replacement with respect to its API and usage 🤗

v-iashin commented 3 years ago

Hi, I understand. Thanks anyway!

v-iashin commented 2 years ago

Thanks to @ohjho, our repo now supports it.