v-iashin / video_features

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
https://v-iashin.github.io/video_features
MIT License
499 stars, 94 forks

Unknown Error when Extracting Video Features with S3D and R(2+1)D #134

Open santiagosilas opened 4 months ago

santiagosilas commented 4 months ago

Hi. Firstly, thanks for implementing and sharing this library. It is very functional and helpful. Also, the code is very easy to understand :)

I observed an unexpected/unknown error when trying to extract S3D and R(2+1)D features from 16-frame sequences of large videos. The process stops without printing any error message.

Is there any reason for this behavior? A temporary workaround I found was to break the video into small subvideos and then submit them to the extractor.

This behavior only happens with S3D and R(2+1)D. The same large videos work with the I3D and CLIP feature extractors.

Thanks in advance.

v-iashin commented 4 months ago

Hi.

I think the machine just runs out of RAM. Check your process monitor to confirm that that's the case.

I think it's expected if you are using feature extractors that are backed by torchvision.io.read_video, which reads the whole video into memory at once.
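To get a rough sense of scale, here is a small back-of-the-envelope sketch of how much memory a fully decoded video needs. It assumes the frames are held as one uint8 array of shape (T, H, W, C), which is how torchvision.io.read_video returns them by default. The function name and the example video (10 minutes, 30 fps, 1080p) are made up for illustration, and the real peak usage is likely higher because of decoding buffers and later float conversion.

```python
def decoded_video_bytes(num_frames: int, height: int, width: int,
                        channels: int = 3, bytes_per_value: int = 1) -> int:
    """Estimate the bytes needed to hold every decoded frame at once.

    Assumes a dense (T, H, W, C) array; uint8 means 1 byte per value.
    """
    return num_frames * height * width * channels * bytes_per_value


if __name__ == "__main__":
    # Hypothetical video: 10 minutes at 30 fps, 1920x1080, RGB.
    num_frames = 10 * 60 * 30
    total = decoded_video_bytes(num_frames, height=1080, width=1920)
    print(f"{num_frames} frames -> {total / 2**30:.1f} GiB of raw uint8 pixels")
```

Even a moderately long full-HD video can exceed the RAM of a typical workstation under this estimate, which would be consistent with the process being killed without a Python traceback.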

At the same time, if it is based on the cv2 reader, then it should be ok because it reads it frame by frame.
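As an illustration of the frame-by-frame idea, here is a minimal sketch that streams frames with cv2.VideoCapture and groups them into fixed-size windows, so only about one window of frames is in memory at a time. This is not the code used in this repository; the helper names (read_frames_cv2, iter_windows), the default window and step of 16, and the choice to drop a trailing incomplete window are all assumptions made for the example.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def read_frames_cv2(path: str) -> Iterator["object"]:
    """Yield RGB frames one at a time instead of loading the whole video."""
    import cv2  # imported lazily so the windowing helper works without OpenCV

    cap = cv2.VideoCapture(path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of stream or read failure
                break
            # OpenCV decodes to BGR; most models expect RGB.
            yield cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    finally:
        cap.release()


def iter_windows(frames: Iterable[T], window: int = 16,
                 step: int = 16) -> Iterator[List[T]]:
    """Yield lists of `window` consecutive frames, advancing by `step`.

    Only the current window is buffered. A trailing incomplete window
    is dropped (an assumption; padding would be another option).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    buffer: List[T] = []
    to_skip = 0  # frames to discard when step > window
    for frame in frames:
        if to_skip > 0:
            to_skip -= 1
            continue
        buffer.append(frame)
        if len(buffer) == window:
            yield list(buffer)
            to_skip = max(step - window, 0)
            buffer = buffer[step:]  # keep the overlap when step < window
```

A possible usage, with a hypothetical file name: `for clip in iter_windows(read_frames_cv2("video.mp4")): ...`, where each `clip` is a list of 16 frames that can be stacked and passed to the model before the next window is read.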

It is a long-standing issue. To fix it, I need to check whether the quality of these features regresses when the model was trained with the torchvision reader but a different video reader is used during inference.

It is also a bit disappointing that we don’t have a good video reader as a community but that’s another story :/

santiagosilas commented 3 months ago

Ok, thanks for the reply.