innat / VideoMAE

[NeurIPS'22] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602
Apache License 2.0

Live-input #3

Open hagonata opened 8 months ago

hagonata commented 8 months ago

Can it handle live video input, with live predictions? And can I use weights that I trained with VideoMAE from the Hugging Face pipeline? They have this list of files (see attached image).

innat commented 8 months ago

Can it handle live video input, with live predictions?

Yes, I think it is possible to apply VideoMAE to live video. Have you tried it? If you face any issue with this codebase, let me know. This sounds interesting; I might add it later.

And can I use weights that I trained with VideoMAE from the Hugging Face pipeline? They have this list of files:

I'm not sure, as the underlying framework is different.
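
If you want to attempt a manual conversion, a rough first step (just a sketch, assuming the Hugging Face checkpoint folder contains a `pytorch_model.bin` file; newer exports may ship `model.safetensors` instead) is to dump the parameter names and shapes so you can match them against the Keras layers here by hand:

```python
# Sketch: inspect a Hugging Face VideoMAE checkpoint to plan a manual
# mapping onto the Keras weights of this implementation.
# The file name "pytorch_model.bin" is an assumption.
import torch

state_dict = torch.load("pytorch_model.bin", map_location="cpu")

for name, tensor in state_dict.items():
    # Each entry has to be matched by hand to the corresponding Keras
    # weight; dense kernels usually need a transpose as well.
    print(name, tuple(tensor.shape))
```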

hagonata commented 8 months ago

@innat For example, what do we get after training with your model? I think mine would need to be converted.

Can it handle live video input, with live predictions?

Yes, I think it is possible to apply VideoMAE to live video. Have you tried it? If you face any issue with this codebase, let me know. This sounds interesting; I might add it later.

I tried it with a long video, but it just gave me one prediction for the whole video. As I understand it, I just need to open my camera with OpenCV and pre-process the frames, something like the sketch below. It would be great if you have a notebook to test with, if you have seen something like this before.
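
Roughly what I have in mind (just a sketch, not a tested notebook; it assumes a fine-tuned Keras VideoMAE classifier that expects 16-frame, 224x224 RGB clips scaled to [0, 1], and the checkpoint path is hypothetical):

```python
# Sketch: per-clip live predictions from a webcam with OpenCV.
# Adjust clip length, resolution, and normalization to match your checkpoint.
from collections import deque

import cv2
import numpy as np
import tensorflow as tf

NUM_FRAMES, SIZE = 16, 224
model = tf.keras.models.load_model("videomae_finetuned.keras")  # hypothetical path

buffer = deque(maxlen=NUM_FRAMES)  # sliding window of the latest frames
cap = cv2.VideoCapture(0)          # default webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = cv2.resize(frame, (SIZE, SIZE)).astype("float32") / 255.0
    buffer.append(frame)

    # Once a full clip is buffered, predict; the deque then slides forward
    # by one frame per iteration, giving a stream of predictions.
    if len(buffer) == NUM_FRAMES:
        clip = np.expand_dims(np.stack(buffer), axis=0)  # (1, 16, 224, 224, 3)
        probs = model.predict(clip, verbose=0)
        print("top class:", int(np.argmax(probs, axis=-1)[0]))

cap.release()
```

This would also explain why a long video gives a single prediction: the model only sees one fixed-length clip at a time, so a long video has to be split into windows like this to get predictions over time.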