mpc001 / auto_avsr

Auto-AVSR: Lip-Reading Sentences Project
Apache License 2.0

Can I know the input time? #2

Closed saeu5407 closed 1 year ago

saeu5407 commented 1 year ago

Hello, thank you for sharing a great model.

I'm opening this issue to ask two questions.

  1. What is the time dimension of the model's input? (The paper states that the frame rate is 25 fps, but I could not find the time dimension.)

  2. What happens if I feed in a video longer than that input time dimension?

Thank you.

mpc001 commented 1 year ago

Hello @saeu5407, we do not change the temporal resolution. Specifically, we use ResNet-18 as our visual front-end, and you can find the input and output size of each layer in Table S2 on page 15 here. The temporal resolution also stays the same in the Conformer encoder. To summarise, whether you mean the front-end (ResNet-18) or the back-end (Conformer), the output time dimension is the same size as the input time dimension for VSR.
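To illustrate the point above, here is a minimal sketch in plain Python of why a front-end that uses temporal stride 1 with matching padding returns as many time steps as it receives, for any clip length. The kernel, stride, and padding values, the 88x88 crop size, and the helper names `conv_out_len` and `frontend_shape` are assumptions made for illustration; they are not taken from this thread, so check Table S2 for the actual layer settings.

```python
def conv_out_len(length: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-length formula for a convolution along one axis."""
    return (length + 2 * padding - kernel) // stride + 1


# Assumed settings for the first 3D convolution (hypothetical, for illustration):
# stride 1 along time, stride 2 along height and width.
TEMPORAL = {"kernel": 5, "stride": 1, "padding": 2}
SPATIAL = {"kernel": 7, "stride": 2, "padding": 3}

FPS = 25  # frame rate mentioned in the question


def frontend_shape(frames: int, height: int, width: int) -> tuple:
    """Return (time, height, width) after the assumed first convolution."""
    return (
        conv_out_len(frames, **TEMPORAL),
        conv_out_len(height, **SPATIAL),
        conv_out_len(width, **SPATIAL),
    )


if __name__ == "__main__":
    for seconds in (1, 4, 30):
        frames = seconds * FPS
        # 88x88 is an assumed mouth-crop size, not confirmed here.
        t_out, h_out, w_out = frontend_shape(frames, 88, 88)
        print(f"{seconds:>2}s clip: {frames} frames in, {t_out} time steps out, "
              f"spatial {h_out}x{w_out}")
```

Under these assumptions only the spatial axes are downsampled; the time axis passes through unchanged, so the number of output time steps follows the number of input frames.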

saeu5407 commented 1 year ago

Thank you for your answer!

So is there no problem with feeding in a video of any length?

mpc001 commented 1 year ago

Hi, there is no problem with feeding in a video of any length.

saeu5407 commented 1 year ago

Thank you for your paper and your answer.