hello-jinwoo / LOVEU-CVPR2021


Details about the frame generation #1

Open lyyang01 opened 2 years ago

lyyang01 commented 2 years ago

Hello! Thanks for your great work and the code; I am learning a lot. However, I am still confused about how the video frames are generated. Your work provides the video features, and I noticed that the feature length is 40 for each video. Does this mean that you generate 40 frames for every video and extract their features with SF or TSN?

hello-jinwoo commented 2 years ago

Hi! Thank you for your interest.

Yes, we made 40 frames for every video using SF and TSN.

FYI, each feature frame represents 0.25 seconds, so the 40 features together represent 10 seconds. Videos shorter than 10 seconds were also processed into 40-frame feature sequences by adding padded frames. For instance, we treat a 5-second video as 20 frames of video features and 20 frames of padding.
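The padding scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the repository's actual preprocessing code: the function name `pad_features`, the zero pad value (zeros are mentioned later in this thread), the truncation of longer inputs, and the returned validity mask are all hypothetical choices made for the example.

```python
FEATURE_STRIDE_SEC = 0.25  # each feature frame covers 0.25 s (from the reply above)
TARGET_LEN = 40            # 10 s / 0.25 s = 40 feature frames


def pad_features(features, target_len=TARGET_LEN, pad_value=0.0):
    """Right-pad (or truncate) a list of feature vectors to target_len.

    Returns the padded sequence and a 0/1 mask marking real frames.
    The mask is an addition for illustration; the thread does not mention one.
    """
    if not features:
        raise ValueError("need at least one feature vector")
    dim = len(features[0])
    clipped = [list(f) for f in features[:target_len]]  # assumption: cut long videos
    n_pad = target_len - len(clipped)
    padding = [[pad_value] * dim for _ in range(n_pad)]
    mask = [1] * len(clipped) + [0] * n_pad
    return clipped + padding, mask


# A 5-second video yields 5 / 0.25 = 20 real feature frames (toy 2-D features here).
video = [[1.0, 2.0] for _ in range(int(5 / FEATURE_STRIDE_SEC))]
padded, mask = pad_features(video)
print(len(padded), sum(mask))  # 40 frames in total, 20 of them real
```

Keeping a mask alongside the padded sequence is one common way to let a downstream model ignore the padded frames.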

I hope this helps you understand.

Best, Jinwoo Kim


guuzaa commented 2 years ago

Hi @hello-jinwoo, Thanks for your reply. But I have a question now. How did you pretrain TSP features on ActivityNet? Could you share the details with us?

pplntech commented 2 years ago

Hi, thank you for your interest in our work.

We used the TSP R(2+1)D-34 network pre-trained on ActivityNet (ANet) by the original author. You can find the weights here.

guuzaa commented 2 years ago

Thanks for your reply. I will check this link soon.

tullie commented 2 years ago

Hi @pplntech and @hello-jinwoo, regarding your comment that each feature frame represents 0.25 seconds: how is this possible, considering that the original pre-trained SlowFast R50 model is trained with 2-second clips? I'm assuming you used 2-second input clips with a 0.25-second sliding window across the 10-second video. Can you confirm that's correct?
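To make the arithmetic behind this question concrete, here is a small sketch of the sliding-window scheme being asked about. The thread does not confirm that the features were extracted this way; centering each 2-second clip on its 0.25-second step and clipping windows at the video boundaries are assumptions made purely for illustration, and `sliding_windows` is a hypothetical helper.

```python
def sliding_windows(video_len_sec=10.0, clip_len_sec=2.0, stride_sec=0.25):
    """Return (start, end) times of clips, one per stride step.

    Assumption: each clip is centered on the middle of its stride step
    and clipped to the video boundaries.
    """
    n_steps = round(video_len_sec / stride_sec)
    windows = []
    for i in range(n_steps):
        center = (i + 0.5) * stride_sec
        start = max(0.0, center - clip_len_sec / 2)
        end = min(video_len_sec, center + clip_len_sec / 2)
        windows.append((start, end))
    return windows


windows = sliding_windows()
print(len(windows))   # one clip per 0.25 s step of a 10 s video
print(windows[20])    # the clip around the step starting at 5.0 s
```

Under these assumptions a 10-second video produces 40 overlapping clips, which would match the 40 features per video, with each feature summarizing a 2-second context around its 0.25-second step.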

sqiangcao99 commented 2 years ago

Hi @hello-jinwoo, thanks for sharing the code. I have a question about the difference between the SF_TSN_interpolated feature and the SF_TSN_padded feature. Looking forward to your reply.

hello-jinwoo commented 2 years ago

Hi.

Thanks for your attention.

For videos shorter than 10 seconds, we either interpolate the features or pad them with zeros so that they span 10 seconds (in our case, 40 frames).
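As a rough illustration of the interpolated variant, the sketch below stretches a short feature sequence to 40 steps. The reply does not say which interpolation method was used; linear interpolation along the time axis is an assumption here, and `interpolate_features` is a hypothetical name rather than a function from the repository.

```python
def interpolate_features(features, target_len=40):
    """Linearly resample a list of feature vectors to target_len steps.

    Assumption: linear interpolation over time with endpoints preserved.
    """
    n = len(features)
    if n == 0:
        raise ValueError("need at least one feature vector")
    if n == 1 or target_len == 1:
        return [list(features[0]) for _ in range(target_len)]
    out = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b
                    for a, b in zip(features[lo], features[hi])])
    return out


# 20 real feature frames (a 5-second video), toy 1-D features 0.0 .. 19.0
short_video = [[float(t)] for t in range(20)]
stretched = interpolate_features(short_video)
print(len(stretched))  # 40
```

Unlike zero padding, which appends all-zero frames after the real ones, this variant spreads the real features over all 40 positions, so every position carries video content.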

Best regards, Jinwoo Kim


Jinwoo Kim M.S. Student, Dept. of Computer Science, Yonsei University
