Open lyyang01 opened 2 years ago
Hi! Thank you for your interest.
Yes, we made 40 feature frames for every video using SF and TSN.
FYI, each feature frame represents 0.25 seconds, so the 40 features together cover 10 seconds. Videos shorter than 10 seconds were also processed into 40-frame feature sequences by adding padded frames. For instance, a 5-second video is represented by 20 frames of video features and 20 frames of padding.
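A minimal sketch of the padding described above, not the authors' actual code. It assumes the features are NumPy arrays of shape `(T, D)`, that padding frames are zeros appended at the end, and that a boolean mask marking the real frames is useful downstream; the function name `pad_features` and the feature dimension of 8 are hypothetical.

```python
import numpy as np

FEATURES_PER_SECOND = 4  # one feature frame per 0.25 s, as stated in the reply
TARGET_LEN = 40          # 10 s * 4 feature frames per second


def pad_features(feats: np.ndarray, target_len: int = TARGET_LEN):
    """Zero-pad (or truncate) a (T, D) feature array to (target_len, D).

    Returns the padded array and a boolean mask that is True for real frames.
    Appending zeros at the end is an assumption; the thread does not say
    where the padding goes.
    """
    feats = feats[:target_len]              # truncate anything beyond 10 s
    n_real = feats.shape[0]
    padded = np.zeros((target_len, feats.shape[1]), dtype=feats.dtype)
    padded[:n_real] = feats
    mask = np.zeros(target_len, dtype=bool)
    mask[:n_real] = True
    return padded, mask


# A 5-second video yields 5 * 4 = 20 feature frames.
five_sec = np.random.rand(5 * FEATURES_PER_SECOND, 8).astype(np.float32)
padded, mask = pad_features(five_sec)
print(padded.shape, int(mask.sum()))  # (40, 8) 20
```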
I hope this helps.
Best, Jinwoo Kim
On Tue, Jul 20, 2021 at 5:43 PM, lyyang @.***> wrote:
Hello! Thanks for your great work and the code. I am learning a lot. However, I am still confused about how the video frames are generated. The video features are provided with your work. I noticed that the feature length is 40 for each video. Does this mean that you generate 40 frames for every video and extract their features with SF or TSN?
Hi @hello-jinwoo, thanks for your reply. I have another question: how did you pretrain the TSP features on ActivityNet? Could you share the details with us?
Hi, thank you for your interest in our work.
We used the TSP network with the R(2+1)D-34 backbone, pre-trained on ActivityNet (ANet) by the original authors. You can find the weights here.
Thanks for your reply. I will check this link soon.
Hi @pplntech and @hello-jinwoo, regarding your comment that each feature frame represents 0.25 seconds: how is this possible, given that the original pre-trained SlowFast R50 model is trained on 2-second clips? I'm assuming you used 2-second input clips with a sliding window moved in 0.25-second steps across the 10-second video. Can you confirm that's correct?
Hi @hello-jinwoo, thanks for sharing the code. I have some questions about the difference between the SF_TSN_interpolated feature and the SF_TSN_padded feature. Looking forward to your reply.
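To make the arithmetic behind this question concrete, here is a small sketch of the sliding-window hypothesis. The thread does not confirm it, so treat it purely as an illustration. The function names and the boundary handling are hypothetical. It shows that 2-second clips with a 0.25-second stride fit only 33 times inside a 10-second video, so reaching 40 features would need some edge handling, for example clips centered on each 0.25-second segment that extend past the video ends.

```python
def window_starts(video_len: float, clip_len: float, stride: float):
    """Start times of clips that fit entirely inside the video."""
    n = int((video_len - clip_len) / stride) + 1
    return [i * stride for i in range(max(n, 0))]


def centered_windows(video_len: float, clip_len: float, stride: float):
    """(start, end) of clips centered on each stride-long segment.

    Clips near the edges extend beyond [0, video_len]; how such clips would
    be filled (repeat frames, zeros, ...) is left open here.
    """
    n = int(video_len / stride)
    half = clip_len / 2
    return [((i + 0.5) * stride - half, (i + 0.5) * stride + half)
            for i in range(n)]


print(len(window_starts(10.0, 2.0, 0.25)))     # 33
print(len(centered_windows(10.0, 2.0, 0.25)))  # 40
```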
Hi.
Thanks for your attention.
We either interpolate the features of videos shorter than 10 seconds or pad them with zeros, so that every video covers 10 seconds (in our case, 40 frames).
Best regards, Jinwoo Kim
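As a rough illustration of the interpolated variant, the sketch below stretches a short feature sequence to 40 frames. The reply does not say which interpolation method was used, so linear interpolation along the time axis is an assumption, as are the function name `interpolate_features` and the `(T, D)` NumPy layout.

```python
import numpy as np


def interpolate_features(feats: np.ndarray, target_len: int = 40) -> np.ndarray:
    """Linearly resample a (T, D) feature array along time to (target_len, D).

    Linear interpolation is an assumption; the thread only says "interpolate".
    """
    t_src = np.linspace(0.0, 1.0, num=feats.shape[0])
    t_dst = np.linspace(0.0, 1.0, num=target_len)
    # np.interp is 1-D, so resample each feature dimension separately.
    columns = [np.interp(t_dst, t_src, feats[:, d]) for d in range(feats.shape[1])]
    return np.stack(columns, axis=1)


# A 5-second video: 20 feature frames stretched to 40.
short = np.arange(20 * 2, dtype=np.float64).reshape(20, 2)
stretched = interpolate_features(short)
print(stretched.shape)  # (40, 2)
```

Unlike zero padding, this fills every one of the 40 positions with values derived from the video, at the cost of changing the effective time step per feature frame (here 0.125 seconds instead of 0.25).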
Jinwoo Kim M.S. Student, Dept. of Computer Science, Yonsei University