Understanding TSN setup

rohitgirdhar / CATER

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning

Apache License 2.0


In Table 3 of the paper, you apparently used 1 or 3 frames for the TSN experiments. What does that mean? Why did you train with 3 frames and test with 250 frames? Task 3 is really challenging, and solving it from only 3 frames doesn't seem feasible, so I may be misunderstanding the setup. Does it mean you sampled 3 frames per segment? If so, how many segments are used, and how many frames in total are seen at training time?
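To make the question concrete, here is a minimal sketch of how I understand TSN-style sparse sampling in general: the video is split into equal segments and a few frames are drawn from each. This is only my assumption, not the setup confirmed for CATER; the function name `sample_segment_indices`, its parameters, and the 300-frame video length are hypothetical and chosen for illustration.

```python
import random


def sample_segment_indices(num_frames, num_segments, frames_per_segment=1,
                           train=True, seed=None):
    """Return sorted frame indices drawn segment by segment (hypothetical helper).

    Assumption: the video is divided into `num_segments` equal chunks.
    In training, frames are drawn at random inside each chunk; at test
    time they are spaced evenly inside each chunk.
    """
    if num_frames <= 0 or num_segments <= 0 or frames_per_segment <= 0:
        raise ValueError("all counts must be positive")

    rng = random.Random(seed)
    seg_len = num_frames / num_segments
    indices = []
    for s in range(num_segments):
        start = int(s * seg_len)
        # Exclusive end; guarantee at least one frame per segment.
        end = max(start + 1, int((s + 1) * seg_len))
        for k in range(frames_per_segment):
            if train:
                indices.append(rng.randrange(start, end))
            else:
                offset = int((end - start) * (k + 0.5) / frames_per_segment)
                indices.append(start + offset)
    return sorted(indices)


# Hypothetical 300-frame video, 3 segments, 1 frame per segment at test time.
print(sample_segment_indices(300, 3, train=False))
```

Under this reading, the total number of frames seen per video is `num_segments * frames_per_segment`, which is the quantity I am trying to pin down for both the training and the testing setup.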

Also, what is the detailed setup for TSN+LSTM? It appears that you used 10 clips for the LSTM on the 3D models, but with TSN did you still use 10 "frames"? If not, how did you set it up for TSN?

Lastly, do you have any plans to release the TSN code?

Thanks a lot for your awesome research!!

Understanding TSN setup #25