TSN setting for Diving48

Hi, thanks for sharing your interesting work.

I have some questions about the TSN result in the paper. I'm running TSN/TRN on Diving48, and I'm getting a much higher accuracy than the one reported.

Where did the TSN number come from? This repository doesn't seem to include a TSN model, so did you use the original TSN code?
How many frames did you input? I know it wasn't 16, but was it 8 or 32?
How did you sample the video: sparsely across the whole video (the TSN strategy), or densely as a contiguous clip (the 3D CNN strategy)?

I used 8-frame input and trained/tested on 25% of the Diving48 data (official split V2). I used sparse sampling, scale jittering with the shorter side in the [224, 336] range during training, and a 224x224 input resolution. With this setup TSN reached well over 50% accuracy, which doesn't match the paper. My TSN/TRN code reproduces the reported baselines on other datasets such as Something-Something and EPIC-Kitchens, so I'm wondering which setting differs.
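For clarity, this is roughly what I mean by the training augmentation, written as a minimal sketch. It assumes "scale jittering" means resizing the shorter side to a random length in [224, 336] and then taking a random 224x224 crop, with the same parameters applied to every frame of a sample. The function name `jitter_params` and its arguments are hypothetical, and this differs from the fixed-ratio multi-scale cropping used in the original TSN code.

```python
import random

def jitter_params(height, width, scale_range=(224, 336), crop=224, rng=random):
    """Pick one resize + crop for a whole clip (hypothetical helper).

    Returns ((new_h, new_w), (top, left, crop_h, crop_w)); the same values
    should be applied to every sampled frame so the clip stays aligned.
    """
    # Random target length for the shorter side (randint is inclusive).
    short = rng.randint(*scale_range)
    ratio = short / min(height, width)
    new_h, new_w = round(height * ratio), round(width * ratio)
    # Random crop position inside the resized frame.
    top = rng.randint(0, new_h - crop)
    left = rng.randint(0, new_w - crop)
    return (new_h, new_w), (top, left, crop, crop)

# Example for a 240x320 video; at test time I used a single 224x224 input.
(size, box) = jitter_params(240, 320, rng=random.Random(0))
print(size, box)
```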

Thank you!

vt-vl-lab / SDN

TSN setting for Diving48 #11