happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)
MIT License

max_seq_len during inference #109

Closed Jaswar closed 1 year ago

Jaswar commented 1 year ago

Hi, I noticed that during inference the videos are padded to the length indicated by max_seq_len here. I wanted to ask: why is this padding happening? Would it be sufficient to pad as is done in the else case?

To give some context, I am attempting to measure ActionFormer's inference performance with videos of different lengths (similar to what you measured in table 3b, but for different feature lengths). Because videos shorter than max_seq_len are padded to size max_seq_len, all of them take exactly the same amount of time. I would hence like to simply lower max_seq_len in thumos_i3d.yaml to the lowest allowable value (576) and then pass videos of sizes from 576 to 2304.

I have passed an example video of length 915 (video_validation_0000990.npy) through both configurations (max_seq_len set to 576 and 2304), which results in padded shapes of 1152 and 2304 respectively. The resulting output of the network is the same in both cases. Hence my question: is padding to max_seq_len necessary (at least in the case of the THUMOS dataset)?
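For what it's worth, the two padded shapes above are consistent with the following rule (my reading of the preprocessing logic, so treat the exact branch structure as an assumption): inputs no longer than max_seq_len are padded up to max_seq_len, while longer inputs are padded up to the next multiple of the model's divisibility factor, which works out to 576 for the THUMOS config:

```python
import math

def padded_len(seq_len, max_seq_len, max_div_factor):
    """Inference-time padded length as I understand it: short inputs are
    padded to max_seq_len, longer inputs to the next multiple of the
    model's divisibility factor (names here are illustrative)."""
    if seq_len <= max_seq_len:
        return max_seq_len
    return math.ceil(seq_len / max_div_factor) * max_div_factor

# The 915-step example from above, assuming max_div_factor = 576:
print(padded_len(915, 2304, 576))  # 2304 (padded up to max_seq_len)
print(padded_len(915, 576, 576))   # 1152 (next multiple of 576)
```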

happyharrycn commented 1 year ago

Good point. I don't think padding to max_seq_len is necessary at inference time. You will still need to pad to a divisible size (see here). The results on the test set should be similar, if not exactly the same.
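A minimal sketch of that suggestion, padding only to a divisible length rather than to max_seq_len (this is not the repo's exact code; the function and mask layout are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def pad_to_divisible(feats, max_div_factor):
    """Pad a (C, T) feature tensor along time so T becomes divisible by
    max_div_factor, and return a boolean mask marking the valid region.
    Sketch of the reply's suggestion, not ActionFormer's actual code."""
    t = feats.shape[-1]
    t_pad = ((t + max_div_factor - 1) // max_div_factor) * max_div_factor
    padded = F.pad(feats, (0, t_pad - t))          # zero-pad on the right
    mask = torch.zeros(t_pad, dtype=torch.bool)
    mask[:t] = True                                # True = real time steps
    return padded, mask

# e.g. 2048-dim I3D features over 915 time steps, divisibility factor 576
feats = torch.randn(2048, 915)
padded, mask = pad_to_divisible(feats, 576)
print(tuple(padded.shape))   # (2048, 1152)
print(int(mask.sum()))       # 915
```

The mask lets the model ignore the padded tail, which is why the test-set results should be essentially unchanged.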

Jaswar commented 1 year ago

Okay, that makes sense. Thank you for the quick reply.