happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)
MIT License

What's the unit of the maximum input sequence (max_seq_len) #92

Closed makecent closed 5 months ago

makecent commented 1 year ago

I am curious about the unit of the maximum input sequence length (max_seq_len). As mentioned in the paper, the unit seems to be a single frame:

... When using a input sequence length of 512 (typo here, should be 576), similar to what was considered in [77] (512), our method only has a minor drop in average mAP (-1.1%) and significantly outperforms [77] ...

But in the code, the unit of max_seq_len seems to be 4 frames, because the features are extracted with stride=4. Therefore, an input feature sequence with a max_seq_len of 2304 should cover 2304 × 4 consecutive frames.

Compared with [77], it seems unfair to directly compare 576 with 512: the 512 used in [77] represents 512 consecutive frames, while the 576 in this work represents 576 features sampled at an interval of 4 frames. This work therefore has a much longer temporal input.

happyharrycn commented 1 year ago

The maximum input sequence length is defined on the feature grid, and is thus equal to the maximum number of clips (video features). I can't recall the details in [77] and will look into this later.
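To make the unit concrete, here is a minimal sketch of the arithmetic discussed above. It assumes one feature per `feat_stride` frames (stride 4, as stated in the question) and ignores the extra frames covered by each clip window. The function name `approx_frames_covered` is hypothetical and is not part of the repository.

```python
def approx_frames_covered(max_seq_len: int, feat_stride: int) -> int:
    """Approximate number of raw video frames spanned by a feature sequence.

    Assumption: one feature vector is extracted every `feat_stride` frames,
    so `max_seq_len` positions on the feature grid span roughly
    `max_seq_len * feat_stride` frames. The clip window length is ignored.
    """
    if max_seq_len < 0 or feat_stride <= 0:
        raise ValueError("max_seq_len must be >= 0 and feat_stride must be > 0")
    return max_seq_len * feat_stride


if __name__ == "__main__":
    # Values taken from the discussion: max_seq_len of 2304 or 576, stride 4.
    for seq_len in (2304, 576):
        print(seq_len, "features ->", approx_frames_covered(seq_len, 4), "frames")
```

Under these assumptions, 576 feature-grid positions span about 2304 frames. Whether the 512 in [77] counts raw frames is the questioner's reading and is not confirmed in this thread.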

MrinalTyagi commented 1 year ago

@happyharrycn Can you describe in more detail how you arrived at 2304 as the value for max_seq_len? Thanks for such amazing work.

tzzcl commented 1 year ago

Actually, we have an ablation study in Appendix Table B of ActionFormer. You can find that enlarging the max_seq_len will bring a slight performance boost.

MrinalTyagi commented 1 year ago

> Actually, we have an ablation study in Appendix Table B of ActionFormer. You can find that enlarging the max_seq_len will bring a slight performance boost.

So it applies only to training, right? Also, for any given set of features, should the value be the largest feature length available, or can it be any value?

tzzcl commented 1 year ago

Yes, it is just for training, and it can be any value (a smaller max_seq_len may lead to worse results).

MrinalTyagi commented 1 year ago

> Yes, it is just for training, and it can be any value (a smaller max_seq_len may lead to worse results).

I tried changing the value. With a very small max_seq_len, I was not able to train the model because of an assertion in LocPointGenerator (assert feat_len <= buffer_pts.shape[0], "Reached max buffer length for point generator"). I would really appreciate your feedback on this.

tzzcl commented 1 year ago

For very small max_seq_len, you should increase the max_buffer_len_factor in the config.
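As a rough illustration of why raising max_buffer_len_factor helps, the sketch below assumes the point generator's buffer holds `max_seq_len * max_buffer_len_factor` positions. That relationship is an assumption inferred from this thread, not a confirmed detail of the code. The helper names `fits_in_buffer` and `min_buffer_factor` are hypothetical.

```python
import math


def fits_in_buffer(feat_len: int, max_seq_len: int, max_buffer_len_factor: float) -> bool:
    """Mirror the check `feat_len <= buffer_pts.shape[0]` from the assertion.

    Assumption: the buffer length equals max_seq_len * max_buffer_len_factor.
    """
    buffer_len = int(max_seq_len * max_buffer_len_factor)
    return feat_len <= buffer_len


def min_buffer_factor(longest_feat_len: int, max_seq_len: int) -> int:
    """Smallest integer factor for which the longest feature sequence fits."""
    if max_seq_len <= 0:
        raise ValueError("max_seq_len must be positive")
    return max(1, math.ceil(longest_feat_len / max_seq_len))


if __name__ == "__main__":
    # Illustrative numbers only: a video with 2304 features and a small max_seq_len.
    longest, small_len = 2304, 256
    print(fits_in_buffer(longest, small_len, 6.0))  # buffer of 1536 positions
    factor = min_buffer_factor(longest, small_len)
    print(factor, fits_in_buffer(longest, small_len, factor))
```

In this sketch, the factor has to grow as max_seq_len shrinks so that the longest feature sequence seen by the model still fits in the buffer.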

happyharrycn commented 5 months ago

Closed due to inactivity.