happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)
MIT License

How to modify regression range if I change the pyramid levels? #54

Closed ttgeng233 closed 2 years ago

ttgeng233 commented 2 years ago

Thank you for your great work! How should I set the regression range if I change the number of pyramid levels? For example, if I set the architecture to [2,2,1] with one downsampling step, which range is correct: [(0,4), (4,8)] or [(32,64), (64,10000)]? And how should I understand the choice?

tzzcl commented 2 years ago

The regression range depends heavily on the dataset attributes and the architecture. For example, if the longest action lasts around 64 seconds, the maximum regression range should be 64 (we write 10000 in our code, but that just means infinity).

Next, we need to design the real regression range for each layer, since the regression range is divided by the stride. Usually, numbers in [4,8] are a good choice for the real regression range. You can therefore choose the number of pyramid levels to be roughly log2(longest action length / real regression range). If you really want to modify the regression range, expect a slight drop in performance. For your case, I think [(0,4), (4,10000)] is the right setting, though it will decrease performance.
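To make the rule of thumb above concrete, here is a minimal Python sketch. The helper names `estimate_num_levels` and `build_regression_range` are hypothetical and are not part of the ActionFormer code. The doubling pattern is an assumption inferred from the ranges quoted in this thread, such as (0,4), (4,8), (32,64), and (64,10000).

```python
import math

INF = 10000  # stand-in for "infinity", as described in the reply above


def estimate_num_levels(longest_action, real_range):
    """Rough level count: log2(longest action length / real regression range)."""
    return max(1, round(math.log2(longest_action / real_range)))


def build_regression_range(num_levels, base=4, inf=INF):
    """Assumed doubling pattern: (0, base), (base, 2*base), ...

    The upper bound of the last level is replaced by `inf` so that the
    coarsest level handles every longer action.
    """
    ranges = []
    lower, upper = 0, base
    for level in range(num_levels):
        is_last = level == num_levels - 1
        ranges.append((lower, inf if is_last else upper))
        lower, upper = upper, upper * 2
    return ranges


if __name__ == "__main__":
    # Two levels (one downsampling step), as in the question.
    print(build_regression_range(2))
    # Longest action of about 64 with a real regression range of 4.
    print(estimate_num_levels(64, 4))
```

Under these assumptions, `build_regression_range(2)` gives the two-level setting suggested in the reply. The level estimate is only approximate, so treat it as a starting point rather than a fixed rule.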

ttgeng233 commented 2 years ago

Thank you very much! I have another question: is there a specific strategy for setting max_seq_len? For example, why is max_seq_len=2304 used for THUMOS14?

tzzcl commented 2 years ago

We set max_seq_len simply to cover long video clip inputs. With max_seq_len=2304, about 20 minutes of video are fed into the model, which roughly matches the maximum video length in THUMOS14.

happyharrycn commented 2 years ago

Closed due to inactivity.