Closed ttgeng233 closed 2 years ago
The regression range depends heavily on the dataset attributes and the architecture. For example, if the longest action lasts around 64 seconds, the maximum regression range should be 64 (we write 10000 in our code, which simply means infinity).
Next, we need to choose the real regression range for each layer, i.e., the regression range divided by that layer's stride. Values in [4, 8] are usually a good choice for the real regression range. You can then set the number of pyramid levels to roughly log2(longest action length / real regression range). If you really want to modify the regression range, expect a slight decrease in performance. For your problem, I think [(0, 4), (4, 10000)] is the right setting, though it will decrease performance.
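To make the rule of thumb above concrete, here is a minimal Python sketch. The helper names `estimate_num_levels` and `build_regression_ranges` are hypothetical and not part of the repository; the sketch assumes the boundaries double at each level starting from a base of 4, that the last level is open-ended (10000 standing in for infinity), and that the longest action length is given in the same units as the regression range.

```python
import math

INF = 10000  # sentinel used in this thread to mean "no upper bound"


def estimate_num_levels(longest_action_len, real_range=4):
    """Rough pyramid depth: log2(longest action length / real regression range)."""
    return max(1, round(math.log2(longest_action_len / real_range)))


def build_regression_ranges(num_levels, base=4, inf=INF):
    """Build one (lower, upper) range per pyramid level.

    Assumption: boundaries double at every level, starting from `base`,
    and the last level extends to `inf`.
    """
    ranges = []
    lower = 0
    for level in range(num_levels):
        is_last = level == num_levels - 1
        upper = inf if is_last else base * 2 ** level
        ranges.append((lower, upper))
        lower = upper
    return ranges


if __name__ == "__main__":
    # Two levels (one downsampling), as in the question in this thread.
    print(build_regression_ranges(2))
    # Longest action of 64 with a real regression range of 4.
    print(estimate_num_levels(64, real_range=4))
```

Under these assumptions, two levels give [(0, 4), (4, 10000)], which matches the setting suggested above. The sketch does not claim to reproduce how the repository builds its config.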
Thank you very much! I have another question: is there a specific strategy for setting max_seq_len? For example, why is max_seq_len=2304 for THUMOS14?
We set max_seq_len simply to cover long video inputs. With max_seq_len=2304, about 20 minutes of video is fed into the model, which roughly matches the maximum video length in THUMOS14.
Closed due to inactivity.
Thank you for your great work! I want to know how to set the regression range if I change the number of levels. For example, if I set the architecture to [2,2,1] with one downsampling, which range is correct: [(0,4), (4,8)] or [(32,64), (64,10000)]? And how should I understand it?