happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)
MIT License
415 stars 77 forks source link

problem about truncate_feats() function in data_utils.py #117

Closed Charlestar closed 8 months ago

Charlestar commented 1 year ago

In data_utils.py line 78, there is

left = torch.maximum(window[:, 0] - offset, data_dict['segments'][:, 0])
right = torch.minimum(window[:, 1] + offset, data_dict['segments'][:, 1])

I can't understand why we need to plus and minus offset here. In my opinion, data_dict['segments'] and feats has already aligned before, and in this function, there will no problem about offset, the window is generated initially with offset. Maybe my understanding is wrong, please help me!

tzzcl commented 1 year ago

Please see the FAQ.md for questions about offsets.

Charlestar commented 1 year ago

Actually I understand what is offset and how it works, that's why I think the offset in above code may useless. The offset is introduced because features and annotations are not aligned, this may caused by feature extraction, but in thumos14.py line 155-169, this code has already done the alignment, so why we need to consider about offset in truncate_feats() used below?

tzzcl commented 1 year ago

Sorry for misunderstanding your questions, I think the problem is:

We want to truncate the feature sequence to perform data augmentation. Thus, we random choose the starting index with max_seq_len. Then, we need to complete the alignment again as you see in thumos14 dataloader.

Charlestar commented 1 year ago

I don't think this reason convinces me, max_seq_len is also considered after the alignment. Maybe I'll give an example to understand better, let us consider a borderline case, the action is so long that exceeded (max_seq_len + 2 offset), and the random st satisfied st - offset > action start and ed = st + max_seq_len < action end, if just specify has_action=True, then the function truncate_feats() will return a segment of length `max_seq_len + 2offset, I think this breaks themax_seq_len` setting and may harm to later program.

happyharrycn commented 10 months ago

It is important to notice that a spot in the feature grid lies in the center of a short time span (i.e., local window). When truncating the feature and resetting the time stamps, we will need to identify the left border of the starting feature spot, and the right border of the ending feature spot (in the actual time stamps). That is what is implemented in the code here and we have verified that this is the right way of doing so.