Closed Jmh0527 closed 1 week ago
It really depends on how you extract the feature. For example, if the first clip covers frames 0,1,...,8,...,14,15, then `offset_frames` should be 8. The I3D feature used in ActionFormer is based on such a setting. If the first clip is 0,...,2,...,4, then it should be 2. Basically, `offset_frames` is the actual index of the center frame of the first extracted clip feature. In our codebase, we also extract the VideoMAEv2 feature for THUMOS (see here), and `offset_frames` is set to stride//2, which is 4//2=2. We will release the feature extraction code soon. Once you see the code, you will understand it more clearly.
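To make the mapping concrete, here is a minimal sketch of the convention described above: the i-th clip feature corresponds to the center frame `offset_frames + i * snippet_stride`. The helper name `snippet_center_frame` is mine for illustration, not a function from the codebase.

```python
def snippet_center_frame(snippet_idx, snippet_stride, offset_frames):
    """Center-frame index of the snippet_idx-th clip feature.

    Illustrative helper (not from the codebase): the first clip is
    centered on `offset_frames`, and each subsequent clip shifts the
    center by `snippet_stride` frames.
    """
    return offset_frames + snippet_idx * snippet_stride

# VideoMAEv2 THUMOS setting mentioned above: stride 4, offset 4 // 2 = 2
print(snippet_center_frame(0, snippet_stride=4, offset_frames=2))  # frame 2
print(snippet_center_frame(1, snippet_stride=4, offset_frames=2))  # frame 6
```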
If I extract features as below, does that correspond to "snippet_stride=2, clip_length=16, frame_interval=1", and should `offset_frames` then be stride//2, which is 2//2=1?
```python
import os

import numpy as np
import torch

# `args`, `vid_list`, `video_loader`, `transform`, `start_idx_range`, and
# `model` are defined earlier in the extraction script.
num_videos = len(vid_list)
for idx, vid_name in enumerate(vid_list):
    url = os.path.join(args.save_path, vid_name.split('.')[0] + '.npy')
    if os.path.exists(url):
        continue
    video_path = os.path.join(args.data_path, vid_name)
    vr = video_loader(video_path)
    feature_list = []
    for start_idx in start_idx_range(len(vr)):
        # start_idx_range is range(0, num_frames - 15, 2)
        data = vr.get_batch(np.arange(start_idx, start_idx + 16)).asnumpy()
        frame = torch.from_numpy(data)
        frame_q = transform(frame)
        input_data = frame_q.unsqueeze(0).cuda()
        with torch.no_grad():
            feature = model.forward_features(input_data)
        feature_list.append(feature.cpu().numpy())
    # [N, C]
    np.save(url, np.vstack(feature_list))
    print(f'[{idx} / {num_videos}]: save feature on {url}')
```
In the above code, `offset_frames` should be 8. The frame indices of the first clip are 0,...,15. After mean pooling you get one feature, and the corresponding timestamp of this feature should be the middle frame index, which is 8.
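A quick check of that reasoning, under the clip_length // 2 center convention used in this thread: the offset depends on where the first clip's center falls, not on the stride.

```python
clip_length = 16
snippet_stride = 2  # the stride in the extraction code above

# The first clip covers frames 0..15, so its center (clip_length // 2
# convention) is frame 8 regardless of snippet_stride.
first_clip_start = 0
offset_frames = first_clip_start + clip_length // 2
print(offset_frames)  # 8
```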
I notice that `self.offset_frames` is set to 8 in `ThumosPaddingDataset`. Is this a generic setting for clip_len=16? I am using VideoMAEv2 to extract video features with "snippet_stride=4, clip_length=16, frame_interval=1". Do I also need to set `offset_frames` to 8 in `ThumosPaddingDataset` when training with VideoMamba?