sming256 / OpenTAD

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
Apache License 2.0

questions about offset_frames #22

Closed: Jmh0527 closed this issue 1 week ago

Jmh0527 commented 2 weeks ago

I noticed that self.offset_frames is set to 8 in "ThumosPaddingDataset". Is this a generic setting for clip_len=16? I am using videomae2 to extract video features with the configuration "snippet_stride=4, clip_length=16, frame_interval=1". Do I also need to set offset_frames to 8 in "ThumosPaddingDataset" when training with videomamba?

sming256 commented 2 weeks ago

It really depends on how you extract the features.
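A minimal sketch of why the answer depends on the extraction settings. It assumes frame_interval=1 (as in both configurations in this thread), that clips start at 0, snippet_stride, 2 * snippet_stride, ..., and that each clip-level feature is timestamped at the clip's middle frame, which is the convention stated later in this thread. The helper name feature_center_frames is hypothetical.

```python
def feature_center_frames(num_frames, snippet_stride, clip_length):
    """Return the center frame index of every clip-level feature.

    Hypothetical helper: assumes frame_interval=1, clips starting every
    snippet_stride frames, and one feature per full clip.
    """
    offset_frames = clip_length // 2  # middle frame of the first clip
    starts = range(0, num_frames - clip_length + 1, snippet_stride)
    return [start + offset_frames for start in starts]


# Stride 4 (first configuration in this thread) vs. stride 2 (second one)
print(feature_center_frames(40, snippet_stride=4, clip_length=16)[:4])  # [8, 12, 16, 20]
print(feature_center_frames(40, snippet_stride=2, clip_length=16)[:4])  # [8, 10, 12, 14]
```

Under these assumptions the first feature sits at frame 8 in both cases; only the spacing between consecutive features changes with the stride.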

sming256 commented 2 weeks ago

We will release the feature extraction code soon. Once you see the code, you will understand it more clearly.

Jmh0527 commented 2 weeks ago

If I extract features as shown below, does that correspond to "snippet_stride=2, clip_length=16, frame_interval=1"? And should offset_frames then be stride // 2, i.e. 2 // 2 = 1?


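```python
import os

import numpy as np
import torch

# args, vid_list, video_loader, transform and model are defined elsewhere in the
# script (not shown); video_loader is assumed to return a decord VideoReader.
CLIP_LENGTH = 16    # frames per clip
SNIPPET_STRIDE = 2  # frames between consecutive clip starts


def start_idx_range(num_frames):
    # Start index of every full clip, i.e. range(0, num_frames - 15, 2)
    return range(0, num_frames - CLIP_LENGTH + 1, SNIPPET_STRIDE)


num_videos = len(vid_list)
for idx, vid_name in enumerate(vid_list):
    url = os.path.join(args.save_path, vid_name.split('.')[0] + '.npy')
    if os.path.exists(url):
        continue  # features for this video were already saved
    video_path = os.path.join(args.data_path, vid_name)
    vr = video_loader(video_path)
    feature_list = []
    for start_idx in start_idx_range(len(vr)):
        # Clip frames: start_idx, ..., start_idx + 15
        data = vr.get_batch(np.arange(start_idx, start_idx + CLIP_LENGTH)).asnumpy()
        frame = torch.from_numpy(data)
        frame_q = transform(frame)
        input_data = frame_q.unsqueeze(0).cuda()

        with torch.no_grad():
            # One pooled feature vector per clip
            feature = model.forward_features(input_data)
            feature_list.append(feature.cpu().numpy())

    # [N, C]
    np.save(url, np.vstack(feature_list))
    print(f'[{idx} / {num_videos}]: save feature on {url}')
```
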
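The number of feature rows N saved by the extraction loop can be checked without running the model. A small sketch, assuming clip starts of range(0, num_frames - 15, 2) as described in the loop's comment; num_features is a hypothetical helper name.

```python
def num_features(num_frames, clip_length=16, snippet_stride=2):
    """Number of full clips, and hence feature rows N, for one video."""
    if num_frames < clip_length:
        return 0  # no full clip fits; np.vstack would fail on an empty list
    return (num_frames - clip_length) // snippet_stride + 1


# Agrees with len(range(0, num_frames - 15, 2)) from the extraction loop
for n in (10, 16, 17, 18, 100):
    assert num_features(n) == len(range(0, n - 15, 2))
print(num_features(100))  # 43
```

When num_frames - 16 is odd (e.g. 101 frames), the last clip ends one frame before the end of the video, so the final frame is not covered by any feature.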
sming256 commented 2 weeks ago

In the code above, offset_frames should be 8. The first clip covers frame indices 0, ..., 15. After mean pooling you get a single feature for that clip, and its timestamp should be the clip's middle frame index, which is 8. The offset therefore comes from the clip length, not from snippet_stride.
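With offset_frames = 8 and snippet_stride = 2, feature i is centered at frame 2 * i + 8. A minimal sketch of converting between feature indices and seconds under that convention; the frame rate and the helper names feature_to_seconds and seconds_to_feature are hypothetical, and this is not necessarily how ThumosPaddingDataset implements the conversion internally.

```python
SNIPPET_STRIDE = 2
OFFSET_FRAMES = 8  # middle frame of the first 16-frame clip (frames 0..15)


def feature_to_seconds(i, fps):
    """Timestamp in seconds of feature i: its clip's center frame / fps."""
    return (i * SNIPPET_STRIDE + OFFSET_FRAMES) / fps


def seconds_to_feature(t, fps):
    """Continuous position of time t on the feature grid (inverse of above)."""
    return (t * fps - OFFSET_FRAMES) / SNIPPET_STRIDE


fps = 30.0  # assumed frame rate
# An action annotated from 1.0 s to 2.0 s, i.e. frames 30 to 60
start, end = seconds_to_feature(1.0, fps), seconds_to_feature(2.0, fps)
print(start, end)                   # 11.0 26.0
print(feature_to_seconds(11, fps))  # 1.0
```

Using offset_frames = 1 instead would map the same start time to 14.5, shifting the annotation by 3.5 feature steps (7 frames) relative to the features.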