Wonder about get_random_clip_from_video (from data_utils.py)

suhyeok24 commented 2 weeks ago

Hi,

I am currently extending your research to examine the content-bias in FVD, and I’m using the code snippet you provided to calculate FVD:

from cdfvd import fvd
evaluator = fvd.cdfvd('videomae', ckpt_path=None)
evaluator.load_videos('ucf101', data_type='stats_pkl', resolution=128, sequence_length=16)
evaluator.compute_fake_stats(evaluator.load_videos('./example_videos/', data_type='video_folder'))
score = evaluator.compute_fvd_from_stats()

Instead of loading videos as stats_pkl, I’m downloading the actual videos and using video_folder as the data_type. However, I noticed something when looking at the VideoDataset code:

cache_file = osp.join(self.folder, f"metadata_{sequence_length}.pkl")
if not osp.exists(cache_file):
    # sequence_length = clip_length in frames
    clips = VideoClips(self.files, sequence_length, num_workers=4)
    try:
        pickle.dump(clips.metadata, open(cache_file, 'wb'))
    except:
        print(f"Failed to save metadata to {cache_file}")
else:
    metadata = pickle.load(open(cache_file, 'rb'))
    clips = VideoClips(self.files, sequence_length, _precomputed_metadata=metadata)

When preprocessing is done, a metadata file is generated and subsequently loaded. I noticed that while the video paths, frame structure (video_pts), and FPS remain consistent, the location of each clip within the video changes every time it is loaded.

def get_random_clip_from_video(self, idx: int) -> tuple:
    """
    Sample a random clip starting index from the video.
    idx -> video idx, clip_id : random

    Args:
        idx: Index of the video.
    """
    # Note that some videos may not contain enough frames, we skip those videos here.
    while self._clips.clips[idx].shape[0] <= 0:
        idx += 1
    n_clip = self._clips.clips[idx].shape[0]
    clip_id = random.randint(0, n_clip - 1)
    return idx, clip_id

In the line clip_id = random.randint(0, n_clip - 1), a clip is selected at random on every call. However, this randomness does not appear to be controlled by the seed set when initializing evaluator = fvd.cdfvd('videomae', ckpt_path=None). Since the seed isn’t fixed, the clip selection varies with each run. I believe it might be necessary to fix the seed inside this function to ensure consistency. Could you please take a look and let me know your thoughts?
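For context, here is a minimal sketch of how Python's global seed interacts with random.randint. The helper sample_clip_ids and the clip counts are hypothetical; they only mimic the sampling line quoted above and are not part of cdfvd.

```python
import random

def sample_clip_ids(clip_counts, seed=None):
    """Draw one clip id per video, mimicking random.randint(0, n_clip - 1).

    Hypothetical helper for illustration only; not part of cdfvd.
    """
    if seed is not None:
        # Seeds Python's global RNG, which random.randint draws from.
        random.seed(seed)
    return [random.randint(0, n - 1) for n in clip_counts]

# Made-up number of clips per video.
clip_counts = [120, 85, 240, 60]

run_a = sample_clip_ids(clip_counts, seed=0)
run_b = sample_clip_ids(clip_counts, seed=0)
print(run_a == run_b)  # True: same seed and same call order give the same clip ids
```

If the global seed is set once before sampling starts and the calls happen in the same order, the sampled clip ids repeat across runs without any seeding inside the sampling function itself.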

Thank you very much!

songweige commented 2 weeks ago

Hi Su Hyeok!!

Thank you for raising the issue and for the super detailed description of your findings!!

For the random clip sampling in the data loading process, the random seed should have been determined here. That is, every time the clips are loaded from the videos, the same positions should be used to extract clips.

However, if you print clip_id inside the function get_random_clip_from_video, you may still see roughly the same clip ids but in a slightly different order. This is because the data loading uses multiple worker processes, and different data-loading workers may load the data at slightly different speeds. To verify this is the reason, you can set num_workers=1 when computing the fake stats:

evaluator.compute_fake_stats(evaluator.load_videos('video_path', data_type='video_folder', num_workers=1))

That being said, the clips are supposed to be the same even when using more than one worker. Have you verified that evaluator.fake_stats.raw_mean has different values when you call evaluator.compute_fake_stats(evaluator.load_videos()) across different runs? Can you try setting n_fake to a small number, e.g. n_fake=10, and see if it still happens?
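One way to check the "same clips, different order" explanation is to log the (idx, clip_id) pairs in two runs and compare them while ignoring order. This is only a sketch: the helper name and the logged values below are made up, and collecting the pairs (for example by printing them inside get_random_clip_from_video) is left to the reader.

```python
from collections import Counter

def same_clips_ignoring_order(run_a, run_b):
    """Return True if both runs contain the same (idx, clip_id) pairs,
    regardless of the order in which the workers produced them."""
    return Counter(run_a) == Counter(run_b)

# Made-up logs of (video idx, clip_id) from two runs.
run_a = [(0, 17), (1, 3), (2, 42), (3, 8)]
run_b = [(1, 3), (0, 17), (3, 8), (2, 42)]

print(same_clips_ignoring_order(run_a, run_b))  # True: same clips were sampled
print(run_a == run_b)                           # False: only the order differs
```

If the order-insensitive comparison fails as well, the difference is not just worker timing and the seed is probably not being applied.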

suhyeok24 commented 2 weeks ago

I realized I made an error in my previous code setup. Initially, I was not following the correct approach. Instead of writing the code as:

from cdfvd import fvd
evaluator = fvd.cdfvd('videomae', ckpt_path=None)
evaluator.load_videos('ucf101', data_type='stats_pkl', resolution=128, sequence_length=16)
evaluator.compute_fake_stats(evaluator.load_videos('./example_videos/', data_type='video_folder'))
score = evaluator.compute_fvd_from_stats()

I wrote the code as follows:

evaluator = fvd.cdfvd('videomae', n_real=args.num_clip_samples, n_fake=args.num_clip_samples, seed=args.seed, compute_feats=False, device="cuda", half_precision=True)

fake_video, fake_video_loader = evaluator.load_videos(
    video_info=f'/data/{args.dataset}/valid_subset2', 
    data_type='video_folder', 
    resolution=args.video_resolution, 
    sequence_length=args.num_frames, 
    sample_every_n_frames=1,
    corrupt=args.distortion_type, 
    corrupt_severity=args.severity, 
    num_workers=4, 
    batch_size=args.batch
)

evaluator.compute_fake_stats(fake_video_loader, concat=args.fvd_16)
save_path = os.path.join(save_dir, 'videomae_feature.pkl')
evaluator.save_fake_stats(save_path)

The issue stemmed from the fact that I did not create the loader inline as the argument to evaluator.compute_fake_stats. Instead, I declared the loader separately beforehand and then passed it in, which caused the seed not to be applied properly and led to some confusion. The problem was not related to num_workers. In my specific case, I fixed the seed once more within the get_random_clip_from_video function inside the dataset.

Given my case, would applying the seed separately within the dataset (specifically within the get_random_clip_from_video function) be the correct approach to ensure consistency?

songweige commented 2 weeks ago

Thanks for the explanation; this makes a lot of sense to me now. I think, in general, as long as you fix the random seed before looping through the dataloader like here, the results should be reproducible. I'm not fully convinced that setting the seed within the get_random_clip_from_video function is optimal, since it will give the same clip_id for videos with the same length, but I don't think it hurts a lot! Glad you figured out the solution!!
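To illustrate the trade-off, here is a small sketch comparing the two seeding strategies. The function names and clip counts are hypothetical and stand in for the dataset code; only random.seed and random.randint are real calls.

```python
import random

# Made-up clip counts; the first three videos have the same number of clips.
clip_counts = [100, 100, 100, 50]

def reseed_inside(clip_counts, seed=0):
    """Re-seed before every draw, as if the seed were set inside
    get_random_clip_from_video."""
    ids = []
    for n in clip_counts:
        random.seed(seed)  # RNG state is reset on every call
        ids.append(random.randint(0, n - 1))
    return ids

def seed_once(clip_counts, seed=0):
    """Seed once before looping over all videos."""
    random.seed(seed)
    return [random.randint(0, n - 1) for n in clip_counts]

inside = reseed_inside(clip_counts)
print(inside[0] == inside[1] == inside[2])  # True: equal-length videos get the same clip_id

once = seed_once(clip_counts)
print(seed_once(clip_counts) == once)       # True: still reproducible across runs
```

Both strategies are reproducible. Seeding once before the loop lets the clip positions vary from video to video, whereas re-seeding on every call ties the clip position to the number of clips in the video.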

suhyeok24 commented 2 weeks ago

Thanks for your kind answer!

songweige / content-debiased-fvd

Wonder about get_random_clip_from_video (from data_utils.py) #7