`n_sample_frames` is the number of frames used for training the model, and `video_length` is the number of frames used for inference (i.e., generating new videos). Here, they should be the same.
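For example, you can sanity-check this invariant before training. A minimal sketch (the config path is illustrative; the key names follow the repo's YAML layout, e.g. `train_data` / `validation_data`):

```python
from omegaconf import OmegaConf

# Load the same config that train_tuneavideo.py consumes.
config = OmegaConf.load("configs/man-skiing.yaml")

# If these differ, the DDIM-inverted latents from training won't match
# the temporal length requested at inference.
assert config.train_data.n_sample_frames == config.validation_data.video_length, \
    "n_sample_frames and video_length should be equal"
```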
Thanks for your reply! On my RTX 3060 (12GB), with xformers, I can only train and run inference with 3 frames (each additional frame needs about 1GB of VRAM). Is a large amount of VRAM necessary for generating long videos?
I think 12GB of VRAM can handle an 8-frame video with xformers; this Colab demo runs an 8-frame video on a Tesla T4 (15GB). You may want to double-check that your xformers is working. Simply adding more frames to a video does not result in a proportional increase in VRAM; a V100 (24GB) can process 32-frame videos.
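One quick way to confirm xformers is actually active in your environment is `python -m xformers.info`, which lists the available kernels. A minimal smoke test in Python (shapes are arbitrary; this assumes a CUDA device is available):

```python
import torch
import xformers
import xformers.ops

# If either import fails, training silently falls back to standard attention
# and VRAM usage goes up sharply.
print("xformers version:", xformers.__version__)

# Exercise the memory-efficient attention kernel directly.
# Layout: (batch, sequence, heads, head_dim).
q = k = v = torch.randn(1, 16, 8, 64, device="cuda", dtype=torch.float16)
out = xformers.ops.memory_efficient_attention(q, k, v)
print("memory_efficient_attention OK:", out.shape)
```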
To save VRAM, I set `n_sample_frames=1` and `video_length=12` (I don't know what they mean), and I got an error:
File "E:\repo\Tune-A-Video\train_tuneavideo.py", line 374, in
main(*OmegaConf.load(args.config))
File "E:\repo\Tune-A-Video\train_tuneavideo.py", line 339, in main
sample = validation_pipeline(prompt, generator=generator, latents=ddim_inv_latent,
File "C:\Users\coreyzhong\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(args, **kwargs)
File "E:\repo\Tune-A-Video\tuneavideo\pipelines\pipeline_tuneavideo.py", line 356, in call
latents = self.prepare_latents(
File "E:\repo\Tune-A-Video\tuneavideo\pipelines\pipeline_tuneavideo.py", line 303, in prepare_latents
raise ValueError(f"Unexpected latents shape, got {latents.shape}, expected {shape}")
ValueError: Unexpected latents shape, got torch.Size([1, 4, 1, 64, 64]), expected (1, 4, 12, 64, 64)
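This is exactly the `n_sample_frames` vs. `video_length` disagreement described above: the DDIM-inverted latents passed to the validation pipeline carry the temporal dimension from training (1 frame), while `prepare_latents` builds its expected shape from `video_length` (12 frames). A minimal sketch of the check that fires, assuming the latent layout `(batch, channels, frames, height, width)`:

```python
import torch

# Latents inverted during training carry n_sample_frames=1 in the frame axis...
ddim_inv_latent = torch.randn(1, 4, 1, 64, 64)
# ...but inference builds its expected shape from video_length=12.
expected_shape = (1, 4, 12, 64, 64)

if ddim_inv_latent.shape != expected_shape:
    raise ValueError(
        f"Unexpected latents shape, got {ddim_inv_latent.shape}, expected {expected_shape}"
    )
```

Setting `n_sample_frames` and `video_length` to the same value (or lowering both to what your 12GB card can hold) avoids the error.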