Failed to run Text-To-Video demo with video_length=8 on machine with GPU

yuanxion commented 1 year ago

CPU: 12th Gen Intel(R) Core(TM) i7-12700, 20 cores, 31GB
GPU: NVIDIA GeForce RTX 3080, 10GB

When trying the Text-To-Video demo according to the README.md:

import torch
import os
os.environ['CURL_CA_BUNDLE'] = ''

from model import Model
model = Model(device = "cuda", dtype = torch.float16)
print(f'--> model {model}')

prompt = "A horse galloping on a street"
params = {"t0": 44, "t1": 47 , "motion_field_strength_x" : 12, "motion_field_strength_y" : 12, "video_length": 8}

out_path, fps = f"./text2video_{prompt.replace(' ','_')}.mp4", 4
model.process_text2video(prompt, fps = fps, path = out_path, **params)

It failed with video_length=8:

params = {"t0": 44, "t1": 47 , "motion_field_strength_x" : 12, "motion_field_strength_y" : 12, "video_length": 8}

Log: t2v-video-config-failed-20230512.txt

It works with video_length=1:
```
params = {"t0": 44, "t1": 47 , "motion_field_strength_x" : 12, "motion_field_strength_y" : 12, "video_length": 1}
```
https://github.com/yuanxion/Text2Video-Zero/assets/96522341/020a78e3-cda5-4bf4-a6be-057a2404f589

Log: t2v-video-config-ok-20230512.txt

XianFuWongIntel commented 1 year ago

I'm able to generate videos with more than 1 frame on an NVIDIA GeForce RTX 2080 Ti (11 GB). There are at least 2 ways to do it:

Lossless video quality: add the param chunk_size=k and set k to a low value, where k should be in the range [2, video_length].

E.g., k=2, max. GPU memory usage almost 11GB:

params = {"t0": 44, "t1": 47 , "motion_field_strength_x" : 12, "motion_field_strength_y" : 12, "video_length": 8, "chunk_size" : 2}

https://github.com/yuanxion/Text2Video-Zero/assets/8991906/3f6e7182-7afd-435e-be39-8b823401745d

Lossy video quality: add the param merging_ratio=l; the higher the l, the more compression is applied, where l should be in the range [0, 1].

E.g., l=1, max. GPU memory usage ~10GB:

params = {"t0": 44, "t1": 47 , "motion_field_strength_x" : 12, "motion_field_strength_y" : 12, "video_length": 8, "merging_ratio" : 1}

https://github.com/yuanxion/Text2Video-Zero/assets/8991906/c219a3c4-d72d-42dc-9192-35339226501a
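
To illustrate why a low chunk_size reduces peak GPU memory, here is a minimal sketch of splitting the frame indices into chunks so that only a few frames are processed at a time. The helper name split_into_chunks is hypothetical and is not taken from the repository; the actual chunking logic may group frames differently (for example, by sharing a reference frame across chunks).

```python
from typing import List


def split_into_chunks(video_length: int, chunk_size: int) -> List[List[int]]:
    """Split frame indices 0..video_length-1 into consecutive chunks.

    Hypothetical helper for illustration only; the repository's own
    chunking logic may differ.
    """
    if not 2 <= chunk_size <= video_length:
        raise ValueError("chunk_size should be in the range [2, video_length]")
    frames = list(range(video_length))
    # Each chunk holds at most chunk_size frames, so fewer frames have to
    # be processed at the same time.
    return [frames[i:i + chunk_size] for i in range(0, video_length, chunk_size)]


chunks = split_into_chunks(video_length=8, chunk_size=2)
print(chunks)
```

With video_length=8 and chunk_size=2 this gives four chunks of two frames each. A smaller chunk means more passes, so the trade-off is presumably lower memory against longer run time.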

Lowest GPU memory usage, ~7GB (chunk_size=2 and merging_ratio=1):

params = {"t0": 44, "t1": 47 , "motion_field_strength_x" : 12, "motion_field_strength_y" : 12, "video_length": 8, "chunk_size": 2, "merging_ratio" : 1}

https://github.com/yuanxion/Text2Video-Zero/assets/8991906/725ea82c-4899-4708-979e-fe909dff0105
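
The configurations above can be assembled with a small helper that also checks the stated value ranges before anything is sent to the GPU. This is only a sketch: build_params is a hypothetical name, and only the key names and ranges come from this thread. Since the dict is passed as keyword arguments via **params, the keys need to match the parameter names exactly.

```python
from typing import Any, Dict, Optional


def build_params(video_length: int,
                 chunk_size: Optional[int] = None,
                 merging_ratio: Optional[float] = None) -> Dict[str, Any]:
    """Build the params dict from this thread and validate the memory options.

    Hypothetical helper; only the key names and value ranges come from
    the comments above.
    """
    params: Dict[str, Any] = {
        "t0": 44,
        "t1": 47,
        "motion_field_strength_x": 12,
        "motion_field_strength_y": 12,
        "video_length": video_length,
    }
    if chunk_size is not None:
        if not 2 <= chunk_size <= video_length:
            raise ValueError("chunk_size should be in [2, video_length]")
        params["chunk_size"] = chunk_size
    if merging_ratio is not None:
        if not 0 <= merging_ratio <= 1:
            raise ValueError("merging_ratio should be in [0, 1]")
        params["merging_ratio"] = merging_ratio
    return params


# Lowest-memory configuration reported in this thread (~7GB on an RTX 2080 Ti).
params = build_params(video_length=8, chunk_size=2, merging_ratio=1)
print(params)
# Then, as in the original script:
# model.process_text2video(prompt, fps=fps, path=out_path, **params)
```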

XianFuWongIntel commented 1 year ago

Examples with higher FPS and video length:

https://github.com/yuanxion/Text2Video-Zero/assets/8991906/bfd94dde-3197-45be-a602-ab1559395ac2

https://github.com/yuanxion/Text2Video-Zero/assets/8991906/3e9e3210-954c-4e08-bf97-14a8f0c6ef66

XianFuWongIntel commented 1 year ago

In short, we should be able to generate videos on a GPU with more than 7GB of memory (using chunk_size=2 and merging_ratio=1).
