Open · AA-Developer opened this issue 2 days ago
We have not extensively tested the use of the new model on Gradio, could you try the Jupyter notebook instead?
Yes, it's the Jupyter notebook; I use it with Gradio.
I have a 4090 and it's eating up the entire 24G VRAM.
In my case it doesn't crash, but since it exceeds the VRAM limit, generation with the 768p model takes very long (around 18 minutes, at 116.60s/it).
For the record, the 384p model works just fine, using just 6~8G VRAM.
Why does the SD3 768p model take me only a few minutes? I have a 3090 with 24GB of VRAM. I haven't tried 768p with the FX model yet; it's hard to imagine the difference being that big.
OutOfMemoryError Traceback (most recent call last)
Cell In[6], line 15
12 # Noting that, for the 384p version, only supports maximum 5s generation (temp = 16)
14 with torch.no_grad(), torch.cuda.amp.autocast(enabled=True if model_dtype != 'fp32' else False, dtype=torch_dtype):
---> 15 frames = model.generate(
16 prompt=prompt,
17 num_inference_steps=[20, 20, 20],
18 video_num_inference_steps=[10, 10, 10],
19 height=height,
20 width=width,
21 temp=temp,
22 guidance_scale=7.0, # The guidance for the first frame, set it to 7 for 384p variant
23 video_guidance_scale=5.0, # The guidance for the other video latent
24 output_type="pil",
 25     save_memory=True, # If you have enough GPU memory, set it to False to improve vae decoding speed
 26 )
28 export_to_video(frames, "./text_to_video_sample.mp4", fps=24)
29 show_video(None, "./text_to_video_sample.mp4", "70%")
File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.
OutOfMemoryError: CUDA out of memory. Tried to allocate 6.81 GiB. GPU 0 has a total capacty of 22.17 GiB of which 6.54 GiB is free. Process 655789 has 15.62 GiB memory in use. Of the allocated memory 14.83 GiB is allocated by PyTorch, and 576.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.
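The traceback itself suggests one mitigation: when a large amount of memory is "reserved by PyTorch but unallocated" (576.29 MiB here), fragmentation may be the issue, and capping the allocator's split size can help. A minimal sketch (the value 128 is an illustrative starting point, not a recommendation from this repo):

```python
import os

# Must be set before the first CUDA allocation, so ideally before
# importing torch in the notebook. This caps the size of cached blocks
# the CUDA caching allocator will split, reducing fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Restart the kernel and set this in the very first cell; setting it after `torch` has already touched the GPU has no effect.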
I get an OOM error in the Gradio app on an L4 24GB: 384p runs, but 720p hits OOM. The Jupyter script fails too, with the same CUDA out-of-memory traceback as above (it gets through the first step, 1/16 at 42.32s/it, then fails trying to allocate 6.81 GiB).
My 3090 (24GB) can run it despite the insufficient-GPU-memory error you report, although I also get system reboots due to insufficient GPU power.
I have 24 GB of VRAM (RTX 3090), and I also use cpu_offloading = True.
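Pulling together the memory-saving settings mentioned in this thread, here is a hypothetical helper that bundles the `generate` arguments from the notebook cell. Note that `save_memory` comes from the cell above, while `cpu_offloading` is only reported by a commenter and may belong to model construction rather than `generate()` depending on the repo version; verify both names against your checkout:

```python
# Hypothetical helper: collects the generate() arguments from the notebook
# cell in this thread, with the memory-saving options commenters reported.
def low_vram_generate_kwargs(height: int, width: int, temp: int) -> dict:
    return {
        "height": height,
        "width": width,
        "temp": temp,
        "num_inference_steps": [20, 20, 20],
        "video_num_inference_steps": [10, 10, 10],
        "guidance_scale": 7.0,        # first-frame guidance, 7 for the 384p variant
        "video_guidance_scale": 5.0,  # guidance for the remaining video latents
        "output_type": "pil",
        "save_memory": True,          # trades VAE decoding speed for lower VRAM use
        "cpu_offloading": True,       # reported in this thread; placement may vary by version
    }
```

Usage would then look like `frames = model.generate(prompt=prompt, **low_vram_generate_kwargs(384, 640, 16))`, still inside the `torch.no_grad()`/autocast context shown in the traceback.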