huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

TextToVideoSDPipeline outputs blank video #7559

Open rafationgson opened 3 months ago

rafationgson commented 3 months ago

Describe the bug

I am encountering an issue with TextToVideoSDPipeline: it generates blank videos when I run the model on Replicate. The model is running on A100 (80GB) hardware.

Reproduction

import torch
from cog import Path  # assumption: the snippet is excerpted from a Cog Predictor, so Path is cog.Path
from diffusers import DPMSolverMultistepScheduler, TextToVideoSDPipeline
from diffusers.utils import export_to_video

pipe = TextToVideoSDPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w",
    torch_dtype=torch.float32,
)
pipe.enable_sequential_cpu_offload()

# memory optimization
pipe.unet.enable_forward_chunking(chunk_size=1, dim=1)
pipe.enable_vae_slicing()

# "self.pipe" is the same pipeline, left over from the Predictor class; see the discussion below
pipe.scheduler = DPMSolverMultistepScheduler.from_config(self.pipe.scheduler.config)

video_frames = pipe(
    prompt="astronaut riding a horse on mars, beautiful, 8k, perfect, award winning, national geographic",
    negative_prompt="very blue, dust, noisy, washed out, ugly, distorted, broken",
    num_frames=24,
    num_inference_steps=25,
    guidance_scale=12.5,
    width=576,
    height=320,
).frames[0]

video_path = export_to_video(video_frames, fps=24)
return Path(video_path)

Logs

No response

System Info

Cog yaml file:

build:
  gpu: true
  cuda: "12.1"
  python_version: "3.11.1"
  system_packages:

Who can help?

No response

tolgacangoz commented 3 months ago

Hi @rafationgson, what exactly do you mean by blank videos: only black frames? Why are you using .enable_sequential_cpu_offload()? Your GPU seems relatively decent; do you have other processes that need to consume GPU VRAM? Could you replace it with .enable_model_cpu_offload() or .to('cuda')?
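For reference, a minimal sketch of the two suggested alternatives (not taken from the reporter's code; either option replaces enable_sequential_cpu_offload()):

import torch
from diffusers import TextToVideoSDPipeline

pipe = TextToVideoSDPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w",
    torch_dtype=torch.float32,
)

# Option 1: offload whole sub-models between CPU and GPU as they are needed
pipe.enable_model_cpu_offload()

# Option 2 (instead of option 1): keep the whole pipeline on the GPU
# pipe = pipe.to("cuda")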

rafationgson commented 3 months ago

Hi @standardAI, yes, black frames. There are actually no other processes consuming GPU VRAM; I thought I needed to enable the offload.

I replaced it with .to('cuda') and used torch.float16, but I am still encountering black frames.

tolgacangoz commented 3 months ago

What about .to('cuda') and torch.float32 without offloading?

rafationgson commented 3 months ago

What about .to('cuda') and torch.float32 without offloading?

Tried this too and issue still persists.

tolgacangoz commented 3 months ago

How exactly are you displaying the video, and via which function? What is your OpenCV version? Also, self.pipe is not another, unrelated pipeline, right?

rafationgson commented 3 months ago

I am displaying the video via the predict function of the Predictor class of my custom Cog model on Replicate. Yes, self.pipe is the same pipeline; I forgot to remove the self. prefix when I pasted the reproduction code above. As for my OpenCV version, I believe it is the latest (4.9.0.80), since I didn't pin a version.
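For context, a Cog Predictor along the lines described here would look roughly like this (a sketch based on the standard cog.BasePredictor interface; the input parameters and the actual Predictor code are not part of this report):

import torch
from cog import BasePredictor, Input, Path
from diffusers import DPMSolverMultistepScheduler, TextToVideoSDPipeline
from diffusers.utils import export_to_video


class Predictor(BasePredictor):
    def setup(self):
        # Load the pipeline once when the container starts
        self.pipe = TextToVideoSDPipeline.from_pretrained(
            "cerspense/zeroscope_v2_576w",
            torch_dtype=torch.float16,
        ).to("cuda")
        self.pipe.scheduler = DPMSolverMultistepScheduler.from_config(
            self.pipe.scheduler.config
        )

    def predict(self, prompt: str = Input(description="Text prompt")) -> Path:
        video_frames = self.pipe(prompt=prompt, num_frames=24).frames[0]
        video_path = export_to_video(video_frames, fps=24)
        return Path(video_path)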

yiyixuxu commented 3 months ago

I can't reproduce this issue https://colab.research.google.com/drive/1Ul37s8OefIJ-RkpyNkPAqOc7P0gzRL78?usp=sharing

DN6 commented 3 months ago

@rafationgson can you check to see if the individual frames of the video are blank as well?
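One way to check this (a sketch; it assumes the frames come back as RGB arrays, either uint8 or floats in [0, 1]):

import numpy as np
from PIL import Image

# video_frames is the list returned by pipe(...).frames[0]
for i, frame in enumerate(video_frames):
    arr = np.asarray(frame)
    print(f"frame {i}: dtype={arr.dtype}, min={arr.min()}, max={arr.max()}")
    if arr.dtype != np.uint8:
        # assume float values in [0, 1]
        arr = (arr.clip(0, 1) * 255).round().astype(np.uint8)
    Image.fromarray(arr).save(f"frame_{i:03d}.png")  # truly black frames will have max == 0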

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

yiyixuxu commented 2 months ago

@rafationgson is this still an issue?