huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0
25.38k stars 5.26k forks

Add M1/M2 support to text-to-video pipeline #2785

Open yearofthewhopper opened 1 year ago

yearofthewhopper commented 1 year ago

Is your feature request related to a problem? Please describe.

After installing macOS 13.3 to get pipeline_text_to_video_synth.py to execute, I was disappointed to find that the Text-to-Video script still doesn't work on my Mac M2. It seems the pipeline has no fallback for devices without CUDA.

Describe the solution you'd like

Please add M1/M2 support to the default Text-to-Video pipeline.

Describe alternatives you've considered

Provide a separate starter script that uses another pipeline or scheduler and avoids the CUDA dependency inherent in TextToVideoSDPipeline.py.

patrickvonplaten commented 1 year ago

Seems like MPS doesn't support the Conv3D kernel yet, so this one will be difficult for now, I guess. cc @pcuenca

pcuenca commented 1 year ago

Yes, this is going to be very hard until https://github.com/pytorch/pytorch/issues/77818 is addressed.

yearofthewhopper commented 1 year ago

> Seems like MPS doesn't support the Conv3D kernel yet so this one will be difficult for now I guess cc @pcuenca

This seems to have been addressed in 13.3 beta 2, as it is an error I no longer get. The actual "error" was CUDA being required with no fallback. I am filing this as a feature request because the missing fallback path (nothing other than CUDA) looks overlooked rather than broken.

pcuenca commented 1 year ago

Hi @yearofthewhopper, I'm not sure what you mean. I'm on the latest 13.3 (22E252) and I still get the message Conv3D is not supported on MPS. Are you seeing a different problem?

gskishan004 commented 1 year ago

+1 @pcuenca, updated to 13.3 (Public Release - 22E252) and I'm still getting the RuntimeError: Conv3D is not supported on MPS
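
For anyone who wants to check their own setup before trying the full pipeline, here is a minimal sketch of a probe. It assumes that the error reported above surfaces as a RuntimeError on a tiny Conv3d forward pass. The helper name conv3d_runs_on_mps is hypothetical and is not part of PyTorch or diffusers.

```python
def conv3d_runs_on_mps() -> bool:
    """Return True only if a tiny Conv3d forward pass succeeds on the MPS device."""
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed, so there is nothing to probe

    # torch.backends.mps is missing on old PyTorch builds, hence the getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is None or not mps.is_available():
        return False

    try:
        conv = torch.nn.Conv3d(1, 1, kernel_size=1).to("mps")
        x = torch.zeros(1, 1, 2, 2, 2, device="mps")  # (N, C, D, H, W)
        with torch.no_grad():
            conv(x)
        return True
    except RuntimeError:
        # Assumption: unsupported builds raise here, e.g.
        # "Conv3D is not supported on MPS".
        return False


print("Conv3D on MPS:", conv3d_runs_on_mps())
```

A False result only says that the probe did not succeed on this machine. It does not distinguish between a missing MPS backend and a missing Conv3D kernel.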

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

lucferbux commented 1 year ago

This is still not working. I tried today (May 2, 2023) and still get the same error.

pcuenca commented 1 year ago

Marking as WIP to keep this from going stale until https://github.com/pytorch/pytorch/issues/77818 is resolved.

Aviral-A commented 1 year ago

Hello, I have a potentially dumb question here. I'm trying to get text-to-video working on a Mac through the Stable Diffusion web UI, and I came across this pull request (https://github.com/pytorch/pytorch/pull/99246) that implements Conv3D for MPS. However, these changes are outside of the torch folder, and only the torch folder is used for installing the PyTorch package in the web UI venv. I've tried putting the entire PyTorch directory in the web UI venv, as well as installing PyTorch and merging the pull request there, but to no avail. Has anyone else had any luck?

jrittvo commented 1 year ago

It looks to me like perhaps the PR is not yet ready for merging because of one or more of the build errors?

ispulkit commented 10 months ago

Is someone actively working on this one?

pcuenca commented 10 months ago

We are waiting until https://github.com/pytorch/pytorch/pull/114183 is merged and makes it to a nightly build.

anribras commented 9 months ago

Waiting for this. What is the plan?

czkoko commented 9 months ago

torch-2.3.0.dev20231216 has added Conv3D support for MPS. However, it seems the SVD model needs to run in float32, otherwise the following error occurs: RuntimeError: expected scalar type float but found c10::Half

pcuenca commented 9 months ago

Update: Conv3D support has been added to MPS, but as @czkoko said, there are still some problems with the mps pipeline. I fixed that issue, but I haven't been able to generate a video successfully (I'm getting black generations). Reproducing my steps here so others in the community can try and maybe come up with workarounds.

  1. Install PyTorch from a nightly build:
     pip install --upgrade --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu

  2. Install diffusers from the mps-video branch in this PR: https://github.com/huggingface/diffusers/pull/6220

  3. Test script (generates black frames and consumes ~45 GB of RAM in my system):

import torch

from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe = pipe.to("mps")
# pipe.unet.set_default_attn_processor()
# pipe.unet.enable_forward_chunking()
# pipe.enable_attention_slicing()

# Load the conditioning image
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true")
image = image.resize((1024, 576))

generator = torch.manual_seed(42)
frames = pipe(
    image,
    num_frames=7,
    decode_chunk_size=1,
    generator=generator,
).frames[0]

export_to_video(frames, "generated.mp4", fps=7)
print("Video saved to generated.mp4")

Please let us know if anybody can find or think of any potential fixes.

pcuenca commented 9 months ago

Update: running the pipeline in 32-bit works. I used a size of (768, 512) and memory consumption was ~50 GB.
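
For reference, a rough sketch of how the 32-bit run differs from the test script: float32 instead of float16, no fp16 variant, and a (768, 512) input size. The helper names svd_settings and load_pipeline are hypothetical, and the non-mps branch simply mirrors the values from the earlier script as an assumption, not a tested recommendation.

```python
def svd_settings(device: str) -> dict:
    """Return illustrative load/run settings for SVD on the given device."""
    if device == "mps":
        # From this thread: fp16 produced black frames on MPS,
        # while float32 at (768, 512) worked (with ~50 GB of memory).
        return {"dtype": "float32", "variant": None, "size": (768, 512)}
    # Assumption: other devices keep the values used in the earlier test script.
    return {"dtype": "float16", "variant": "fp16", "size": (1024, 576)}


def load_pipeline(device: str):
    """Load the pipeline with the settings above (this downloads the weights)."""
    import torch
    from diffusers import StableVideoDiffusionPipeline

    settings = svd_settings(device)
    kwargs = {"torch_dtype": getattr(torch, settings["dtype"])}
    if settings["variant"] is not None:
        kwargs["variant"] = settings["variant"]
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt", **kwargs
    )
    return pipe.to(device), settings["size"]
```

Calling load_pipeline("mps") returns the pipeline and the size to pass to image.resize(...) before running it as in the test script.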

mauriciotoro commented 6 months ago

Hi, given that Conv3D is now working on MPS, is there a workaround to run Text-to-Video? I am still getting the same error: "CUDA is not available..."

mauriciotoro commented 6 months ago

Torch not compiled with CUDA enabled

pcuenca commented 6 months ago

@mauriciotoro on Apple Silicon, you have to use the mps device, not cuda; see the code snippet above. You also have to install diffusers from the mps-video branch. As noted in my previous comment, the pipeline worked the last time I checked, but memory consumption was huge. This makes text-to-video impractical on mps for the moment.
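
A script that hard-codes "cuda" fails on a Mac with "Torch not compiled with CUDA enabled". One way to avoid that is to choose the device at runtime. The following is only a sketch of such a fallback: pick_device and detect_device are hypothetical helper names, not diffusers APIs, and the preference order (cuda, then mps, then cpu) is an assumption.

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Choose a device name, preferring cuda, then mps, then cpu."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends and pick a device name."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is nothing to query

    # torch.backends.mps is missing on old PyTorch builds, hence the getattr.
    mps = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps is not None and mps.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


device = detect_device()
print("Using device:", device)
# In a script like the one above: pipe = pipe.to(device)
```

This only selects the device. It does not change the memory requirements mentioned above.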

bghira commented 6 months ago

https://github.com/pytorch/pytorch/pull/116580 is the new replacement PR for Conv3D.