bghira commented 6 months ago

Describe the bug

I'm opening this to keep track of the problem, because I'm not sure whether I've done something wrong or whether it's caused by current issues in PyTorch's MPS support.

I'm having bad results running inference on Apple MPS with DeepFloyd stage I / II in Diffusers:

Stage I (400M)

CPU (float32)

MPS (float32)

Stage II (450M)

CPU (float32)

MPS (float32)

Reproduction

from diffusers import DiffusionPipeline
import torch

prompts = {
   'jester': 'a stunning portrait of a jester at the twisted carnival'
}

deepfloyd_lora_path = "ptx0/deepcinema"
deepfloyd_base_model_path = "DeepFloyd/IF-I-M-v1.0"
deepfloyd_stage_two_path = "DeepFloyd/IF-II-M-v1.0"

width = 96
height = 64
# The "mps" value below was switched to "cpu" by hand to produce the CPU comparison images.
torch_device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "xpu" if torch.xpu.is_available() else "cpu"

pipe = DiffusionPipeline.from_pretrained(deepfloyd_base_model_path, watermarker=None, safety_checker=None, local_files_only=True).to(device=torch_device, dtype=torch.float32)
lora_pipe = DiffusionPipeline.from_pretrained(deepfloyd_base_model_path, **pipe.components, local_files_only=True).to(device=torch_device, dtype=torch.float32)
lora_pipe.load_lora_weights(deepfloyd_lora_path, weight_name="pytorch_lora_weights.safetensors")
lora_pipe.scheduler = pipe.scheduler.__class__.from_config(pipe.scheduler.config, variance_type="fixed_small")

from diffusers.pipelines import IFSuperResolutionPipeline
stage2_pipe = IFSuperResolutionPipeline.from_pretrained(deepfloyd_stage_two_path, watermarker=None, safety_checker=None, local_files_only=True).to(device=torch_device, dtype=torch.float32)

import os
for shortname, prompt in prompts.items():
    output_dir = f"outputs/{shortname}"
    if os.path.exists(output_dir):
        continue
    os.makedirs(output_dir, exist_ok=True)
    torch.manual_seed(42)
    image_base = pipe(prompt=prompt, width=width, height=height, guidance_scale=5.5, num_inference_steps=30).images[0]
    image_base.save(os.path.join(output_dir, "base.png"))

    torch.manual_seed(42)
    image_lora = lora_pipe(prompt=prompt, width=width, height=height, guidance_scale=5.5, num_inference_steps=30).images[0]
    image_lora.save(os.path.join(output_dir, "base_lora.png"))

    torch.manual_seed(84)
    image_base_2 = stage2_pipe(prompt=prompt, image=image_base, guidance_scale=5.5, num_inference_steps=30, width=width * 4, height = height * 4).images[0]
    image_base_2.save(os.path.join(output_dir, "base_stage2.png"))
    torch.manual_seed(84)
    image_lora_2 = stage2_pipe(prompt=prompt, image=image_lora, guidance_scale=5.5, num_inference_steps=30, width=width * 4, height = height * 4).images[0]
    image_lora_2.save(os.path.join(output_dir, "lora_stage2.png"))

For this test, I manually switched the device string used when MPS is available between 'mps' and 'cpu'.
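
The manual switch could also be made without editing the script. A minimal sketch, assuming a hypothetical helper `pick_device` and a hypothetical environment variable `REPRO_DEVICE` (neither is part of the original reproduction):

```python
import os


def pick_device(override=None):
    """Return a torch device string.

    `override`, or the hypothetical REPRO_DEVICE environment variable,
    takes priority over auto-detection when set.
    """
    choice = override or os.environ.get("REPRO_DEVICE")
    if choice:
        return choice
    try:
        import torch
    except ImportError:
        # Without torch installed there is nothing to detect.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Guard the attribute lookup in case the installed torch predates the MPS backend.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


torch_device = pick_device()
```

Running the script once with `REPRO_DEVICE=cpu` and once with `REPRO_DEVICE=mps` would then produce both sets of images, provided the output directory is changed between runs, since the loop skips any prompt whose directory already exists.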

Logs

No response

System Info

diffusers version: 0.27.2
Platform: macOS-14.4.1-arm64-arm-64bit
Python version: 3.10.14
PyTorch version (GPU?): 2.4.0.dev20240421 (False)
Huggingface_hub version: 0.22.2
Transformers version: 4.40.0.dev0
Accelerate version: 0.26.1

Who can help?

No response

yiyixuxu commented 6 months ago

cc @sayakpaul

sayakpaul commented 6 months ago

Thanks for reporting. I think it's better to keep this issue open rather than take immediate action. More so because we continue to see very little usage for DeepFloyd IF (likely because of the restricted license) :(

bghira commented 6 months ago

I think this problem is just exhibited very strongly in DeepFloyd samples, and it most likely affects every model on MPS to varying degrees.
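
One rough way to put a number on how much a given model is affected is to compare the CPU and MPS outputs pixel by pixel. This is only a sketch: `mean_abs_diff` is a hypothetical helper, and the directory names in the usage comment are made up, assuming the reproduction was run once per device into separate output folders.

```python
def mean_abs_diff(a: bytes, b: bytes) -> float:
    """Mean absolute per-channel difference between two pixel buffers (0-255 scale)."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length (same image size and mode)")
    if not a:
        return 0.0
    # Iterating over bytes yields integers in the range 0..255.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical usage with Pillow (paths are assumptions, not from the original script):
#   from PIL import Image
#   cpu = Image.open("outputs_cpu/jester/base.png").convert("RGB").tobytes()
#   mps = Image.open("outputs_mps/jester/base.png").convert("RGB").tobytes()
#   print(mean_abs_diff(cpu, mps))
```

A value near zero would suggest the two backends agree for that model, while a large value would point to the kind of divergence shown in the DeepFloyd samples. A single scalar cannot say where in the pipeline the divergence starts.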

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

huggingface / diffusers

[mps] training / inferencing deepfloyd must be done in float32 #7789
