huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

RuntimeError: torch._dynamo.optimize is called on a non function object. #2775

Closed xiaosaxuexige closed 1 year ago

xiaosaxuexige commented 1 year ago

Describe the bug

I was trying to run the text-to-image example provided by the authors on the pokemon dataset. I followed the instructions on Hugging Face and did not modify the code, but I got this error:

RuntimeError:

torch._dynamo.optimize is called on a non function object. If this is a callable class, please wrap the relevant code into a function and optimize the wrapper function.

Reproduction

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"

accelerate launch --mixed_precision="fp16" train_text_to_image.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$dataset_name \
    --use_ema \
    --resolution=512 --center_crop --random_flip \
    --train_batch_size=1 \
    --gradient_accumulation_steps=4 \
    --gradient_checkpointing \
    --max_train_steps=15000 \
    --learning_rate=1e-05 \
    --max_grad_norm=1 \
    --lr_scheduler="constant" --lr_warmup_steps=0 \
    --output_dir="sd-pokemon-model"

Logs

03/22/2023 14:32:42 - INFO - __main__ - ***** Running training *****
03/22/2023 14:32:42 - INFO - __main__ -   Num examples = 833
03/22/2023 14:32:42 - INFO - __main__ -   Num Epochs = 72
03/22/2023 14:32:42 - INFO - __main__ -   Instantaneous batch size per device = 1
03/22/2023 14:32:42 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 4
03/22/2023 14:32:42 - INFO - __main__ -   Gradient Accumulation steps = 4
03/22/2023 14:32:42 - INFO - __main__ -   Total optimization steps = 15000
Steps:   0%|                                                                                                                                  | 0/15000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train_text_to_image.py", line 789, in <module>
    main()
  File "train_text_to_image.py", line 730, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/home/wangfuling/wangfuling/ENTER/envs/dfs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/wangfuling/wangfuling/ENTER/envs/dfs/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 82, in forward
    return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
  File "/home/wangfuling/wangfuling/ENTER/envs/dfs/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 229, in __call__
    raise RuntimeError(
RuntimeError: 

torch._dynamo.optimize is called on a non function object.
If this is a callable class, please wrap the relevant code into a function and optimize the
wrapper function.

>> class CallableClass:
>>     def __init__(self):
>>         super().__init__()
>>         self.relu = torch.nn.ReLU()
>>
>>     def __call__(self, x):
>>         return self.relu(torch.sin(x))
>>
>>     def print_hello(self):
>>         print("Hello world")
>>
>> mod = CallableClass()

If you want to optimize the __call__ function and other code, wrap that up in a function

>> def wrapper_fn(x):
>>     y = mod(x)
>>     return y.sum()

and then optimize the wrapper_fn

>> opt_wrapper_fn = torch._dynamo.optimize(wrapper_fn)

System Info

patrickvonplaten commented 1 year ago

@sayakpaul do you maybe have some time to look into this?

sayakpaul commented 1 year ago

@muellerzr @pacman100 does this error sound familiar? If so, do you have any recommendations?

muellerzr commented 1 year ago

CC @sgugger

sgugger commented 1 year ago

I haven't tried torch.compile on unets, so nothing to contribute here (except that this issue has been opened in Transformers once too)

sayakpaul commented 1 year ago

Our training scripts do not use torch.compile(), and accelerate doesn't enable it by default either. So, on the surface, this seems a bit odd to me.

patrickvonplaten commented 1 year ago

Can we try to reproduce the error somehow? @xiaosaxuexige do you think you could try to reproduce the error in a google colab maybe?

zhongshsh commented 1 year ago

I had the same problem.

rongxiaoqu commented 1 year ago

same problem...

sayakpaul commented 1 year ago

Could we get a minimally reproducible code snippet (preferably a Google Colab)?

Happenmass commented 1 year ago

same problem too......

patrickvonplaten commented 1 year ago

@sayakpaul when running this:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"

accelerate launch --mixed_precision="fp16" train_text_to_image.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$dataset_name \
    --use_ema \
    --resolution=512 --center_crop --random_flip \
    --train_batch_size=1 \
    --gradient_accumulation_steps=4 \
    --gradient_checkpointing \
    --max_train_steps=15000 \
    --learning_rate=1e-05 \
    --max_grad_norm=1 \
    --lr_scheduler="constant" --lr_warmup_steps=0 \
    --output_dir="sd-pokemon-model"

with Torch 2.0 on GPU can you reproduce the error?

fecet commented 1 year ago

I have the same problem with:

import torch
from accelerate import Accelerator

def prepare_model(model):
    accelerator = Accelerator()
    model = accelerator.prepare(model)
    return model

model = torch.nn.Linear(10, 2)

model = prepare_model(model)

inputs = torch.randn(4, 10)
outputs = model(inputs)
print(outputs.shape)

maybe it's a problem caused by accelerate/torch?

Edit: removing my accelerate config resolved this. I guess there is something wrong with accelerate's compile behaviour.
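A quick way to check whether a saved accelerate config turns on a Dynamo backend is to scan the config file for dynamo-related keys. The sketch below is only an illustration: the default path is an assumption (it can differ when HF_HOME or a custom config file is used), the YAML layout of the sample is assumed, and `find_dynamo_lines` and `report` are hypothetical helper names, not part of accelerate.

```python
from pathlib import Path

# Assumed default location of the file written by `accelerate config`.
# It may differ if HF_HOME is set or a custom config file is passed.
DEFAULT_CONFIG = (
    Path.home() / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"
)


def find_dynamo_lines(config_text):
    """Return config lines that mention dynamo (plain text scan, no YAML parser)."""
    return [
        line.strip()
        for line in config_text.splitlines()
        if "dynamo" in line.lower()
    ]


def report(path=DEFAULT_CONFIG):
    """Print and return any dynamo-related settings found in the config file."""
    if not path.exists():
        print(f"No config file at {path}")
        return []
    hits = find_dynamo_lines(path.read_text())
    for line in hits:
        print(line)
    return hits


# In-memory sample resembling the settings reported in this thread
# (the exact YAML layout is an assumption).
sample = """\
distributed_type: 'NO'
dynamo_config:
  dynamo_backend: INDUCTOR
mixed_precision: fp16
"""
sample_hits = find_dynamo_lines(sample)
```

Calling `report()` on a machine with a saved config prints the matching lines. If a backend such as INDUCTOR shows up, the model is being compiled even though the training script never calls torch.compile() itself.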

sayakpaul commented 1 year ago

Pinging @muellerzr one more time (sorry) to check if this is a known issue.

sgugger commented 1 year ago

What is your accelerate configuration? And what is your setup? This script doesn't work on my side since the inputs are on the CPU but the model is moved to the GPU by Accelerate. Putting the inputs on the GPU makes it work without any issue.

pacman100 commented 1 year ago

Okay, with @fecet's snippet above (a torch.nn.Linear model passed through accelerator.prepare), I was able to reproduce the issue when using fp16 mixed precision with torch 2.0:

- `Accelerate` version: 0.18.0.dev0
- Platform: Linux-5.4.0-125-generic-x86_64-with-glibc2.31
- Python version: 3.10.4
- Numpy version: 1.23.5
- PyTorch version (GPU?): 2.0.0 (True)
- `Accelerate` default config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: NO
    - mixed_precision: fp16
    - use_cpu: False
    - num_processes: 1
    - machine_rank: 0
    - num_machines: 1
    - gpu_ids: all
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
    - dynamo_config: {'dynamo_backend': 'INDUCTOR'}

It is most likely due to ConvertOutputsToFp32
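To illustrate why replacing forward with a callable class instance could trigger this error, here is a pure-Python sketch. It uses neither torch nor accelerate: `ForwardWrapper`, `optimize_like`, and `TinyModel` are hypothetical stand-ins, and the function-or-method check is an assumption about the general shape of the check behind the error message, not a copy of the PyTorch source.

```python
import inspect


class ForwardWrapper:
    """Hypothetical stand-in for a callable class that post-processes outputs."""

    def __init__(self, forward):
        self.forward = forward

    def __call__(self, *args, **kwargs):
        # Convert the wrapped result, loosely mimicking an output cast.
        return float(self.forward(*args, **kwargs))


def optimize_like(fn):
    """Hypothetical compiler entry point that accepts only functions or methods."""
    if not (inspect.isfunction(fn) or inspect.ismethod(fn)):
        raise RuntimeError("optimize is called on a non function object")
    return fn


class TinyModel:
    def forward(self, x):
        return x * 2


model = TinyModel()
# forward is now an instance of a callable class, not a function or method.
model.forward = ForwardWrapper(model.forward)

try:
    optimize_like(model.forward)
    rejected = False
except RuntimeError:
    rejected = True


# Workaround suggested by the error message: wrap the call in a plain function.
def wrapper_fn(x):
    return model.forward(x)


accepted = optimize_like(wrapper_fn)
result = accepted(3)
```

In this sketch the wrapped forward is rejected because it is neither a function nor a bound method, while the plain wrapper function is accepted. This matches the advice in the error message above.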

muellerzr commented 1 year ago

Hi! This should be fixed on Accelerate main, please try now by installing with: pip install git+https://github.com/huggingface/accelerate

sayakpaul commented 1 year ago

Hi @xiaosaxuexige, @Happenmass @fecet

could you try again after installing accelerate from the source and see if the issue still persists?

JiaojiaoYe1994 commented 1 year ago

After installing the newest version from Accelerate main as suggested above, I get a new error:

CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp4cw1emgp/main.c', '-O3', '-I/usr/local/cuda/include', '-I/usr/include/python3.8', '-I/tmp/tmp4cw1emgp', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmp4cw1emgp/triton_.cpython-38-x86_64-linux-gnu.so', '-L/usr/lib/x86_64-linux-gnu']' returned non-zero exit status 1.

sayakpaul commented 1 year ago

I managed to find some time to run the example myself. I didn't run into any problems.

Here's my dev env:

- `diffusers` version: 0.16.0.dev0
- Platform: Linux-4.19.0-23-cloud-amd64-x86_64-with-glibc2.29
- Python version: 3.8.10
- PyTorch version (GPU?): 2.0.0+cu118 (True)
- Huggingface_hub version: 0.13.4
- Transformers version: 4.28.1
- Accelerate version: 0.19.0.dev0
- xFormers version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No

Using the following command:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"

accelerate launch --mixed_precision="fp16" train_text_to_image.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$dataset_name \
    --use_ema \
    --resolution=512 --center_crop --random_flip \
    --train_batch_size=1 \
    --gradient_accumulation_steps=4 \
    --gradient_checkpointing \
    --max_train_steps=15000 \
    --learning_rate=1e-05 \
    --max_grad_norm=1 \
    --lr_scheduler="constant" --lr_warmup_steps=0 \
    --output_dir="sd-pokemon-model"

jzhang38 commented 1 year ago

I ran into this error:

  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
    and self.step()
  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 561, in step
    self.output.compile_subgraph(
  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 541, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised ImportError: /tmp/torchinductorpeiyuan/triton/1/dbc614e42da447cec3e4c2def72fd6aa/triton.so: failed to map segment from shared object

The script I run:

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export dataset_name="lambdalabs/pokemon-blip-captions"

CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch --mixed_precision="bf16" --multi_gpu train_text_to_image.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --dataset_name=$dataset_name \
    --use_ema \
    --resolution=512 --center_crop --random_flip \
    --train_batch_size=4 \
    --gradient_accumulation_steps=1 \
    --gradient_checkpointing \
    --max_train_steps=15000 --learning_rate=1e-05 \
    --max_grad_norm=1 \
    --enable_xformers_memory_efficient_attention \
    --lr_scheduler="constant" --lr_warmup_steps=0 \
    --output_dir="sd-pokemon-model" --use_8bit_adam

I am using torch 2.0, diffusers 0.17.0.dev0, and accelerate 0.19.0.dev0. The cuda version is 11.6.

sayakpaul commented 1 year ago

For PyTorch 2.0 to work, you need CUDA 11.7.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

patrickvonplaten commented 1 year ago

cc @pcuenca for Mac, maybe

patrickvonplaten commented 1 year ago

Gentle ping @pcuenca

pcuenca commented 1 year ago

@patrickvonplaten Sorry for being so late in checking this out, but I'm not sure about the Mac comment. Training is not really tested/supported on Mac at this point, and all reports in this issue were about Linux and CUDA. I think the initial problem was resolved as pointed out by @muellerzr, so closing this for now. Feel free to reopen if needed :)