Closed xiaosaxuexige closed 1 year ago
@sayakpaul do you maybe have some time to look into this?
@muellerzr @pacman100 does this error sound familiar? If so, do you have any recommendations?
CC @sgugger
I haven't tried torch.compile on unets, so nothing to contribute here (except that this issue has been opened in Transformers once too)
Our training scripts do not use this torch.compile(). On the other hand, accelerate doesn't set it by default either. So, on the surface, it does seem a bit weird to me.
Can we try to reproduce the error somehow? @xiaosaxuexige do you think you could try to reproduce the error in a google colab maybe?
I had the same problem.
same problem...
Could we get a minimally reproducible code snippet (preferably a Google Colab)?
same problem too......
@sayakpaul when running this:
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$dataset_name \
--use_ema \
--resolution=512 --center_crop --random_flip \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-pokemon-model"
with Torch 2.0 on GPU can you reproduce the error?
I have the same problem with:
import torch
from accelerate import Accelerator
def prepare_model(model):
    accelerator = Accelerator()
    model = accelerator.prepare(model)
    return model
model = torch.nn.Linear(10, 2)
model = prepare_model(model)
inputs = torch.randn(4, 10)
outputs = model(inputs)
print(outputs.shape)
Maybe it's a problem caused by accelerate/torch?
Edit: removing my accelerate config resolved this. I guess there is something wrong with accelerate's compile behaviour.
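One way to check whether a saved config is switching on compilation is to look for dynamo-related keys in the default config file. The sketch below is only an illustration: it assumes the config lives at `~/.cache/huggingface/accelerate/default_config.yaml` (the usual default location, which may differ on your machine), and `find_dynamo_settings` is a hypothetical helper, not part of accelerate. It does a plain text scan rather than parsing YAML.

```python
from pathlib import Path

# Assumed default location of the accelerate config; adjust if yours differs.
DEFAULT_CONFIG = (
    Path.home() / ".cache" / "huggingface" / "accelerate" / "default_config.yaml"
)


def find_dynamo_settings(config_text):
    """Return the non-comment lines of a config that mention dynamo (plain text scan)."""
    hits = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "dynamo" in stripped.lower():
            hits.append(stripped)
    return hits


if __name__ == "__main__":
    if DEFAULT_CONFIG.exists():
        for hit in find_dynamo_settings(DEFAULT_CONFIG.read_text()):
            print(hit)
    else:
        print(f"No config found at {DEFAULT_CONFIG}")
```

Any line it prints, for example a `dynamo_backend` entry naming a backend such as INDUCTOR, suggests that the config is asking for compilation even though the training script never calls torch.compile() itself.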
Pinging @muellerzr one more time (sorry) to check if this is a known issue.
What is your accelerate configuration? And what is your setup? This script doesn't work on my side since the inputs are on the CPU but the model is moved on the GPU by Accelerate. Putting the inputs on the GPU makes it work without any issue.
Okay, with the snippet above I was able to reproduce the issue when using fp16 mixed precision in torch 2.0:
- `Accelerate` version: 0.18.0.dev0
- Platform: Linux-5.4.0-125-generic-x86_64-with-glibc2.31
- Python version: 3.10.4
- Numpy version: 1.23.5
- PyTorch version (GPU?): 2.0.0 (True)
- `Accelerate` default config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: NO
  - mixed_precision: fp16
  - use_cpu: False
  - num_processes: 1
  - machine_rank: 0
  - num_machines: 1
  - gpu_ids: all
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
  - dynamo_config: {'dynamo_backend': 'INDUCTOR'}
It is most likely due to ConvertOutputsToFp32
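To illustrate the kind of situation the RuntimeError in this issue describes, here is a minimal pure-Python sketch. `ConvertOutputs` is a hypothetical stand-in for a callable-class wrapper placed around a model's forward (it is not accelerate's actual `ConvertOutputsToFp32`), and `inspect.isfunction` is used only as a rough proxy for "is this a plain function"; it is not claimed to be the check torch performs.

```python
import inspect


class ConvertOutputs:
    """Hypothetical callable-class wrapper that post-processes a forward's output."""

    def __init__(self, model_forward):
        self.model_forward = model_forward

    def __call__(self, *args, **kwargs):
        # Stand-in for an output conversion step (here: cast the result to float).
        return float(self.model_forward(*args, **kwargs))


def forward(x):
    return x * 2


# The wrapped forward is a callable object, not a function.
wrapped = ConvertOutputs(forward)


def wrapped_as_function(*args, **kwargs):
    # Plain function that delegates to the callable object,
    # which is the workaround the error message suggests.
    return wrapped(*args, **kwargs)


print(inspect.isfunction(wrapped))
print(inspect.isfunction(wrapped_as_function))
print(wrapped_as_function(3))
```

Both objects are callable and return the same result; only the second is a plain function, which is what the error message asks for before optimizing.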
Hi! This should be fixed on Accelerate main, please try now by installing with: pip install git+https://github.com/huggingface/accelerate
Hi @xiaosaxuexige, @Happenmass, @fecet, could you try again after installing accelerate from source and see if the issue still persists?
After installing the newest version, I get a new error:
CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp4cw1emgp/main.c', '-O3', '-I/usr/local/cuda/include', '-I/usr/include/python3.8', '-I/tmp/tmp4cw1emgp', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmp4cw1emgp/triton_.cpython-38-x86_64-linux-gnu.so', '-L/usr/lib/x86_64-linux-gnu']' returned non-zero exit status 1.
I managed to find some time to run the example myself. I didn't run into any problems.
Here's my dev env:
- `diffusers` version: 0.16.0.dev0
- Platform: Linux-4.19.0-23-cloud-amd64-x86_64-with-glibc2.29
- Python version: 3.8.10
- PyTorch version (GPU?): 2.0.0+cu118 (True)
- Huggingface_hub version: 0.13.4
- Transformers version: 4.28.1
- Accelerate version: 0.19.0.dev0
- xFormers version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Using the following command:
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$dataset_name \
--use_ema \
--resolution=512 --center_crop --random_flip \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-pokemon-model"
I am running into this error:
  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
    and self.step()
  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 561, in step
    self.output.compile_subgraph(
  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 541, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 588, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/peiyuan/miniconda3/envs/diffusers/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 675, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: compile_fn raised ImportError: /tmp/torchinductorpeiyuan/triton/1/dbc614e42da447cec3e4c2def72fd6aa/triton.so: failed to map segment from shared object
The script I run:
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export dataset_name="lambdalabs/pokemon-blip-captions"
CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch --mixed_precision="bf16" --multi_gpu train_text_to_image.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$dataset_name \
--use_ema \
--resolution=512 --center_crop --random_flip \
--train_batch_size=4 \
--gradient_accumulation_steps=1 \
--gradient_checkpointing \
--max_train_steps=15000 --learning_rate=1e-05 \
--max_grad_norm=1 \
--enable_xformers_memory_efficient_attention \
--lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-pokemon-model" --use_8bit_adam
I am using torch 2.0, diffusers 0.17.0.dev0, and accelerate 0.19.0.dev0. The cuda version is 11.6.
For PyTorch 2.0 to work, you need CUDA 11.7.
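As a quick way to see which CUDA version a PyTorch wheel was built against, the local version suffix in strings such as `2.0.0+cu118` (as in the environment listing earlier in the thread) can be parsed. This is a sketch under the assumption that the suffix follows the `cuXYZ` convention with the last digit as the minor version; `cuda_from_torch_version` is a hypothetical helper, not a torch API.

```python
import re


def cuda_from_torch_version(version):
    """Return (major, minor) parsed from a '+cuXYZ' suffix, or None if absent.

    Assumes the last digit is the minor version, e.g. 'cu118' -> (11, 8).
    """
    match = re.search(r"\+cu(\d{2,})$", version)
    if match is None:
        return None
    digits = match.group(1)
    return int(digits[:-1]), int(digits[-1])


for v in ("2.0.0+cu118", "2.0.0+cu117", "2.0.0"):
    print(v, "->", cuda_from_torch_version(v))
```

A result of None only means the version string carries no suffix; it does not by itself say the build is CPU-only. On a machine with torch installed, `torch.__version__` and `torch.version.cuda` report this information directly.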
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
cc @pcuenca for MAC maybe
Gentle ping @pcuenca
@patrickvonplaten Sorry for being so late in checking this out, but I'm not sure about the Mac comment. Training is not really tested/supported on Mac at this point, and all reports in this issue were about Linux and cuda. I think the initial problem was resolved as pointed out by @muellerzr , so closing this for now. Feel free to reopen if needed :)
Describe the bug
I am trying to run the example provided by the authors on the pokemon dataset. I followed the instructions on Hugging Face without modifying the code, but I got this error:
RuntimeError:
torch._dynamo.optimize is called on a non function object. If this is a callable class, please wrap the relevant code into a function and optimize the wrapper function.
Reproduction
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$dataset_name \
--use_ema \
--resolution=512 --center_crop --random_flip \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-pokemon-model"
Logs
System Info
- `diffusers` version: 0.15.0.dev0