showlab / Tune-A-Video

[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
https://tuneavideo.github.io
Apache License 2.0

I get an error when I try to train a DreamBooth model #10

Closed shi3z closed 1 year ago

shi3z commented 1 year ago

Thank you for the great work. I have no trouble training with a normal model, but training with a DreamBooth model fails. The Mr Potato Head example doesn't work either, so I'd like to identify the cause.


```
$ accelerate launch train_tuneavideo.py --config="configs/mr-potato-head.yaml"
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
02/01/2023 10:24:30 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
```

```
{'variance_type', 'prediction_type', 'clip_sample'} was not found in config. Values will be initialized to default values.
{'use_linear_projection', 'resnet_time_scale_shift', 'num_class_embeds', 'class_embed_type', 'mid_block_type', 'only_cross_attention', 'dual_cross_attention', 'upcast_attention'} was not found in config. Values will be initialized to default values.
{'prediction_type', 'clip_sample'} was not found in config. Values will be initialized to default values.
/home/ubuntu/Tune-A-Video/tuneavideo/pipelines/pipeline_tuneavideo.py:82: FutureWarning: The configuration file of this scheduler: DDIMScheduler {
  "_class_name": "DDIMScheduler",
  "_diffusers_version": "0.12.1",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "clip_sample": true,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "trained_betas": null
}
has not set the configuration clip_sample. clip_sample should be set to False in the configuration file. Please make sure to update the config accordingly as not setting clip_sample in the config might lead to incorrect results in future versions. If you have downloaded this checkpoint from the Hugging Face Hub, it would be very nice if you could open a Pull request for the scheduler/scheduler_config.json file
  deprecate("clip_sample not set", "1.0.0", deprecation_message, standard_warn=False)
02/01/2023 10:24:42 - INFO - __main__ - ***** Running training *****
02/01/2023 10:24:42 - INFO - __main__ -   Num examples = 1
02/01/2023 10:24:42 - INFO - __main__ -   Num Epochs = 500
02/01/2023 10:24:42 - INFO - __main__ -   Instantaneous batch size per device = 1
02/01/2023 10:24:42 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
02/01/2023 10:24:42 - INFO - __main__ -   Gradient Accumulation steps = 1
02/01/2023 10:24:42 - INFO - __main__ -   Total optimization steps = 500
Steps:   0%|          | 0/500 [00:00<?, ?it/s]/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
  File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 352, in <module>
    main(**OmegaConf.load(args.config))
  File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 284, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/utils/operations.py", line 490, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 364, in forward
    sample, res_samples = downsample_block(
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 301, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 294, in custom_forward
    return module(*inputs, return_dict=return_dict)
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 111, in forward
    hidden_states = block(
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 243, in forward
    hidden_states = self.attn1(norm_hidden_states, attention_mask=attention_mask, video_length=video_length) + hidden_states
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/Tune-A-Video/tuneavideo/models/attention.py", line 283, in forward
    query = self.reshape_heads_to_batch_dim(query)
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1269, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'SparseCausalAttention' object has no attribute 'reshape_heads_to_batch_dim'
```


My environment is as follows:

```
Tesla V100 (32GB)
Python 3.10.9
torch 1.13.1
torchaudio 0.13.1
torchtext 0.14.1
torchvision 0.14.1
transformers 4.26.0
```

Also, when I tried to train Tune-A-Video with a model I trained myself using the Diffusers examples, I got a different error.

```
$ accelerate launch train_tuneavideo.py --config="configs/man-surfing.yaml"
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
02/01/2023 10:07:58 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
```

```
{'variance_type'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 352, in <module>
    main(**OmegaConf.load(args.config))
  File "/home/ubuntu/Tune-A-Video/train_tuneavideo.py", line 107, in main
    unet = UNet3DConditionModel.from_pretrained_2d(pretrained_model_path, subfolder="unet")
  File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 440, in from_pretrained_2d
    model = cls.from_config(config)
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 210, in from_config
    model = cls(**init_dict)
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 567, in inner_init
    init(self, *args, **init_kwargs)
  File "/home/ubuntu/Tune-A-Video/tuneavideo/models/unet.py", line 158, in __init__
    raise ValueError(f"unknown mid_block_type : {mid_block_type}")
ValueError: unknown mid_block_type : UNetMidBlock2DCrossAttn
Traceback (most recent call last):
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ubuntu/.pyenv/versions/anaconda3-2022.05/envs/ldm310/bin/python', 'train_tuneavideo.py', '--config=configs/man-surfing.yaml']' returned non-zero exit status 1.
```
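
This second error is a different issue: the `config.json` that the diffusers training examples save for a 2D UNet records `mid_block_type: "UNetMidBlock2DCrossAttn"`, while `UNet3DConditionModel.from_pretrained_2d` only knows the 3D variant. One workaround worth trying (an assumption on my part, not an official fix) is to rewrite that field in your trained model before launching training:

```python
import json
import os

# Hypothetical workaround: point this at your own diffusers-trained model directory.
pretrained_model_path = "/path/to/your/dreambooth-model"
config_path = os.path.join(pretrained_model_path, "unet", "config.json")

with open(config_path) as f:
    config = json.load(f)

# Swap the 2D mid-block name for the 3D one that UNet3DConditionModel expects.
config["mid_block_type"] = "UNetMidBlock3DCrossAttn"

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```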


Any hint would be appreciated.

shi3z commented 1 year ago

Solved. I found the cause, so I'll write it down in case someone else has the same symptoms.

It failed when I was using PyTorch 1.13.0; 1.12 did not have this problem. However, on the V100 this exceeds CUDA memory, so I dropped the sample steps from 8 to 4.
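
For anyone reproducing this, the downgrade would look something like the following (the CUDA build tag is an assumption; match it to your driver):

```bash
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 \
    --extra-index-url https://download.pytorch.org/whl/cu116
```

Note that torchvision has to move in lockstep, since each torch release pairs with a specific torchvision version.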

zhangjiewu commented 1 year ago

> Solved. I found the cause, so I'll write it down in case someone else has the same symptoms.
>
> It failed when I was using PyTorch 1.13.0; 1.12 did not have this problem. However, on the V100 this exceeds CUDA memory, so I dropped the sample steps from 8 to 4.

Hi @shi3z, thank you for sharing the solution. The code works well on my V100 32GB with xformers and fp16 enabled.
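
For reference, a setup along those lines would be (standard pip/accelerate usage, not commands from this repo's docs):

```bash
# xformers for memory-efficient attention; fp16 via accelerate's standard flag
pip install xformers
accelerate launch --mixed_precision=fp16 train_tuneavideo.py --config="configs/man-surfing.yaml"
```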

Randle-Github commented 1 year ago

Hi @shi3z, I still get the same problem even though I completely followed your package settings, as follows:

```
Traceback (most recent call last):
  File "/data/home/lyc/Tune-A-Video/train_tuneavideo.py", line 367, in <module>
    main(**OmegaConf.load(args.config))
  File "/data/home/lyc/Tune-A-Video/train_tuneavideo.py", line 289, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/accelerate/utils/operations.py", line 489, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
    return func(*args, **kwargs)
  File "/data/home/lyc/Tune-A-Video/tuneavideo/models/unet.py", line 364, in forward
    sample, res_samples = downsample_block(
  File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/lyc/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 301, in forward
    hidden_states = torch.utils.checkpoint.checkpoint(
  File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 235, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 96, in forward
    outputs = run_function(*args)
  File "/data/home/lyc/Tune-A-Video/tuneavideo/models/unet_blocks.py", line 294, in custom_forward
    return module(*inputs, return_dict=return_dict)
  File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/lyc/Tune-A-Video/tuneavideo/models/attention.py", line 111, in forward
    hidden_states = block(
  File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/lyc/Tune-A-Video/tuneavideo/models/attention.py", line 243, in forward
    hidden_states = self.attn1(norm_hidden_states, attention_mask=attention_mask, video_length=video_length) + hidden_states
  File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/home/lyc/Tune-A-Video/tuneavideo/models/attention.py", line 283, in forward
    query = self.reshape_heads_to_batch_dim(query)
  File "/data/home/lyc/ENTER/envs/tune-a-video/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'SparseCausalAttention' object has no attribute 'reshape_heads_to_batch_dim'
```

Did you change other settings in your work?
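
Since this is the same AttributeError as above, a library version mismatch is worth ruling out before comparing other settings; a quick check is:

```python
# Print the installed versions to compare against the versions the repo pins.
import torch
import diffusers

print("torch:", torch.__version__)
print("diffusers:", diffusers.__version__)
```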