environment for training

qinghew commented 1 year ago

The author sets up the environment in both the cog.yaml file and the README, and the instructions have gaps. Since the latest xformers requires torch 2.0 or higher, the torch version in cog.yaml cannot be used with it. Here is the environment setup I used for training (inference works fine with it):

conda install cudatoolkit=11.6
pip install torch==2.0.1 torchvision torchaudio
pip install scipy==1.9.3 transformers==4.29.2 accelerate==0.19.0 clip==0.2.0 diffusers==0.16.1 xformers triton gradio datasets evaluate
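To confirm which versions actually ended up in the environment, a small check along these lines may help. It is only a sketch: the helper names (`parse_release`, `meets_minimum`, `installed_version`) are hypothetical and not part of fastcomposer, and the version parsing is deliberately simplified (it ignores pre-release tags).

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn a string like '2.0.1+cu117' into (2, 0, 1).

    Simplified on purpose: drops the local part after '+', and stops at the
    first component that is not purely numeric.
    """
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    """Return True if `version` is at least `minimum`, e.g. (2, 0)."""
    release = parse_release(version)
    # Pad with zeros so that '2' compares equal to (2, 0).
    padded = release + (0,) * (len(minimum) - len(release))
    return padded >= minimum


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names taken from the pip commands in this thread.
    for name in ("torch", "xformers", "diffusers", "transformers", "accelerate"):
        print(f"{name}: {installed_version(name) or 'not installed'}")

    torch_version = installed_version("torch")
    if torch_version is not None and not meets_minimum(torch_version, (2, 0)):
        print("torch is older than 2.0; an unpinned xformers install may not match it")
```

Running it inside the conda environment prints one line per package, which makes it easier to compare against cog.yaml and the README.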

But I get this error:

[2023-07-22 13:08:03,869] torch._dynamo.symbolic_convert: [INFO] Step 1: torchdynamo start tracing forward
Traceback (most recent call last):
  File "/home/qinghewang/codes/multi_subject/fastcomposer/fastcomposer-main/fastcomposer/train.py", line 456, in <module>
    train()
  File "/home/qinghewang/codes/multi_subject/fastcomposer/fastcomposer-main/fastcomposer/train.py", line 357, in train
    return_dict = model(batch, noise_scheduler)  # batch["pixel_values"].shape torch.Size([16, 3, 512, 512])
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 82, in forward
    return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 209, in _fn
    return fn(*args, **kwargs)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/accelerate/utils/operations.py", line 521, in forward
    return model_forward(*args, **kwargs)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/accelerate/utils/operations.py", line 509, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/qinghewang/codes/multi_subject/fastcomposer/fastcomposer-main/fastcomposer/model.py", line 504, in forward
    vae_dtype = self.vae.parameters().__next__().dtype
  File "/home/qinghewang/codes/multi_subject/fastcomposer/fastcomposer-main/fastcomposer/model.py", line 507, in <graph break in forward>
    latents = self.vae.encode(vae_input).latent_dist.sample()
  File "/home/qinghewang/codes/multi_subject/fastcomposer/fastcomposer-main/fastcomposer/model.py", line 537, in <graph break in forward>
    encoder_hidden_states = self.postfuse_module(
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 337, in catch_errors
    return callback(frame, cache_size, hooks)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 404, in _convert_frame
    result = inner_convert(frame, cache_size, hooks)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 104, in _fn
    return fn(*args, **kwargs)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 262, in _convert_frame_assert
    return _compile(
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 163, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 324, in _compile
    out_code = transform_code_object(code, transform)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 445, in transform_code_object
    transformations(instructions, code_options)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 311, in transform
    tracer.run()
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1726, in run
    super().run()
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 576, in run
    and self.step()
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 540, in step
    getattr(self, inst.opname)(inst)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 342, in wrapper
    return inner_fn(self, inst)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 965, in CALL_FUNCTION
    self.call_function(fn, args, {})
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 474, in call_function
    self.push(fn.call_function(self, args, kwargs))
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 259, in call_function
    return super().call_function(tx, args, kwargs)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 92, in call_function
    return tx.inline_user_function_return(
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 510, in inline_user_function_return
    result = InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1806, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1834, in inline_call_
    sub_locals, closure_cells = func.bind_args(parent, args, kwargs)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 159, in bind_args
    [
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 160, in <listcomp>
    wrap(val=arg, source=source)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 59, in wrap_bound_arg
    assert isinstance(val, VariableTracker), typestr(val)
AssertionError: builtin_function_or_method

from user code:
  File "/home/qinghewang/codes/multi_subject/fastcomposer/fastcomposer-main/fastcomposer/model.py", line 185, in forward
    text_object_embeds = fuse_object_embeddings(

Set torch._dynamo.config.verbose=True for more information

You can suppress this exception and fall back to eager by setting: torch._dynamo.config.suppress_errors = True

Global step: 0: 0%| | 0/150000 [00:50<?, ?it/s]
Traceback (most recent call last):
  File "/home/qinghewang/anaconda3/envs/fastcomposer/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/accelerate/commands/launch.py", line 918, in launch_command
    simple_launcher(args)
  File "/home/qinghewang/anaconda3/envs/fastcomposer/lib/python3.10/site-packages/accelerate/commands/launch.py", line 580, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/qinghewang/anaconda3/envs/fastcomposer/bin/python', 'fastcomposer/train.py', '--pretrained_model_name_or_path', 'runwayml/stable-diffusion-v1-5', '--dataset_name', '/home/qinghewang/codes/multi_subject/fastcomposer/ffhq_wild_files', '--logging_dir', 'logs/stable-diffusion-v1-5/ffhq/postfuse-localize-ffhq-1_5-1e-5', '--output_dir', 'models/stable-diffusion-v1-5/ffhq/postfuse-localize-ffhq-1_5-1e-5', '--max_train_steps', '150000', '--num_train_epochs', '150000', '--train_batch_size', '16', '--learning_rate', '1e-5', '--unet_lr_scale', '1.0', '--checkpointing_steps', '200', '--mixed_precision', 'bf16', '--allow_tf32', '--keep_only_last_checkpoint', '--keep_interval', '10000', '--seed', '42', '--image_encoder_type', 'clip', '--image_encoder_name_or_path', 'openai/clip-vit-large-patch14', '--num_image_tokens', '1', '--max_num_objects', '4', '--train_resolution', '512', '--object_resolution', '224', '--text_image_linking', 'postfuse', '--object_appear_prob', '0.9', '--uncondition_prob', '0.1', '--object_background_processor', 'random', '--disable_flashattention', '--train_image_encoder', '--image_encoder_trainable_layers', '2', '--object_types', 'person', '--mask_loss', '--mask_loss_prob', '0.5', '--object_localization', '--object_localization_weight', '1e-3', '--object_localization_loss', 'balanced_l1', '--resume_from_checkpoint', 'latest', '--report_to', 'wandb']' returned non-zero exit status 1.

tianweiy commented 1 year ago

Are you using torch.compile or something similar? What changes, if any, did you make? Thanks.
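One way to answer this from inside the training script: the traceback above goes through torch/_dynamo/eval_frame.py and calls `self._orig_mod.forward`, which suggests the model had been wrapped by TorchDynamo before the forward pass. The sketch below checks for that wrapper. The attribute name `_orig_mod` is taken from the traceback; relying on it is an assumption about torch 2.0 internals, and `is_dynamo_wrapped` and `unwrap` are hypothetical helpers, not fastcomposer or torch APIs.

```python
def is_dynamo_wrapped(module: object) -> bool:
    """Heuristic check: the wrapper seen in the traceback stores the
    original model in an attribute named `_orig_mod`."""
    return hasattr(module, "_orig_mod")


def unwrap(module: object) -> object:
    """Return the original module if `module` looks wrapped, else `module` itself."""
    return getattr(module, "_orig_mod", module)


if __name__ == "__main__":
    # Stand-in classes so the sketch runs without torch installed.
    class PlainModel:
        pass

    class WrappedModel:
        def __init__(self, original):
            self._orig_mod = original

    plain = PlainModel()
    wrapped = WrappedModel(plain)
    print(is_dynamo_wrapped(plain))    # False
    print(is_dynamo_wrapped(wrapped))  # True
    print(unwrap(wrapped) is plain)    # True
```

In train.py, calling `is_dynamo_wrapped(model)` just before the failing `model(batch, noise_scheduler)` call would show whether compilation was enabled somewhere outside the script, for example by the launcher configuration.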

qinghew commented 1 year ago

No changes. torch.compile seems to run automatically. I solved the problem by adding these lines to train.py:

import torch._dynamo.config
torch._dynamo.config.suppress_errors = True

These two lines suppress the error, but I'm not sure whether this causes a performance drop.
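For reference, the workaround can be wrapped so that it is safe to call on installs where TorchDynamo is absent (for example torch older than 2.0). This is a sketch only; `enable_eager_fallback` is a hypothetical helper name, while `torch._dynamo.config.suppress_errors` is the setting named in the error message above.

```python
def enable_eager_fallback() -> bool:
    """Ask TorchDynamo to fall back to eager execution when tracing fails.

    Returns True if the setting was applied, or False if torch (or its
    _dynamo subpackage) could not be imported.
    """
    try:
        import torch._dynamo.config as dynamo_config
    except ImportError:
        return False
    # Setting named in the error message: suppress the exception and
    # run the failing code in eager mode instead.
    dynamo_config.suppress_errors = True
    return True


if __name__ == "__main__":
    # Call this near the top of train.py, before the model's first forward pass.
    if enable_eager_fallback():
        print("TorchDynamo errors will be suppressed (eager fallback enabled)")
    else:
        print("torch._dynamo not available; nothing to configure")
```

As the error message says, suppressing the exception makes the failing code fall back to eager mode. Those parts then run without whatever speedup compilation would have given, but they should compute the same results as an uncompiled run. The setting can also hide other compilation failures, so it is worth checking the logs for further dynamo messages.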

tianweiy commented 1 year ago

I see. We trained our model before torch 2.0 came out. We will check it later. Thanks.

qinghew commented 1 year ago

Thanks. I'll try the torch version from cog.yaml together with pip install xformers==0.0.16, which is available.

mit-han-lab / fastcomposer

environment for training #14