Multi-GPU training error - Githubissues

KaiGod0730 commented 4 months ago

Thank you for your work! Single-GPU training works fine for me, but multi-GPU training fails with the following error:

Traceback (most recent call last):
  File "train_svd.py", line 1264, in <module>
    main()
  File "train_svd.py", line 1045, in main
    added_time_ids = _get_add_time_ids(
  File "train_svd.py", line 949, in _get_add_time_ids
    passed_add_embed_dim = unet.config.addition_time_embed_dim * \
  File "/.pt2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'config'

The command I used:

accelerate launch train_svd.py \
    --pretrained_model_name_or_path=stable-video-diffusion-img2vid-xt-1-1 \
    --per_gpu_batch_size=1 --gradient_accumulation_steps=1 \
    --max_train_steps=100 \
    --width=512 \
    --height=320 \
    --checkpointing_steps=50 --checkpoints_total_limit=1 \
    --learning_rate=1e-5 --lr_warmup_steps=0 \
    --seed=123 \
    --mixed_precision="fp16" \
    --validation_steps=20 \
    --num_workers=0 \

howardgriffin commented 1 month ago

Same error, how to solve the problem?

LTT-O commented 1 month ago

Add module to unet.config.addition_time_embed_dim, i.e. change it to unet.module.config.addition_time_embed_dim
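For context on why this works: DistributedDataParallel keeps the original model in its .module attribute and does not forward arbitrary attributes such as config, so the lookup has to go through .module in multi-GPU runs. The sketch below illustrates the idea without requiring torch. FakeUNet, FakeDDP, the unwrap helper, and the value 256 are all hypothetical stand-ins for illustration, not code from train_svd.py.

```python
from types import SimpleNamespace


class FakeUNet:
    """Hypothetical stand-in for the UNet; it only carries a config object."""

    def __init__(self):
        # 256 is an illustrative value, not taken from the real model config.
        self.config = SimpleNamespace(addition_time_embed_dim=256)


class FakeDDP:
    """Hypothetical stand-in for DistributedDataParallel.

    Like the real wrapper, it stores the wrapped model in .module and
    does not expose the wrapped model's own attributes (e.g. config).
    """

    def __init__(self, module):
        self.module = module


def unwrap(model):
    """Return model.module if the model is wrapped, otherwise the model itself."""
    return getattr(model, "module", model)


single_gpu_unet = FakeUNet()           # what the script sees on one GPU
multi_gpu_unet = FakeDDP(FakeUNet())   # what the script sees after DDP wrapping

# The same access pattern now works in both cases.
dim_single = unwrap(single_gpu_unet).config.addition_time_embed_dim
dim_multi = unwrap(multi_gpu_unet).config.addition_time_embed_dim
print(dim_single, dim_multi)
```

Unlike hard-coding unet.module, a helper like this also keeps single-GPU runs working, since an unwrapped model has no .module attribute. One caveat: the getattr check would misfire on a model that has its own submodule named module. If the script creates an Accelerator object, accelerator.unwrap_model(unet) from Hugging Face Accelerate may be a cleaner alternative, assuming that object is in scope where _get_add_time_ids is called.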

pixeli99 / SVD_Xtend

Multi-GPU training error #42