Thank you for your work!
Single-GPU training works fine for me, but multi-GPU training fails with the following error:

    Traceback (most recent call last):
      File "train_svd.py", line 1264, in <module>
        main()
      File "train_svd.py", line 1045, in main
        added_time_ids = _get_add_time_ids(
      File "train_svd.py", line 949, in _get_add_time_ids
        passed_add_embed_dim = unet.config.addition_time_embed_dim * \
      File "/.pt2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
        raise AttributeError("'{}' object has no attribute '{}'".format(
    AttributeError: 'DistributedDataParallel' object has no attribute 'config'

The command I used:

    accelerate launch train_svd.py \
        --pretrained_model_name_or_path=stable-video-diffusion-img2vid-xt-1-1 \
        --per_gpu_batch_size=1 --gradient_accumulation_steps=1 \
        --max_train_steps=100 \
        --width=512 \
        --height=320 \
        --checkpointing_steps=50 --checkpoints_total_limit=1 \
        --learning_rate=1e-5 --lr_warmup_steps=0 \
        --seed=123 \
        --mixed_precision="fp16" \
        --validation_steps=20 \
        --num_workers=0 \