huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Error in train_cm_ct_unconditional.py (Even after new commit) #8894

Open KetanMann opened 1 month ago

KetanMann commented 1 month ago

Replication @dg845

!accelerate launch --multi_gpu train_cm_ct_unconditional.py \
    --dataset_name="cifar10" \
    --dataset_image_column_name="img" \
    --output_dir="/path/to/output/dir" \
    --mixed_precision=fp16 \
    --resolution=32 \
    --max_train_steps=1000 \
    --max_train_samples=10000 \
    --dataloader_num_workers=8 \
    --noise_precond_type="cm" \
    --input_precond_type="cm" \
    --train_batch_size=4 \
    --learning_rate=1e-04 \
    --lr_scheduler="constant" \
    --lr_warmup_steps=0 \
    --use_8bit_adam \
    --use_ema \
    --validation_steps=100 \
    --eval_batch_size=4 \
    --checkpointing_steps=100 \
    --checkpoints_total_limit=10 \
    --class_conditional \
    --num_classes=10 

Traceback (most recent call last):
  File "/kaggle/working/train_cm_ct_unconditional.py", line 1438, in <module>
    main(args)
  File "/kaggle/working/train_cm_ct_unconditional.py", line 1324, in main
    teacher_unet.load_state_dict(unet.state_dict())
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DModel:
Missing key(s) in state_dict: "conv_in.weight", "conv_in.bias", "time_embedding.linear_1.weight", "time_embedding.linear_1.bias", "time_embedding.linear_2.weight", "time_embedding.linear_2.bias", "class_embedding.weight", "down_blocks.0.resnets.0.norm1.weight", "down_blocks.0.resnets.0.norm1.bias", "down_blocks.0.resnets.0.conv1.weight", "down_blocks.0.resnets.0.conv1.bias", "down_blocks.0.resnets.0.time_emb_proj.weight", "down_blocks.0.resnets.0.time_emb_proj.bias", "down_blocks.0.resnets.0.norm2.weight", "down_blocks.0.resnets.0.norm2.bias", "down_blocks.0.resnets.0.conv2.weight", "down_blocks.0.resnets.0.conv2.bias", "down_blocks.0.resnets.1.norm1.weight", "down_blocks.0.resnets.1.norm1.bias", "down_blocks.0.resnets.1.conv1.weight", "down_blocks.0.resnets.1.conv1.bias", "down_blocks.0.resnets.1.time_emb_proj.weight", "down_blocks.0.resnets.1.time_emb_proj.bias", "down_blocks.0.resnets.1.norm2.weight", "down_blocks.0.resnets.1.norm2.bias", "down_blocks.0.resnets.1.conv2.weight", "down_blocks.0.resnets.1.conv2.bias", "down_blocks.0.downsamplers.0.conv.weight", "down_blocks.0.downsamplers.0.conv.bias", "down_blocks.1.resnets.0.norm1.weight", "down_blocks.1.resnets.0.norm1.bias", 
"down_blocks.1.resnets.0.conv1.weight", "down_blocks.1.resnets.0.conv1.bias", "down_blocks.1.resnets.0.time_emb_proj.weight", "down_blocks.1.resnets.0.time_emb_proj.bias", "down_blocks.1.resnets.0.norm2.weight", "down_blocks.1.resnets.0.norm2.bias", "down_blocks.1.resnets.0.conv2.weight", "down_blocks.1.resnets.0.conv2.bias", "down_blocks.1.resnets.1.norm1.weight", "down_blocks.1.resnets.1.norm1.bias", "down_blocks.1.resnets.1.conv1.weight", "down_blocks.1.resnets.1.conv1.bias", "down_blocks.1.resnets.1.time_emb_proj.weight", "down_blocks.1.resnets.1.time_emb_proj.bias", "down_blocks.1.resnets.1.norm2.weight", "down_blocks.1.resnets.1.norm2.bias", "down_blocks.1.resnets.1.conv2.weight", "down_blocks.1.resnets.1.conv2.bias", "down_blocks.1.downsamplers.0.conv.weight", "down_blocks.1.downsamplers.0.conv.bias", "down_blocks.2.resnets.0.norm1.weight", "down_blocks.2.resnets.0.norm1.bias", "down_blocks.2.resnets.0.conv1.weight", "down_blocks.2.resnets.0.conv1.bias", "down_blocks.2.resnets.0.time_emb_proj.weight", "down_blocks.2.resnets.0.time_emb_proj.bias", "down_blocks.2.resnets.0.norm2.weight", "down_blocks.2.resnets.0.norm2.bias", "down_blocks.2.resnets.0.conv2.weight", "down_blocks.2.resnets.0.conv2.bias", "down_blocks.2.resnets.0.conv_shortcut.weight", "down_blocks.2.resnets.0.conv_shortcut.bias", "down_blocks.2.resnets.1.norm1.weight", "down_blocks.2.resnets.1.norm1.bias", "down_blocks.2.resnets.1.conv1.weight", "down_blocks.2.resnets.1.conv1.bias", "down_blocks.2.resnets.1.time_emb_proj.weight", "down_blocks.2.resnets.1.time_emb_proj.bias", "down_blocks.2.resnets.1.norm2.weight", "down_blocks.2.resnets.1.norm2.bias", "down_blocks.2.resnets.1.conv2.weight", "down_blocks.2.resnets.1.conv2.bias", "down_blocks.2.downsamplers.0.conv.weight", "down_blocks.2.downsamplers.0.conv.bias", "down_blocks.3.resnets.0.norm1.weight", "down_blocks.3.resnets.0.norm1.bias", "down_blocks.3.resnets.0.conv1.weight", "down_blocks.3.resnets.0.conv1.bias", 
"down_blocks.3.resnets.0.time_emb_proj.weight", "down_blocks.3.resnets.0.time_emb_proj.bias", "down_blocks.3.resnets.0.norm2.weight", "down_blocks.3.resnets.0.norm2.bias", "down_blocks.3.resnets.0.conv2.weight", "down_blocks.3.resnets.0.conv2.bias", "down_blocks.3.resnets.1.norm1.weight", "down_blocks.3.resnets.1.norm1.bias", "down_blocks.3.resnets.1.conv1.weight", "down_blocks.3.resnets.1.conv1.bias", "down_blocks.3.resnets.1.time_emb_proj.weight", "down_blocks.3.resnets.1.time_emb_proj.bias", "down_blocks.3.resnets.1.norm2.weight", "down_blocks.3.resnets.1.norm2.bias", "down_blocks.3.resnets.1.conv2.weight", "down_blocks.3.resnets.1.conv2.bias", "down_blocks.3.downsamplers.0.conv.weight", "down_blocks.3.downsamplers.0.conv.bias", "down_blocks.4.attentions.0.group_norm.weight", "down_blocks.4.attentions.0.group_norm.bias", "down_blocks.4.attentions.0.to_q.weight", "down_blocks.4.attentions.0.to_q.bias", "down_blocks.4.attentions.0.to_k.weight", "down_blocks.4.attentions.0.to_k.bias", "down_blocks.4.attentions.0.to_v.weight", "down_blocks.4.attentions.0.to_v.bias", "down_blocks.4.attentions.0.to_out.0.weight", "down_blocks.4.attentions.0.to_out.0.bias", "down_blocks.4.attentions.1.group_norm.weight", "down_blocks.4.attentions.1.group_norm.bias", "down_blocks.4.attentions.1.to_q.weight", "down_blocks.4.attentions.1.to_q.bias", "down_blocks.4.attentions.1.to_k.weight", "down_blocks.4.attentions.1.to_k.bias", "down_blocks.4.attentions.1.to_v.weight", "down_blocks.4.attentions.1.to_v.bias", "down_blocks.4.attentions.1.to_out.0.weight", "down_blocks.4.attentions.1.to_out.0.bias", "down_blocks.4.resnets.0.norm1.weight", "down_blocks.4.resnets.0.norm1.bias", "down_blocks.4.resnets.0.conv1.weight", "down_blocks.4.resnets.0.conv1.bias", "down_blocks.4.resnets.0.time_emb_proj.weight", "down_blocks.4.resnets.0.time_emb_proj.bias", "down_blocks.4.resnets.0.norm2.weight", "down_blocks.4.resnets.0.norm2.bias", "down_blocks.4.resnets.0.conv2.weight", 
"down_blocks.4.resnets.0.conv2.bias", "down_blocks.4.resnets.0.conv_shortcut.weight", "down_blocks.4.resnets.0.conv_shortcut.bias", "down_blocks.4.resnets.1.norm1.weight", "down_blocks.4.resnets.1.norm1.bias", "down_blocks.4.resnets.1.conv1.weight", "down_blocks.4.resnets.1.conv1.bias", "down_blocks.4.resnets.1.time_emb_proj.weight", "down_blocks.4.resnets.1.time_emb_proj.bias", "down_blocks.4.resnets.1.norm2.weight", "down_blocks.4.resnets.1.norm2.bias", "down_blocks.4.resnets.1.conv2.weight", "down_blocks.4.resnets.1.conv2.bias", "down_blocks.4.downsamplers.0.conv.weight", "down_blocks.4.downsamplers.0.conv.bias", "down_blocks.5.resnets.0.norm1.weight", "down_blocks.5.resnets.0.norm1.bias", "down_blocks.5.resnets.0.conv1.weight", "down_blocks.5.resnets.0.conv1.bias", "down_blocks.5.resnets.0.time_emb_proj.weight", "down_blocks.5.resnets.0.time_emb_proj.bias", "down_blocks.5.resnets.0.norm2.weight", "down_blocks.5.resnets.0.norm2.bias", "down_blocks.5.resnets.0.conv2.weight", "down_blocks.5.resnets.0.conv2.bias", "down_blocks.5.resnets.1.norm1.weight", "down_blocks.5.resnets.1.norm1.bias", "down_blocks.5.resnets.1.conv1.weight", "down_blocks.5.resnets.1.conv1.bias", "down_blocks.5.resnets.1.time_emb_proj.weight", "down_blocks.5.resnets.1.time_emb_proj.bias", "down_blocks.5.resnets.1.norm2.weight", "down_blocks.5.resnets.1.norm2.bias", "down_blocks.5.resnets.1.conv2.weight", "down_blocks.5.resnets.1.conv2.bias", "up_blocks.0.resnets.0.norm1.weight", "up_blocks.0.resnets.0.norm1.bias", "up_blocks.0.resnets.0.conv1.weight", "up_blocks.0.resnets.0.conv1.bias", "up_blocks.0.resnets.0.time_emb_proj.weight", "up_blocks.0.resnets.0.time_emb_proj.bias", "up_blocks.0.resnets.0.norm2.weight", "up_blocks.0.resnets.0.norm2.bias", "up_blocks.0.resnets.0.conv2.weight", "up_blocks.0.resnets.0.conv2.bias", "up_blocks.0.resnets.0.conv_shortcut.weight", "up_blocks.0.resnets.0.conv_shortcut.bias", "up_blocks.0.resnets.1.norm1.weight", "up_blocks.0.resnets.1.norm1.bias", 
"up_blocks.0.resnets.1.conv1.weight", "up_blocks.0.resnets.1.conv1.bias", "up_blocks.0.resnets.1.time_emb_proj.weight", "up_blocks.0.resnets.1.time_emb_proj.bias", "up_blocks.0.resnets.1.norm2.weight", "up_blocks.0.resnets.1.norm2.bias", "up_blocks.0.resnets.1.conv2.weight", "up_blocks.0.resnets.1.conv2.bias", "up_blocks.0.resnets.1.conv_shortcut.weight", "up_blocks.0.resnets.1.conv_shortcut.bias", "up_blocks.0.resnets.2.norm1.weight", "up_blocks.0.resnets.2.norm1.bias", "up_blocks.0.resnets.2.conv1.weight", "up_blocks.0.resnets.2.conv1.bias", "up_blocks.0.resnets.2.time_emb_proj.weight", "up_blocks.0.resnets.2.time_emb_proj.bias", "up_blocks.0.resnets.2.norm2.weight", "up_blocks.0.resnets.2.norm2.bias", "up_blocks.0.resnets.2.conv2.weight", "up_blocks.0.resnets.2.conv2.bias", "up_blocks.0.resnets.2.conv_shortcut.weight", "up_blocks.0.resnets.2.conv_shortcut.bias", "up_blocks.0.upsamplers.0.conv.weight", "up_blocks.0.upsamplers.0.conv.bias", "up_blocks.1.attentions.0.group_norm.weight", "up_blocks.1.attentions.0.group_norm.bias", "up_blocks.1.attentions.0.to_q.weight", "up_blocks.1.attentions.0.to_q.bias", "up_blocks.1.attentions.0.to_k.weight", "up_blocks.1.attentions.0.to_k.bias", "up_blocks.1.attentions.0.to_v.weight", "up_blocks.1.attentions.0.to_v.bias", "up_blocks.1.attentions.0.to_out.0.weight", "up_blocks.1.attentions.0.to_out.0.bias", "up_blocks.1.attentions.1.group_norm.weight", "up_blocks.1.attentions.1.group_norm.bias", "up_blocks.1.attentions.1.to_q.weight", "up_blocks.1.attentions.1.to_q.bias", "up_blocks.1.attentions.1.to_k.weight", "up_blocks.1.attentions.1.to_k.bias", "up_blocks.1.attentions.1.to_v.weight", "up_blocks.1.attentions.1.to_v.bias", "up_blocks.1.attentions.1.to_out.0.weight", "up_blocks.1.attentions.1.to_out.0.bias", "up_blocks.1.attentions.2.group_norm.weight", "up_blocks.1.attentions.2.group_norm.bias", "up_blocks.1.attentions.2.to_q.weight", "up_blocks.1.attentions.2.to_q.bias", "up_blocks.1.attentions.2.to_k.weight", 
"up_blocks.1.attentions.2.to_k.bias", "up_blocks.1.attentions.2.to_v.weight", "up_blocks.1.attentions.2.to_v.bias", "up_blocks.1.attentions.2.to_out.0.weight", "up_blocks.1.attentions.2.to_out.0.bias", "up_blocks.1.resnets.0.norm1.weight", "up_blocks.1.resnets.0.norm1.bias", "up_blocks.1.resnets.0.conv1.weight", "up_blocks.1.resnets.0.conv1.bias", "up_blocks.1.resnets.0.time_emb_proj.weight", "up_blocks.1.resnets.0.time_emb_proj.bias", "up_blocks.1.resnets.0.norm2.weight", "up_blocks.1.resnets.0.norm2.bias", "up_blocks.1.resnets.0.conv2.weight", "up_blocks.1.resnets.0.conv2.bias", "up_blocks.1.resnets.0.conv_shortcut.weight", "up_blocks.1.resnets.0.conv_shortcut.bias", "up_blocks.1.resnets.1.norm1.weight", "up_blocks.1.resnets.1.norm1.bias", "up_blocks.1.resnets.1.conv1.weight", "up_blocks.1.resnets.1.conv1.bias", "up_blocks.1.resnets.1.time_emb_proj.weight", "up_blocks.1.resnets.1.time_emb_proj.bias", "up_blocks.1.resnets.1.norm2.weight", "up_blocks.1.resnets.1.norm2.bias", "up_blocks.1.resnets.1.conv2.weight", "up_blocks.1.resnets.1.conv2.bias", "up_blocks.1.resnets.1.conv_shortcut.weight", "up_blocks.1.resnets.1.conv_shortcut.bias", "up_blocks.1.resnets.2.norm1.weight", "up_blocks.1.resnets.2.norm1.bias", "up_blocks.1.resnets.2.conv1.weight", "up_blocks.1.resnets.2.conv1.bias", "up_blocks.1.resnets.2.time_emb_proj.weight", "up_blocks.1.resnets.2.time_emb_proj.bias", "up_blocks.1.resnets.2.norm2.weight", "up_blocks.1.resnets.2.norm2.bias", "up_blocks.1.resnets.2.conv2.weight", "up_blocks.1.resnets.2.conv2.bias", "up_blocks.1.resnets.2.conv_shortcut.weight", "up_blocks.1.resnets.2.conv_shortcut.bias", "up_blocks.1.upsamplers.0.conv.weight", "up_blocks.1.upsamplers.0.conv.bias", "up_blocks.2.resnets.0.norm1.weight", "up_blocks.2.resnets.0.norm1.bias", "up_blocks.2.resnets.0.conv1.weight", "up_blocks.2.resnets.0.conv1.bias", "up_blocks.2.resnets.0.time_emb_proj.weight", "up_blocks.2.resnets.0.time_emb_proj.bias", "up_blocks.2.resnets.0.norm2.weight", 
"up_blocks.2.resnets.0.norm2.bias", "up_blocks.2.resnets.0.conv2.weight", "up_blocks.2.resnets.0.conv2.bias", "up_blocks.2.resnets.0.conv_shortcut.weight", "up_blocks.2.resnets.0.conv_shortcut.bias", "up_blocks.2.resnets.1.norm1.weight", "up_blocks.2.resnets.1.norm1.bias", "up_blocks.2.resnets.1.conv1.weight", "up_blocks.2.resnets.1.conv1.bias", "up_blocks.2.resnets.1.time_emb_proj.weight", "up_blocks.2.resnets.1.time_emb_proj.bias", "up_blocks.2.resnets.1.norm2.weight", "up_blocks.2.resnets.1.norm2.bias", "up_blocks.2.resnets.1.conv2.weight", "up_blocks.2.resnets.1.conv2.bias", "up_blocks.2.resnets.1.conv_shortcut.weight", "up_blocks.2.resnets.1.conv_shortcut.bias", "up_blocks.2.resnets.2.norm1.weight", "up_blocks.2.resnets.2.norm1.bias", "up_blocks.2.resnets.2.conv1.weight", "up_blocks.2.resnets.2.conv1.bias", "up_blocks.2.resnets.2.time_emb_proj.weight", "up_blocks.2.resnets.2.time_emb_proj.bias", "up_blocks.2.resnets.2.norm2.weight", "up_blocks.2.resnets.2.norm2.bias", "up_blocks.2.resnets.2.conv2.weight", "up_blocks.2.resnets.2.conv2.bias", "up_blocks.2.resnets.2.conv_shortcut.weight", "up_blocks.2.resnets.2.conv_shortcut.bias", "up_blocks.2.upsamplers.0.conv.weight", "up_blocks.2.upsamplers.0.conv.bias", "up_blocks.3.resnets.0.norm1.weight", "up_blocks.3.resnets.0.norm1.bias", "up_blocks.3.resnets.0.conv1.weight", "up_blocks.3.resnets.0.conv1.bias", "up_blocks.3.resnets.0.time_emb_proj.weight", "up_blocks.3.resnets.0.time_emb_proj.bias", "up_blocks.3.resnets.0.norm2.weight", "up_blocks.3.resnets.0.norm2.bias", "up_blocks.3.resnets.0.conv2.weight", "up_blocks.3.resnets.0.conv2.bias", "up_blocks.3.resnets.0.conv_shortcut.weight", "up_blocks.3.resnets.0.conv_shortcut.bias", "up_blocks.3.resnets.1.norm1.weight", "up_blocks.3.resnets.1.norm1.bias", "up_blocks.3.resnets.1.conv1.weight", "up_blocks.3.resnets.1.conv1.bias", "up_blocks.3.resnets.1.time_emb_proj.weight", "up_blocks.3.resnets.1.time_emb_proj.bias", "up_blocks.3.resnets.1.norm2.weight", 
"up_blocks.3.resnets.1.norm2.bias", "up_blocks.3.resnets.1.conv2.weight", "up_blocks.3.resnets.1.conv2.bias", "up_blocks.3.resnets.1.conv_shortcut.weight", "up_blocks.3.resnets.1.conv_shortcut.bias", "up_blocks.3.resnets.2.norm1.weight", "up_blocks.3.resnets.2.norm1.bias", "up_blocks.3.resnets.2.conv1.weight", "up_blocks.3.resnets.2.conv1.bias", "up_blocks.3.resnets.2.time_emb_proj.weight", "up_blocks.3.resnets.2.time_emb_proj.bias", "up_blocks.3.resnets.2.norm2.weight", "up_blocks.3.resnets.2.norm2.bias", "up_blocks.3.resnets.2.conv2.weight", "up_blocks.3.resnets.2.conv2.bias", "up_blocks.3.resnets.2.conv_shortcut.weight", "up_blocks.3.resnets.2.conv_shortcut.bias", "up_blocks.3.upsamplers.0.conv.weight", "up_blocks.3.upsamplers.0.conv.bias", "up_blocks.4.resnets.0.norm1.weight", "up_blocks.4.resnets.0.norm1.bias", "up_blocks.4.resnets.0.conv1.weight", "up_blocks.4.resnets.0.conv1.bias", "up_blocks.4.resnets.0.time_emb_proj.weight", "up_blocks.4.resnets.0.time_emb_proj.bias", "up_blocks.4.resnets.0.norm2.weight", "up_blocks.4.resnets.0.norm2.bias", "up_blocks.4.resnets.0.conv2.weight", "up_blocks.4.resnets.0.conv2.bias", "up_blocks.4.resnets.0.conv_shortcut.weight", "up_blocks.4.resnets.0.conv_shortcut.bias", "up_blocks.4.resnets.1.norm1.weight", "up_blocks.4.resnets.1.norm1.bias", "up_blocks.4.resnets.1.conv1.weight", "up_blocks.4.resnets.1.conv1.bias", "up_blocks.4.resnets.1.time_emb_proj.weight", "up_blocks.4.resnets.1.time_emb_proj.bias", "up_blocks.4.resnets.1.norm2.weight", "up_blocks.4.resnets.1.norm2.bias", "up_blocks.4.resnets.1.conv2.weight", "up_blocks.4.resnets.1.conv2.bias", "up_blocks.4.resnets.1.conv_shortcut.weight", "up_blocks.4.resnets.1.conv_shortcut.bias", "up_blocks.4.resnets.2.norm1.weight", "up_blocks.4.resnets.2.norm1.bias", "up_blocks.4.resnets.2.conv1.weight", "up_blocks.4.resnets.2.conv1.bias", "up_blocks.4.resnets.2.time_emb_proj.weight", "up_blocks.4.resnets.2.time_emb_proj.bias", "up_blocks.4.resnets.2.norm2.weight", 
"up_blocks.4.resnets.2.norm2.bias", "up_blocks.4.resnets.2.conv2.weight", "up_blocks.4.resnets.2.conv2.bias", "up_blocks.4.resnets.2.conv_shortcut.weight", "up_blocks.4.resnets.2.conv_shortcut.bias", "up_blocks.4.upsamplers.0.conv.weight", "up_blocks.4.upsamplers.0.conv.bias", "up_blocks.5.resnets.0.norm1.weight", "up_blocks.5.resnets.0.norm1.bias", "up_blocks.5.resnets.0.conv1.weight", "up_blocks.5.resnets.0.conv1.bias", "up_blocks.5.resnets.0.time_emb_proj.weight", "up_blocks.5.resnets.0.time_emb_proj.bias", "up_blocks.5.resnets.0.norm2.weight", "up_blocks.5.resnets.0.norm2.bias", "up_blocks.5.resnets.0.conv2.weight", "up_blocks.5.resnets.0.conv2.bias", "up_blocks.5.resnets.0.conv_shortcut.weight", "up_blocks.5.resnets.0.conv_shortcut.bias", "up_blocks.5.resnets.1.norm1.weight", "up_blocks.5.resnets.1.norm1.bias", "up_blocks.5.resnets.1.conv1.weight", "up_blocks.5.resnets.1.conv1.bias", "up_blocks.5.resnets.1.time_emb_proj.weight", "up_blocks.5.resnets.1.time_emb_proj.bias", "up_blocks.5.resnets.1.norm2.weight", "up_blocks.5.resnets.1.norm2.bias", "up_blocks.5.resnets.1.conv2.weight", "up_blocks.5.resnets.1.conv2.bias", "up_blocks.5.resnets.1.conv_shortcut.weight", "up_blocks.5.resnets.1.conv_shortcut.bias", "up_blocks.5.resnets.2.norm1.weight", "up_blocks.5.resnets.2.norm1.bias", "up_blocks.5.resnets.2.conv1.weight", "up_blocks.5.resnets.2.conv1.bias", "up_blocks.5.resnets.2.time_emb_proj.weight", "up_blocks.5.resnets.2.time_emb_proj.bias", "up_blocks.5.resnets.2.norm2.weight", "up_blocks.5.resnets.2.norm2.bias", "up_blocks.5.resnets.2.conv2.weight", "up_blocks.5.resnets.2.conv2.bias", "up_blocks.5.resnets.2.conv_shortcut.weight", "up_blocks.5.resnets.2.conv_shortcut.bias", "mid_block.attentions.0.group_norm.weight", "mid_block.attentions.0.group_norm.bias", "mid_block.attentions.0.to_q.weight", "mid_block.attentions.0.to_q.bias", "mid_block.attentions.0.to_k.weight", "mid_block.attentions.0.to_k.bias", "mid_block.attentions.0.to_v.weight", 
"mid_block.attentions.0.to_v.bias", "mid_block.attentions.0.to_out.0.weight", "mid_block.attentions.0.to_out.0.bias", "mid_block.resnets.0.norm1.weight", "mid_block.resnets.0.norm1.bias", "mid_block.resnets.0.conv1.weight", "mid_block.resnets.0.conv1.bias", "mid_block.resnets.0.time_emb_proj.weight", "mid_block.resnets.0.time_emb_proj.bias", "mid_block.resnets.0.norm2.weight", "mid_block.resnets.0.norm2.bias", "mid_block.resnets.0.conv2.weight", "mid_block.resnets.0.conv2.bias", "mid_block.resnets.1.norm1.weight", "mid_block.resnets.1.norm1.bias", "mid_block.resnets.1.conv1.weight", "mid_block.resnets.1.conv1.bias", "mid_block.resnets.1.time_emb_proj.weight", "mid_block.resnets.1.time_emb_proj.bias", "mid_block.resnets.1.norm2.weight", "mid_block.resnets.1.norm2.bias", "mid_block.resnets.1.conv2.weight", "mid_block.resnets.1.conv2.bias", "conv_norm_out.weight", "conv_norm_out.bias", "conv_out.weight", "conv_out.bias". Unexpected key(s) in state_dict: "module.conv_in.weight", "module.conv_in.bias", ""module.down_blocks.2.resnets.0.norm1.weight", "module.down_blocks.2.resnets.0.norm1.bias", "module.down_blocks.2.resnets.0.conv1.weight", "module.down_blocks.2.resnets.0.conv1.bias", "module.do"module.down_blocks.4.resnets.0.norm1.weight", "module.down_blocks.4.resnets.0.norm1.bias", "module.down_blocks.4.resnets.0.conv1.weight", "module.down_blocks.4.resnets.0.conv1.bias", "module.down_blocks.4.resnets.0.time_emb_proj.weight", "module.down_blocks.4.resnets.0.time_emb_proj.bias", "module.down_blocks.4.resnets.0.norm2.weight", "module.down_blocks.4.resnets.0.norm2.bias", "module.down_blocks.4.resnets.0.conv2.weight", "module.down_blocks.4.resnets.0.conv2.bias", "module.down_blocks.4.resnets.0.conv_shortcut.weight", "module.down_blocks.4.resnets.0.conv_shortcut.bias", "module.down_blocks.4.resnets.1.norm1.weight", "module.down_blocks.4.resnets.1.norm1.bias", "module.down_blocks.4.resnets.1.conv1.weight", "module.down_blocks.4.resnets.1.conv1.bias", 
"module.down_blocks.4.resnets.1.time_emb_proj.weight", "module.down_blocks.4.resnets.1.time_emb_proj.bias", "module.down_blocks.4.resnets.1.norm2.weight", "module.down_blocks.4.resnets.1.norm2.bias", "module.down_blocks.4.resnets.1.conv2.weight", "module.down_blocks.4.resnets.1.conv2.bias", "module.down_blocks.4.downsamplers.0.conv.weight", "module.down_blocks.4.downsamplers.0.conv.bias", "module.down_blocks.5.resnets.0.norm1.weight", "module.down_blocks.5.resnets.0.norm1.bias", "module.down_blocks.5.resnets.0.conv1.weight", "module.down_blocks.5.resnets.0.conv1.bias", "module.down_blocks.5.resnets.0.time_emb_proj.weight", "module.down_blocks.5.resnets.0.time_emb_proj.bias", "module.down_blocks.5.resnets.0.norm2.weight", "module.down_blocks.5.resnets.0.norm2.bias", "module.down_blocks.5.resnets.0.conv2.weight", "module.down_blocks.5.resnets.0.conv2.bias", "module.down_blocks.5.resnets.1.norm1.weight", "module.down_blocks.5.resnets.1.norm1.bias", "module.down_blocks.5.resnets.1.conv1.weight", "module.down_blocks.5.resnets.1.conv1.bias", "module.down_blocks.5.resnets.1.time_emb_proj.weight", "module.down_blocks.5.resnets.1.time_emb_proj.bias", "module.down_blocks.5.resnets.1.norm2.weight", "module.down_blocks.5.resnets.1.norm2.bias", "module.down_blocks.5.resnets.1.conv2.weight", "module.down_blocks.5.resnets.1.conv2.bias", "module.up_blocks.0.resnets.0.norm1.weight", "module.up_blocks.0.resnets.0.norm1.bias", "module.up_blocks.0.resnets.0.conv1.weight", "module.up_blocks.0.resnets.0.conv1.bias", "module.up_blocks.0.resnets.0.time_emb_proj.weight", "module.up_blocks.0.resnets.0.time_emb_proj.bias", "module.up_blocks.0.resnets.0.norm2.weight", "module.up_blocks.0.resnets.0.norm2.bias", "module.up_blocks.0.resnets.0.conv2.weight", "module.up_blocks.0.resnets.0.conv2.bias", "module.up_blocks.0.resnets.0.conv_shortcut.weight", "module.up_blocks.0.resnets.0.conv_shortcut.bias", "module.up_blocks.0.resnets.1.norm1.weight", "module.up_blocks.0.resnets.1.norm1.bias", 
"module.up_blocks.0.resnets.1.conv1.weight", "module.up_blocks.0.resnets.1.conv1.bias", "module.up_blocks.0.resnets.1.time_emb_proj.weight", "module.up_blocks.0.resnets.1.time_emb_proj.bias", "module.up_blocks.0.resnets.1.norm2.weight", "module.up_blocks.0.resnets.1.norm2.bias", "module.up_blocks.0.resnets.1.conv2.weight", "module.up_blocks.0.resnets.1.conv2.bias", "module.up_blocks.0.resnets.1.conv_shortcut.weight", "module.up_blocks.0.resnets.1.conv_shortcut.bias", "module.up_blocks.0.resnets.2.norm1.weight", "module.up_blocks.0.resnets.2.norm1.bias", "module.up_blocks.0.resnets.2.conv1.weight", "module.up_blocks.0.resnets.2.conv1.bias", "module.up_blocks.0.resnets.2.time_emb_proj.weight", "module.up_blocks.0.resnets.2.time_emb_proj.bias", "module.up_blocks.0.resnets.2.norm2.weight", "module.up_blocks.0.resnets.2.norm2.bias", "module.up_blocks.0.resnets.2.conv2.weight", "module.up_blocks.0.resnets.2.conv2.bias", "module.up_blocks.0.resnets.2.conv_shortcut.weight", "module.up_blocks.0.resnets.2.conv_shortcut.bias", "module.up_blocks.0.upsamplers.0.conv.weight", "module.up_blocks.0.upsamplers.0.conv.bias", "module.up_blocks.1.attentions.0.group_norm.weight", "module.up_blocks.1.attentions.0.group_norm.bias", "module.up_blocks.1.attentions.0.to_q.weight", "module.up_blocks.1.attentions.0.to_q.bias", "module.up_blocks.1.attentions.0.to_k.weight", "module.up_blocks.1.attentions.0.to_k.bias", "module.up_blocks.1.attentions.0.to_v.weight", "module.up_blocks.1.attentions.0.to_v.bias", "module.up_blocks.1.attentions.0.to_out.0.weight", "module.up_blocks.1.attentions.0.to_out.0.bias", "module.up_blocks.1.attentions.1.group_norm.weight", "module.up_blocks.1.attentions.1.group_norm.bias", "module.up_blocks.1.attentions.1.to_q.weight", "module.up_blocks.1.attentions.1.to_q.bias", "module.up_blocks.1.attentions.1.to_k.weight", "module.up_blocks.1.attentions.1.to_k.bias", "module.up_blocks.1.attentions.1.to_v.weight", "module.up_blocks.1.attentions.1.to_v.bias", 
"module.up_blocks.1.attentions.1.to_out.0.weight", "module.up_blocks.1.attentions.1.to_out.0.bias", "module.up_blocks.1.attentions.2.group_norm.weight", "module.up_blocks.1.attentions.2.group_norm.bias", "module.up_blocks.1.attentions.2.to_q.weight", "module.up_blocks.1.attentions.2.to_q.bias", "module.up_blocks.1.attentions.2.to_k.weight", "module.up_blocks.1.attentions.2.to_k.bias", "module.up_blocks.1.attentions.2.to_v.weight", "module.up_blocks.1.attentions.2.to_v.bias", "module.up_blocks.1.attentions.2.to_out.0.weight", "module.up_blocks.1.attentions.2.to_out.0.bias", "module.up_blocks.1.resnets.0.norm1.weight", "module.up_blocks.1.resnets.0.norm1.bias", "module.up_blocks.1.resnets.0.conv1.weight", "module.up_blocks.1.resnets.0.conv1.bias", "module.up_blocks.1.resnets.0.time_emb_proj.weight", "module.up_blocks.1.resnets.0.time_emb_proj.bias", "module.up_blocks.1.resnets.0.norm2.weight", "module.up_blocks.1.resnets.0.norm2.bias", "module.up_blocks.1.resnets.0.conv2.weight", "module.up_blocks.1.resnets.0.conv2.bias", "module.up_blocks.1.resnets.0.conv_shortcut.weight", "module.up_blocks.1.resnets.0.conv_shortcut.bias", "module.up_blocks.1.resnets.1.norm1.weight", "module.up_blocks.1.resnets.1.norm1.bias", "module.up_blocks.1.resnets.1.conv1.weight", "module.up_blocks.1.resnets.1.conv1.bias", "module.up_blocks.1.resnets.1.time_emb_proj.weight", "module.up_blocks.1.resnets.1.time_emb_proj.bias", "module.up_blocks.1.resnets.1.norm2.weight", "module.up_blocks.1.resnets.1.norm2.bias", "module.up_blocks.1.resnets.1.conv2.weight", "module.up_blocks.1.resnets.1.conv2.bias", "module.up_blocks.1.resnets.1.conv_shortcut.weight", "module.up_blocks.1.resnets.1.conv_shortcut.bias", "module.up_blocks.1.resnets.2.norm1.weight", "module.up_blocks.1.resnets.2.norm1.bias", "module.up_blocks.1.resnets.2.conv1.weight", "module.up_blocks.1.resnets.2.conv1.bias", "module.up_blocks.1.resnets.2.time_emb_proj.weight", "module.up_blocks.1.resnets.2.time_emb_proj.bias", 
"module.up_blocks.1.resnets.2.norm2.weight", "module.up_blocks.1.resnets.2.norm2.bias", "module.up_blocks.1.resnets.2.conv2.weight", "module.up_blocks.1.resnets.2.conv2.bias", "module.up_blocks.1.resnets.2.conv_shortcut.weight", "module.up_blocks.1.resnets.2.conv_shortcut.bias", "module.up_blocks.1.upsamplers.0.conv.weight", "module.up_blocks.1.upsamplers.0.conv.bias", "module.up_blocks.2.resnets.0.norm1.weight", "module.up_blocks.2.resnets.0.norm1.bias", "module.up_blocks.2.resnets.0.conv1.weight", "module.up_blocks.2.resnets.0.conv1.bias", "module.up_blocks.2.resnets.0.time_emb_proj.weight", "module.up_blocks.2.resnets.0.time_emb_proj.bias", "module.up_blocks.2.resnets."module.conv_norm_out.weight", "module.conv_norm_out.bias", "module.conv_out.weight", "module.conv_out.bias".
Steps: 0%| | 0/1000 [00:04<?, ?it/s]
[2024-07-18 11:22:03,033] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 130 closing signal SIGTERM
[2024-07-18 11:22:03,097] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 129) of binary: /opt/conda/bin/python3.10
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1088, in launch_command
    multi_gpu_launcher(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 733, in multi_gpu_launcher
    distrib_run.run(args)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train_cm_ct_unconditional.py FAILED

Failures:

Root Cause (first observed failure):
[0]:
  time      : 2024-07-18_11:22:03
  host      : 16d778a537ad
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 129)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

tolgacangoz commented 1 month ago

It seems that my PR solved only one part of the problem :/ Unfortunately, I have no experience with distributed training. @dg845 wrote this file.
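
For anyone reading the traceback: the "Missing" and "Unexpected" key names appear to differ only by a leading `module.` prefix, which is what PyTorch's `DistributedDataParallel` wrapper adds to state dict keys. A minimal sketch of that mismatch in plain Python, assuming the prefix is the whole problem (the helper `strip_ddp_prefix` and the toy dictionaries are hypothetical and not taken from the training script):

```python
from collections import OrderedDict

# Prefix that DistributedDataParallel puts in front of every parameter name.
DDP_PREFIX = "module."


def strip_ddp_prefix(state_dict, prefix=DDP_PREFIX):
    """Return a copy of state_dict with a leading prefix removed from each key.

    Hypothetical helper for illustration; not part of diffusers or the script.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for unet.state_dict() on a wrapped model (values would be tensors).
wrapped = OrderedDict([
    ("module.conv_in.weight", 0),
    ("module.conv_in.bias", 1),
    ("module.conv_out.weight", 2),
])
# Stand-in for the key names the unwrapped teacher model expects.
expected = ["conv_in.weight", "conv_in.bias", "conv_out.weight"]

# Before stripping, every key is "unexpected" and every expected key is "missing".
unexpected = [k for k in wrapped if k not in expected]
missing_before = [k for k in expected if k not in wrapped]

# After stripping, the two key sets line up.
cleaned = strip_ddp_prefix(wrapped)
missing_after = [k for k in expected if k not in cleaned]

print("unexpected before:", unexpected)
print("missing before:", missing_before)
print("missing after:", missing_after)
```

In the script itself, the analogous change would probably be to read the state dict from the unwrapped model, for example `accelerator.unwrap_model(unet).state_dict()`, assuming `unet` has already been passed through `accelerator.prepare` at that point. This has not been verified against the script.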

KetanMann commented 1 month ago

Ok Thanks