microsoft / StyleSwin

[CVPR 2022] StyleSwin: Transformer-based GAN for High-resolution Image Generation
https://arxiv.org/abs/2112.10762
MIT License

Error using ckpt when resuming #37

Open · alexKup88 opened this issue 1 year ago

alexKup88 commented 1 year ago

Thanks for sharing. I am getting this error:

Traceback (most recent call last):
  File "train_styleswin.py", line 409, in <module>
    generator.load_state_dict(ckpt["g"])
  File "/mnt/anaconda3/envs/StyleSwin/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1668, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Generator:
	Unexpected key(s) in state_dict: "layers.4.blocks.0.attn_mask2", "layers.4.blocks.0.norm1.style.weight", "layers.4.blocks.0.norm1.style.bias", "layers.4.blocks.0.qkv.weight", "layers.4.blocks.0.qkv.bias", "layers.4.blocks.0.proj.weight", "layers.4 and so on

This is the command I am running:

python -m torch.distributed.launch --nproc_per_node=2 train_styleswin.py --batch 4 --path /mnt/DATASETS/FFHQ --checkpoint_path /mnt/PROCESSEDdata/StyleSwin/Train --sample_path /mnt/PROCESSEDdata/StyleSwin/Train --size 32 --G_channel_multiplier 2 --bcr --D_lr 0.0002 --D_sn --ttur --eval_gt_path /mnt/DATASETS/FFHQ --lr_decay --lr_decay_start_steps 775000 --iter 1000000 --ckpt /mnt/PROCESSEDdata/StyleSwin/FFHQ_1024.pt --use_checkpoint

I tried with and without the --use_checkpoint flag, and also with the 256 version; both give back the same error.

Best

ForeverFancy commented 1 year ago

Hi, --use_checkpoint enables PyTorch's checkpointing technique to save GPU memory, while --ckpt /mnt/PROCESSEDdata/StyleSwin/FFHQ_1024.pt specifies the checkpoint to resume from. It seems that you are training the model at size 32 but loading a higher-resolution checkpoint (e.g. 512 or 1024), which causes this error. A checkpoint can only be resumed when training a model of the corresponding resolution; when training at any other resolution, you should train from scratch without the pre-trained checkpoint.
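If it helps to verify the mismatch before launching training, below is a minimal sketch (not part of the StyleSwin codebase; it only assumes what the traceback shows, namely that the generator weights live under the "g" key of the checkpoint, and it reuses the FFHQ_1024.pt path from your command) that lists which layers.N stages the checkpoint contains. A 1024px checkpoint has more resolution stages than a size-32 generator defines, so load_state_dict reports the extra ones, such as layers.4 in your error message, as unexpected keys.

```python
# Minimal sketch, assuming only what the traceback shows: the generator
# weights are stored under the "g" key. Path is taken from the command above.
import torch

ckpt = torch.load("/mnt/PROCESSEDdata/StyleSwin/FFHQ_1024.pt", map_location="cpu")
gen_state = ckpt["g"]

# Collect the resolution-stage indices present in the checkpoint ("layers.N. ...").
stages = sorted({int(key.split(".")[1]) for key in gen_state if key.startswith("layers.")})
print("generator stages found in checkpoint:", stages)

# A generator built for --size 32 defines fewer stages, so the higher indices
# listed here show up as "Unexpected key(s)" when calling load_state_dict.
```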