williamyang1991 / DualStyleGAN

[CVPR 2022] Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

Training fails #94

Closed qppppq closed 10 months ago

qppppq commented 10 months ago

Hello, I am training on my own dataset and ran into a problem at step 2, Fine-tune StyleGAN. My dataset images are 500*500. When I run the command python -m torch.distributed.launch --nproc_per_node=8 --master_port=8765 finetune_stylegan.py --iter 600 --batch 4 --ckpt ./checkpoint/stylegan2-ffhq-config-f.pt --style picasso --augment ./data/picasso/lmdb/ it runs fine, but as soon as I add --size 512 to the command, an error occurs. Do you know what the problem might be? My error message is below.

load model: ./checkpoint/stylegan2-ffhq-config-f.pt
Traceback (most recent call last):
  File "finetune_stylegan.py", line 335, in <module>
    generator.load_state_dict(ckpt["g"])
  File "/home/ubuntu/anaconda3/envs/hope/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Generator:
    Unexpected key(s) in state_dict: "convs.14.conv.weight", "convs.14.conv.blur.kernel", "convs.14.conv.modulation.weight", "convs.14.conv.modulation.bias", "convs.14.noise.weight", "convs.14.activate.bias", "convs.15.conv.weight", "convs.15.conv.modulation.weight", "convs.15.conv.modulation.bias", "convs.15.noise.weight", "convs.15.activate.bias", "to_rgbs.7.bias", "to_rgbs.7.upsample.kernel", "to_rgbs.7.conv.weight", "to_rgbs.7.conv.modulation.weight", "to_rgbs.7.conv.modulation.bias", "noises.noise_15", "noises.noise_16".
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2658741) of binary: /home/ubuntu/anaconda3/envs/hope/bin/python
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/hope/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/anaconda3/envs/hope/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/anaconda3/envs/hope/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/ubuntu/anaconda3/envs/hope/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/ubuntu/anaconda3/envs/hope/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/ubuntu/anaconda3/envs/hope/lib/python3.8/site-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/home/ubuntu/anaconda3/envs/hope/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ubuntu/anaconda3/envs/hope/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

finetune_stylegan.py FAILED

williamyang1991 commented 10 months ago

generator.load_state_dict(ckpt["g"])

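With --size 512, your generator is built as a 512 model, but the checkpoint you are loading is a 1024 model, so the checkpoint contains several extra layers that cannot be loaded. The checkpoint you pass in must also be a 512 model for this to run correctly.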
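To make the mismatch concrete, here is a minimal sketch of how the layer counts relate to resolution. It assumes the generator follows the common StyleGAN2 PyTorch layout where each resolution step above 4x4 adds two `convs` entries, one `to_rgbs` entry and two noise buffers; that assumption is consistent with the unexpected keys in the error above (`convs.14`, `convs.15`, `to_rgbs.7`, `noises.noise_15`, `noises.noise_16`). The helper names `expected_counts` and `infer_size_from_keys` are hypothetical and not part of this repository.

```python
import math
import re


def expected_counts(size):
    """Layer counts for a generator of the given output size.

    Assumption: two `convs` and one `to_rgbs` per resolution step
    above 4x4, plus one extra noise buffer for the 4x4 layer.
    """
    log_size = int(math.log2(size))
    return {
        "convs": 2 * (log_size - 2),
        "to_rgbs": log_size - 2,
        "noises": 2 * (log_size - 2) + 1,
    }


def infer_size_from_keys(keys):
    """Guess the resolution a checkpoint was trained at from its key names."""
    indices = [
        int(m.group(1))
        for k in keys
        if (m := re.match(r"convs\.(\d+)\.", k))
    ]
    if not indices:
        raise ValueError("no 'convs.N.' keys found")
    n_convs = max(indices) + 1
    # Invert n_convs = 2 * (log_size - 2) to recover log_size.
    return 2 ** (n_convs // 2 + 2)


# A 512 generator has convs.0..13, so convs.14 and convs.15 are "unexpected".
print(expected_counts(512))
print(expected_counts(1024))
print(infer_size_from_keys(["convs.0.conv.weight", "convs.15.conv.weight"]))
```

Under the same assumption, you could check a checkpoint before training by passing `torch.load(path, map_location="cpu")["g"].keys()` to `infer_size_from_keys` and comparing the result with the value given to --size.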

qppppq commented 10 months ago

Understood, thank you.