Closed Alen95 closed 3 years ago
Hi @Alen95, at first glance it looks like you are trying to load a different checkpoint from the one you produced (the checkpoint's name is the same as the official one we published; perhaps it is not the one you intended to load).
The error probably occurs because the checkpoint contains an encoder for a 1024-resolution StyleGAN, while in the opts object opts.stylegan_size=512
(hence the extra layers 16 and 17). I will look into it and see whether it is a bug in loading a saved checkpoint.
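For context, the number of per-layer style heads the encoder builds follows StyleGAN2's latent layout, 2*log2(size) - 2. A minimal sketch under that assumption (the helper name here is hypothetical, not from the repo):

```python
import math

def n_style_layers(stylegan_size: int) -> int:
    # Number of W+ style vectors a StyleGAN2 generator of this output
    # resolution consumes: 2 * log2(size) - 2.
    return 2 * int(math.log2(stylegan_size)) - 2

print(n_style_layers(1024))  # 18 -> heads styles.0 ... styles.17
print(n_style_layers(512))   # 16 -> heads styles.0 ... styles.15
```

So a checkpoint trained against a 1024 generator carries styles.16 and styles.17, which a model built with stylegan_size=512 has no slots for.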
I have followed the training guide available in the repository, with the proposed commands, to train an encoder on my own dataset. I ran approximately 50,000 steps and took the last checkpoint to perform inference, at which point I got the aforementioned error.
Can you elaborate on the inference procedure?
When you train the model for 50k iterations, it should output checkpoints named best_model.pt and iteration_50000.pt under the <exp_dir>/checkpoints folder.
In case you are using the inference script of this repo, you should specify one of the above checkpoints. Currently, it looks like the inference script instead tries to load the official e4e_ffhq_encode.pt checkpoint (based on the log you posted).
Can you share the inference script/command being used?
Yes, the checkpoints can be found in the mentioned folder.
The training command : python scripts/train.py --dataset_type my_data_encode --exp_dir new/experiment/directory --start_from_latent_avg --use_w_pool --w_discriminator_lambda 0.1 --progressive_start 20000 --id_lambda 0.5 --val_interval 10000 --max_steps 50000 --stylegan_size 512 --stylegan_weights pretrained_models/stylegan2-ffhq-config-f.pt --workers 8 --batch_size 8 --test_batch_size 4 --test_workers 4
The inference command : python scripts/inference.py --images_dir=notebooks/images --save_dir=output best_model.pt
I always move the checkpoint to the root directory, which is why I use "best_model.pt".
The last error message:
Loading e4e over the pSp framework from checkpoint: best_model.pt
Traceback (most recent call last):
File "scripts/inference.py", line 134, in <module>
Hi @Alen95,
It looks like progress was made, as now the log starts with "Loading e4e ... from checkpoint: best_model.pt".
One thing I notice is that you train the encoder using --stylegan_size 512 --stylegan_weights pretrained_models/stylegan2-ffhq-config-f.pt, which I am not sure is correct, as the pretrained FFHQ StyleGAN is of size 1024.
As for the error, it seems like the e4e encoder is initialized for the 256-resolution StyleGAN, as opposed to the checkpoint, which contains an e4e encoder trained for a 512-resolution StyleGAN.
To solve this, you need to make sure that opts.stylegan_size=512 in the inference script (you can add a debug print to test it out).
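A minimal sketch of such a debug check, assuming the usual pSp/e4e convention that the training options are saved as a plain dict under ckpt['opts'] (the dict contents below are made up for illustration; in the real script they come from torch.load on the checkpoint):

```python
from argparse import Namespace

# Hypothetical stand-in for ckpt['opts'] as saved during training.
saved_opts = {"stylegan_size": 512, "dataset_type": "my_data_encode"}

opts = Namespace(**saved_opts)
# Debug print suggested above: confirm the resolution before building pSp.
print("stylegan_size:", opts.stylegan_size)
assert opts.stylegan_size == 512, "opts disagree with the trained encoder"
```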
Can you send the contents of the opts.json file in the experiment directory? Also, as a side note, in case there is already a latents.pt file in the save_dir, I recommend deleting it for a clean inference.
Best, Omer
I had the same problem. It seems like model_utils.py infers stylegan_size from the dataset_type string, which led to a wrong value in my case.
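For anyone else hitting this, the heuristic presumably looks something like the following (a hypothetical reconstruction for illustration, not the actual model_utils.py code):

```python
def guess_stylegan_size(dataset_type: str) -> int:
    # Hypothetical reconstruction: only a recognized substring switches
    # away from the 1024 FFHQ default.
    return 512 if "cars" in dataset_type else 1024

# A custom dataset name silently falls back to 1024, even though the
# encoder was actually trained with --stylegan_size 512:
print(guess_stylegan_size("my_data_encode"))  # 1024
```

The safer behavior would be to read stylegan_size from the options stored inside the checkpoint instead of guessing it from the dataset name.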
Thank you @AleksiKnuutila, the override of stylegan_size can indeed lead to such confusing errors; I will remove it.
After training the encoder on my own dataset and trying to use it for inference, I get the following error :
Loading e4e over the pSp framework from checkpoint: e4e_ffhq_encode.pt
Traceback (most recent call last):
  File "scripts/train.py", line 88, in <module>
    main()
  File "scripts/train.py", line 28, in main
    coach = Coach(opts, previous_train_ckpt)
  File "./training/coach.py", line 39, in __init__
    self.net = pSp(self.opts).to(self.device)
  File "./models/psp.py", line 28, in __init__
    self.load_weights()
  File "./models/psp.py", line 43, in load_weights
    self.encoder.load_state_dict(get_keys(ckpt, 'encoder'), strict=True)
  File "/opt/conda/envs/e4e_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Encoder4Editing:
Unexpected key(s) in state_dict: "styles.16.convs.0.weight", "styles.16.convs.0.bias", "styles.16.convs.2.weight", "styles.16.convs.2.bias", "styles.16.convs.4.weight", "styles.16.convs.4.bias", "styles.16.convs.6.weight", "styles.16.convs.6.bias", "styles.16.convs.8.weight", "styles.16.convs.8.bias", "styles.16.convs.10.weight", "styles.16.convs.10.bias", "styles.16.linear.weight", "styles.16.linear.bias", "styles.17.convs.0.weight", "styles.17.convs.0.bias", "styles.17.convs.2.weight", "styles.17.convs.2.bias", "styles.17.convs.4.weight", "styles.17.convs.4.bias", "styles.17.convs.6.weight", "styles.17.convs.6.bias", "styles.17.convs.8.weight", "styles.17.convs.8.bias", "styles.17.convs.10.weight", "styles.17.convs.10.bias", "styles.17.linear.weight", "styles.17.linear.bias".
Did anyone face the same problem, or does anyone have hints that may help solve it?
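Given the unexpected styles.16/styles.17 keys above, one quick way to see what resolution a checkpoint was trained for is to list the style-head indices present in its encoder weights (a sketch; the helper name is hypothetical, and the key list below is a shortened stand-in for the real state_dict keys):

```python
def style_head_indices(state_dict_keys):
    # Collect the N values from keys of the form "styles.N.<...>".
    idxs = set()
    for k in state_dict_keys:
        if k.startswith("styles."):
            idxs.add(int(k.split(".")[1]))
    return sorted(idxs)

keys = ["styles.15.linear.weight", "styles.16.linear.weight",
        "styles.17.linear.weight"]
print(style_head_indices(keys))  # [15, 16, 17]
```

A highest index of 17 means 18 heads, i.e. an encoder trained for a 1024-resolution StyleGAN, while a model built with stylegan_size=512 only constructs heads 0 through 15.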