omertov / encoder4editing

Official implementation of "Designing an Encoder for StyleGAN Image Manipulation" (SIGGRAPH 2021) https://arxiv.org/abs/2102.02766
MIT License
945 stars · 154 forks

Training Encoder On Xray Dataset #29

Closed sriderya closed 3 years ago

sriderya commented 3 years ago

Hello,

First of all, thanks for your great work. I am trying to train the encoder on a chest X-ray dataset. Although the results seem good, some details are missing, which matters especially in a medical context. As can be seen in the example below, important details such as cables are not recovered, which is absolutely undesirable. By the way, the results may look pretty good to you, but medical experts totally disagree :)

[image: 0002_200000]

The parameters are:

```json
{
  "batch_size": 8,
  "board_interval": 50,
  "checkpoint_path": null,
  "d_reg_every": 16,
  "dataset_type": "xray_encode",
  "delta_norm": 2,
  "delta_norm_lambda": 0.0002,
  "encoder_type": "Encoder4Editing",
  "exp_dir": "/path/to/experiment/dir",
  "id_lambda": 0.5,
  "image_interval": 100,
  "keep_optimizer": false,
  "l2_lambda": 1.0,
  "learning_rate": 0.0001,
  "lpips_lambda": 0.8,
  "lpips_type": "alex",
  "max_steps": 200000,
  "optim_name": "ranger",
  "progressive_start": 20000,
  "progressive_step_every": 2000,
  "progressive_steps": [0, 20000, 22000, 24000, 26000, 28000, 30000, 32000, 34000, 36000, 38000, 40000, 42000, 44000],
  "r1": 10,
  "resume_training_from_ckpt": null,
  "save_interval": null,
  "save_training_data": false,
  "start_from_latent_avg": true,
  "stylegan_size": 256,
  "stylegan_weights": "/path/to/stylegan2.pt",
  "sub_exp_dir": null,
  "test_batch_size": 4,
  "test_workers": 4,
  "train_decoder": false,
  "update_param_list": null,
  "use_w_pool": true,
  "val_interval": 10000,
  "w_discriminator_lambda": 0.1,
  "w_discriminator_lr": 2e-05,
  "w_pool_size": 50,
  "workers": 8
}
```

In order to get better inversion for this kind of dataset, which parameters should I tune? How can I improve my results?

Thanks in advance

omertov commented 3 years ago

Hi @sriderya! Indeed, in its default configuration the e4e encoder aims to maximize perceptual quality and editability at the expense of distortion. While I am not sure what editing capabilities your StyleGAN has, the encoder does seem to output a perceptually plausible image. In order to obtain a more precise reconstruction, you have 2 options:

  1. Train an encoder that prioritizes distortion. In this case the output image may be of lower perceptual quality, but it's up to you to test it out. For that you can try:
     a. `--encoder_type=GradualStyleEncoder`, while not specifying the following flags: `--use_w_pool`, `--w_discriminator_lambda`, `--progressive_start`.
     b. A hybrid e4e approach that does not limit the deltas but still uses the discriminator: `--use_w_pool --w_discriminator_lambda 0.1`, without the `--progressive_start` flag.
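For concreteness, the two variants above could be launched roughly as follows. This is only a sketch: the `scripts/train.py` entry point and the placeholder paths are assumptions, and the flag names are the ones mentioned in this thread; check the repo's README for the exact invocation.

```shell
# (a) Distortion-oriented encoder: GradualStyleEncoder, with no W pool,
#     no latent discriminator, and no progressive delta training.
python scripts/train.py \
  --dataset_type xray_encode \
  --exp_dir /path/to/experiment/dir \
  --stylegan_weights /path/to/stylegan2.pt \
  --encoder_type GradualStyleEncoder

# (b) Hybrid: keep the W pool and latent discriminator, but drop the
#     progressive delta schedule by omitting --progressive_start.
python scripts/train.py \
  --dataset_type xray_encode \
  --exp_dir /path/to/experiment/dir \
  --stylegan_weights /path/to/stylegan2.pt \
  --use_w_pool \
  --w_discriminator_lambda 0.1
```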

  2. Alternatively, since you have a pretrained encoder which yields results within the W space, you could use the obtained latent code as the initialization point for an optimization process. A short optimization with a low learning rate should yield a near-perfect reconstruction. This is something we experimented with and might add in a revision; until then, I may also upload the optimization code to the repo for further use.
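The refinement in option 2 boils down to freezing the generator and taking gradient steps on the latent itself. Below is a toy, runnable sketch of that loop: the linear `G` is only a stand-in for the frozen StyleGAN generator, the names (`G`, `w`, `target`) are illustrative rather than from the repo, and in practice the loss would combine L2 with LPIPS on the generated image.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for the frozen, pretrained StyleGAN generator.
G = torch.nn.Linear(8, 16)
for p in G.parameters():
    p.requires_grad_(False)  # only the latent is optimized

target = torch.randn(1, 16)                  # image to invert (toy tensor)
w = torch.randn(1, 8).requires_grad_(True)   # in practice: the encoder's latent code

opt = torch.optim.Adam([w], lr=0.01)         # low learning rate, short schedule
losses = []
for step in range(200):
    opt.zero_grad()
    # In practice: L2 + LPIPS between G(w) and the target image.
    loss = torch.nn.functional.mse_loss(G(w), target)
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

Because the encoder's output is already close to the target, only a short schedule is needed; the refined `w` stays near the well-behaved region of latent space while recovering fine detail.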

Best, Omer

omertov commented 3 years ago

Closing this issue for now, feel free to open it again in case of need!

leeisack commented 2 years ago


Can you share the learning command entered in the terminal?