yumingj / Talk-to-Edit

Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.
https://www.mmlab-ntu.com/project/talkedit/

Why cannot get expected results? #7

Closed zhanghm1995 closed 2 years ago

zhanghm1995 commented 2 years ago

Hi, thanks for your wonderful work. However, when I run the demo with your pretrained models and the default config parameters, that is:

python editing_wo_dialog.py \
   --opt ./configs/editing/editing_wo_dialog.yml \
   --attr 'Bangs' \
   --target_val 5

I always get the following results:

This attribute is already at the degree that you want. Let's try a different attribute degree or another attribute.

or

Sorry, we are unable to edit this attribute. Perhaps we can try something else.

In my results folder I can only find the cropped face image and a plain start_image.png.

I have also tried some other attr and target_val combinations and got the same output.

I don't know what the problem is, and I am also not sure about the exact meaning of target_val.

BTW, your README mentions that we can use the Beard attribute, but I found only a No_Beard attribute in your config files.

attr_to_idx:
  Bangs: 0
  Eyeglasses: 1
  No_Beard: 2
  Smiling: 3
  Young: 4

I hope you can offer some help. Thanks in advance.

yumingj commented 2 years ago

Hi, thanks for your interest.

First, you can try to edit our provided example images to see if there is anything wrong with your configurations and setup.

For the messages you mentioned:

This attribute is already at the degree that you want. Let's try a different attribute degree or another attribute.

This means that the attribute in the image you tried is already at the degree you requested. You can change target_val to a different value.

Sorry, we are unable to edit this attribute. Perhaps we can try something else.

This message means that the image is not editable. Possible reasons are:

1) For some images, the cropping algorithm cannot detect faces correctly at 1024 x 1024 resolution. For real-image editing, you may try the 128 x 128 resolution instead.

2) Our work mainly focuses on editing synthesized images. For synthesized images, we can start directly from a latent code, so our results are more stable; you may try editing synthesized images first. Some real images are not editable with our method. Editing a real image involves GAN inversion, i.e., finding the latent code in the pre-trained StyleGAN latent space that corresponds to the given real image. GAN inversion inevitably introduces error and adds to the difficulty of the problem: there is a gap between the inverted latent code and the original pre-trained StyleGAN, which is why some images cannot be edited. You may try combining more advanced GAN-inversion methods (e.g., in-domain GAN inversion and pixel2style2pixel) with our editing method.
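To make the inversion gap in point 2 concrete, here is a toy sketch, not the repository's inversion code. It assumes a tiny hand-written linear "generator" in place of StyleGAN; the names `generate`, `invert`, `WEIGHTS` and the example images are hypothetical. The `steps=600` and `lr=0.1` defaults are borrowed from the `inversion` section of the default config, while the noise and generator fine-tuning settings are left out.

```python
def generate(weights, latent):
    """Toy linear 'generator': image = weights @ latent (a stand-in for StyleGAN)."""
    return [sum(w * z for w, z in zip(row, latent)) for row in weights]


def mse(a, b):
    """Mean squared error between two flat 'images'."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def invert(weights, target, steps=600, lr=0.1):
    """Optimization-based inversion: gradient descent on the latent code."""
    latent = [0.0] * len(weights[0])
    n = len(target)
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(weights, latent), target)]
        # d(MSE)/d(latent_j) = (2 / n) * sum_i residual_i * weights[i][j]
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(residual, weights))
                for j in range(len(latent))]
        latent = [z - lr * g for z, g in zip(latent, grad)]
    return latent


# Hypothetical 4-pixel generator driven by a 2-dimensional latent code.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]

# "Synthesized" image: produced by the generator, so an exact latent code exists.
synthesized = generate(WEIGHTS, [0.5, -1.0])

# "Real" image: the same picture plus a component the generator cannot produce
# (this direction is orthogonal to both columns of WEIGHTS).
off_manifold = [-1.0, -1.0, 1.0, 0.0]
real = [p + 0.2 * d for p, d in zip(synthesized, off_manifold)]

err_synth = mse(generate(WEIGHTS, invert(WEIGHTS, synthesized)), synthesized)
err_real = mse(generate(WEIGHTS, invert(WEIGHTS, real)), real)
print(f"synthesized: {err_synth:.2e}  real: {err_real:.2e}")
```

In this toy setting the synthesized image is reconstructed almost exactly, while the real image keeps a residual error of about 0.03 no matter how long the optimization runs. That residual is the kind of gap that can make an inverted latent code harder to edit.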

The target_val is the degree of the attribute. Take Smiling as an example: '0' denotes a face without a smile, and '5' denotes exaggerated laughter with the mouth wide open. You can find the detailed definitions of the attributes in our supplementary files.

The Beard attribute corresponds to the No_Beard attribute in our configs.
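As a rough illustration of the two points above, the following sketch checks an `--attr` / `--target_val` pair before launching the script. It is not part of the repository: `check_request` and `ALIASES` are hypothetical names, the attribute keys are copied from `attr_to_idx`, and the 0 to 5 range assumes the `min_cls_num` and `max_cls_num` values of the default config.

```python
# Attribute keys as they appear under attr_to_idx in the config.
ATTR_TO_IDX = {"Bangs": 0, "Eyeglasses": 1, "No_Beard": 2, "Smiling": 3, "Young": 4}

# Assumed degree range, taken from min_cls_num / max_cls_num in the default config.
MIN_DEGREE, MAX_DEGREE = 0, 5

# Hypothetical alias table: the README says "Beard", the config key is "No_Beard".
ALIASES = {"Beard": "No_Beard"}


def check_request(attr, target_val):
    """Return (config_key, degree), or raise ValueError with a readable message."""
    key = ALIASES.get(attr, attr)
    if key not in ATTR_TO_IDX:
        raise ValueError(
            f"unknown attribute {attr!r}; expected one of {sorted(ATTR_TO_IDX)}"
        )
    if not isinstance(target_val, int) or not MIN_DEGREE <= target_val <= MAX_DEGREE:
        raise ValueError(
            f"target_val must be an integer in [{MIN_DEGREE}, {MAX_DEGREE}], "
            f"got {target_val!r}"
        )
    return key, target_val


print(check_request("Beard", 3))
```

The alias only renames the attribute; it does not reinterpret the degree, so the per-degree definitions in the supplementary files still apply to No_Beard.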

zhanghm1995 commented 2 years ago

Hi @yumingj, thanks for your patient and detailed response.

I ran the code with your provided default configs (editing_wo_dialog.yml, img_res: 1024, is_real_image: False). However, I got the following output and a pure-color start_image.png, and I'm not sure what the problem is:

2021-11-19 15:43:02,338.338 - INFO:   name: editing_wo_dialog
  img_res: 1024
  latent_code_path: ./download/editing_data/teaser_latent_code.npz.npy
  latent_code_index: 38
  inversion:[
    is_real_image: False
    img_path: ./download/real_images/annehathaway.png
    crop_img: True
    device: cuda
    img_mse_weight: 1.0
    step: 600
    noise: 0.05
    noise_ramp: 0.75
    lr: 0.1
    lr_gen: 0.0001
  ]
  use_tb_logger: True
  set_CUDA_VISIBLE_DEVICES: None
  gpu_ids: [3]
  attribute: Eyeglasses
  model_type: FieldFunctionModel
  fix_layers: True
  replaced_layers_128: 8
  replaced_layers_1024: 10
  manual_seed: 2021
  confidence_thresh: 0
  max_cls_num: 5
  min_cls_num: 0
  max_trials_num: 100
  print_every: False
  transform_z_to_w: False
  num_layer: 8
  hidden_dim: 512
  leaky_relu_neg_slope: 0.2
  attr_file: ./configs/attributes_5.json
  baseline: classification
  use_sigmoid: True
  gt_remapping_file: None
  predictor_ckpt_128: ./download/pretrained_models/predictor_128.pth.tar
  predictor_ckpt_1024: ./download/pretrained_models/predictor_1024.pth.tar
  latent_dim: 512
  n_mlp: 8
  channel_multiplier_128: 1
  channel_multiplier_1024: 2
  generator_ckpt_128: ./download/pretrained_models/stylegan2_128.pt
  generator_ckpt_1024: ./download/pretrained_models/stylegan2_1024.pth
  latent_space: w
  has_dialog: False
  device_name: gpu
  pretrained_field_128:[
    Bangs: ./download/pretrained_models/128_field/Bangs.pth
    Eyeglasses: ./download/pretrained_models/128_field/Eyeglasses.pth
    No_Beard: ./download/pretrained_models/128_field/No_Beard.pth
    Smiling: ./download/pretrained_models/128_field/Smiling.pth
    Young: ./download/pretrained_models/128_field/Young.pth
  ]
  pretrained_field_1024:[
    Bangs: ./download/pretrained_models/1024_field/Bangs.pth
    Eyeglasses: ./download/pretrained_models/1024_field/Eyeglasses.pth
    No_Beard: ./download/pretrained_models/1024_field/No_Beard.pth
    Smiling: ./download/pretrained_models/1024_field/Smiling.pth
    Young: ./download/pretrained_models/1024_field/Young.pth
  ]
  attr_to_idx:[
    Bangs: 0
    Eyeglasses: 1
    No_Beard: 2
    Smiling: 3
    Young: 4
  ]
  is_train: False
  attr_list: ['Bangs', 'Eyeglasses', 'No_Beard', 'Smiling', 'Young']
  attr_dict:[
    Bangs: 0
    Eyeglasses: 1
    No_Beard: 2
    Smiling: 3
    Young: 4
  ]
  channel_multiplier: 2
  pretrained_field:[
    Bangs: ./download/pretrained_models/1024_field/Bangs.pth
    Eyeglasses: ./download/pretrained_models/1024_field/Eyeglasses.pth
    No_Beard: ./download/pretrained_models/1024_field/No_Beard.pth
    Smiling: ./download/pretrained_models/1024_field/Smiling.pth
    Young: ./download/pretrained_models/1024_field/Young.pth
  ]
  predictor_ckpt: ./download/pretrained_models/predictor_1024.pth.tar
  generator_ckpt: ./download/pretrained_models/stylegan2_1024.pth
  replaced_layers: 10

2021-11-19 15:43:12,266.266 - INFO: Sorry, we are unable to edit this attribute. Perhaps we can try something else.

BTW, my running environment is:

torch                   1.9.1
python                3.7.11
RTX3090

When I ran the code for the first time, I got an nvcc error. After googling it, I set the environment variable TORCH_CUDA_ARCH_LIST=7.5, which made the error go away. I'm not sure whether my environment settings are related to the unexpected results.

yumingj commented 2 years ago

It seems that something is wrong with your environment. With the provided config, the program should synthesize the same image as the teaser image in our paper, not a pure-color image.
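One possible cause, not confirmed in this thread, is the TORCH_CUDA_ARCH_LIST=7.5 setting: the RTX 3090 has compute capability 8.6, and native CUDA code built only for 7.5 does not run on an 8.x device. The sketch below checks an arch list against a device capability using the simplified CUDA compatibility rules stated in the comments. `parse_arch_list` and `can_run_on` are hypothetical helpers, and only numeric entries such as `7.5` or `8.6+PTX` are handled.

```python
def parse_arch_list(value):
    """Parse e.g. '7.5;8.6+PTX' into [((major, minor), has_ptx), ...].

    Entries may be separated by semicolons or spaces. Named architectures
    are not handled in this sketch.
    """
    entries = []
    for token in value.replace(";", " ").split():
        has_ptx = token.endswith("+PTX")
        number = token[:-len("+PTX")] if has_ptx else token
        major, minor = number.split(".")
        entries.append(((int(major), int(minor)), has_ptx))
    return entries


def can_run_on(arch_list, device_capability):
    """Check whether code built for arch_list can run on the given device.

    Simplified rules:
    - native code for X.y runs on devices X.z with z >= y (same major version);
    - PTX for X.y can be JIT-compiled on any device with capability >= X.y.
    """
    for capability, has_ptx in parse_arch_list(arch_list):
        same_major = (capability[0] == device_capability[0]
                      and capability[1] <= device_capability[1])
        via_ptx = has_ptx and capability <= device_capability
        if same_major or via_ptx:
            return True
    return False


RTX_3090 = (8, 6)  # compute capability of the GPU reported in this thread
for setting in ("7.5", "7.5+PTX", "8.6"):
    print(setting, "->", can_run_on(setting, RTX_3090))
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple for the current GPU, which can be passed in place of the hard-coded `RTX_3090` value.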

zhanghm1995 commented 2 years ago

Thanks for your kind and quick response. It was indeed caused by the running environment. After I switched to a new environment, everything ran fine.

Thank you again.