rinongal / StyleGAN-nada

http://stylegan-nada.github.io/
MIT License

Question about specifying the style #51

Closed LaLaLailalai closed 1 year ago

LaLaLailalai commented 2 years ago

Hi Rinon,

About the style of images, I have a few questions:

  1. Can I specify both the source style (--source_class) and the target style (--target_class) from a set of images? I know I can specify the target style from images in the code, as shown below, but what about the source class?

[screenshot of the code that builds the target style from a list of images]

  2. Can you give us a list of all the available styles that can be used in the code? I'm a bit confused about how to set some styles correctly, like the ukiyo-e style: should I set it to "ukiyoe" or "ukiyo-e"? I'd like to reproduce the results in the paper.

  3. Finally, how can I generate sampling results like those shown in the paper? I ran the code and the saved results are four photos each time, so how can I generate more diverse results like the paper shows?

[attached output samples: dst_001750, dst_001600]

rinongal commented 2 years ago

Hi!

  1. Can I specify both the source style (--source_class) and the target style (--target_class) from a set of images? I know I can specify the target style from images in the code, as shown below, but what about the source class?

This is not inherently supported, but in principle you could look at the img2img direction function here and replace the source direction with an image list, extracting a CLIP embedding for it just like I do for the target image list. If that's not clear, let me know and I'll try to elaborate.
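To make the idea concrete, here's a minimal sketch (not the repo's exact code; the image paths and variable names are placeholders) of how you'd build a mean CLIP embedding for a list of source images, the same way it's done for the target images, and turn it into a direction:

```python
# Minimal sketch, not the repo's exact code: build a mean CLIP embedding from a
# list of source images, analogous to what is already done for the target images.
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def mean_image_embedding(image_paths):
    # Encode each image with CLIP and average the L2-normalized embeddings.
    embeddings = []
    with torch.no_grad():
        for path in image_paths:
            image = preprocess(Image.open(path)).unsqueeze(0).to(device)
            emb = model.encode_image(image).float()
            embeddings.append(emb / emb.norm(dim=-1, keepdim=True))
    return torch.cat(embeddings).mean(dim=0, keepdim=True)

# Hypothetical file lists -- replace with your own source/target images.
source_embedding = mean_image_embedding(["source_01.png", "source_02.png"])
target_embedding = mean_image_embedding(["target_01.png", "target_02.png"])

# The CLIP-space editing direction is then target minus source, normalized.
direction = target_embedding - source_embedding
direction = direction / direction.norm(dim=-1, keepdim=True)
```

That direction can then stand in wherever the text-based source-to-target direction is currently used.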

  2. Can you give us a list of all the available styles that can be used in the code? I'm a bit confused about how to set some styles correctly, like the ukiyo-e style: should I set it to "ukiyoe" or "ukiyo-e"? I'd like to reproduce the results in the paper.

We tried to make sure all the texts in the paper are the exact texts we used during training. The results you're seeing look to me like a mismatch in parameter choices, rather than sensitivity to the target text at the level you're describing. Our supplementary lists the parameters for many of the experiments, and I've replied with parameters for other prompts when people requested them here. If there's something you can't reproduce and can't find the parameters for, let me know and I'll look them up. For the sketches in particular, you can just look at our colab; it's the default prompt and setting.

  3. Finally, how can I generate sampling results like those shown in the paper? I ran the code and the saved results are four photos each time, so how can I generate more diverse results like the paper shows?

Are you simply looking to generate more images from the same model? You can have a look at our colab notebook, which has a cell that does exactly that. Alternatively, have a look at Rosinality's repo; you should be able to load our checkpoints and generate more images with their script.
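For reference, a minimal sampling sketch along those lines (it assumes Rosinality's model.py is on your path and that the checkpoint stores the fine-tuned generator weights under a "g_ema" key; the checkpoint path and sample count are placeholders):

```python
# Minimal sketch: sample extra images from a fine-tuned checkpoint using
# Rosinality's stylegan2-pytorch Generator. Checkpoint path and key are assumptions.
import torch
from torchvision import utils
from model import Generator  # model.py from rosinality/stylegan2-pytorch

device = "cuda"
ckpt = torch.load("output_dir/checkpoint/000300.pt", map_location="cpu")  # placeholder path

g_ema = Generator(size=1024, style_dim=512, n_mlp=8).to(device)
g_ema.load_state_dict(ckpt["g_ema"], strict=False)  # adjust the key if yours differs
g_ema.eval()

with torch.no_grad():
    mean_latent = g_ema.mean_latent(4096)  # used for truncation
    for i in range(16):  # generate as many samples as you like
        z = torch.randn(1, 512, device=device)
        img, _ = g_ema([z], truncation=0.7, truncation_latent=mean_latent)
        utils.save_image(img, f"sample_{i:03d}.png", normalize=True)
```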

LaLaLailalai commented 2 years ago

Hi Rinon,

Thanks a lot for your detailed reply!

We tried to make sure all the texts in the paper are the exact texts we used during training. The results you're seeing look to me like a mismatch in parameter choices, rather than sensitivity to the target text at the level you're describing.

For the style transfer from photo to ukiyo-e, I can't find the parameters in the GitHub issues or in the supplementary of the paper, so I used the following setting (mainly following the example you give in the README):

python train.py --size 1024 --batch 2 --n_sample 4 --output_dir xxx --lr 0.002 --frozen_gen_ckpt ./pretrained_model/stylegan2-ffhq-config-f.pt --iter 2000 --source_class photo --target_class ukiyoe --auto_layer_k 18 --auto_layer_iters 1 --auto_layer_batch 8 --output_interval 50 --clip_models "ViT-B/32" "ViT-B/16" --clip_model_weights 1.0 1.0 --mixing 0.0 --save_interval 150

Btw, I set --iter to 2000 because I found that training is really fast (it only needs a few hours), so I set it larger than in the example.

Besides, just to double-check: is it expected that training is this fast, around 2 hours for about 200 iterations? I was shocked by such an efficient network! 👍

rinongal commented 2 years ago

Training time should be short, yes. 2 hours for 200 iterations seems too long. The time depends a lot on your GPU, but I would expect no more than 10 minutes for 200 iterations (if you're on a K80 or something of that sort) and closer to 5 minutes on a modern GPU.

Increasing the number of iterations to 2000 is not recommended if you're only changing the style. If you train for too long, you'll get mode collapse and bad results. The best way to determine a good number of iterations is to just look at the outputs throughout the training and see where you think the results were the best. This is typically around 300 iterations.

I'll look for the Ukiyo-e parameters and let you know, but I think it was close to the other painting styles - so in your command you'd use --clip_model_weights 1.0 0.0 and --iter 300.
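Concretely, taking the command you posted and only changing those two flags, it would look roughly like this (all other flags copied from your command; double-check the exact --target_class string against the supplementary):

```bash
python train.py --size 1024 --batch 2 --n_sample 4 --output_dir xxx --lr 0.002 \
    --frozen_gen_ckpt ./pretrained_model/stylegan2-ffhq-config-f.pt \
    --iter 300 --source_class "photo" --target_class "ukiyoe" \
    --auto_layer_k 18 --auto_layer_iters 1 --auto_layer_batch 8 \
    --output_interval 50 --clip_models "ViT-B/32" "ViT-B/16" \
    --clip_model_weights 1.0 0.0 --mixing 0.0 --save_interval 150
```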

rinongal commented 1 year ago

Closing due to lack of activity