hila-chefer / TargetCLIP

[ECCV 2022] Official PyTorch implementation of the paper Image-Based CLIP-Guided Essence Transfer.

how to create my own dir? #1

Closed loboere closed 2 years ago

hila-chefer commented 2 years ago

Hi @loboere, thanks for your interest in our work! You need to invert your image using the e4e encoder (https://github.com/omertov/encoder4editing) to get the latent corresponding to your image. Then, upload the latent vector to our colab and insert its path. To make the process a bit easier, I'll simply add that to the colab notebook so all you need to do is add your image. Will update here once done :)
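For reference, here is a minimal sketch of that inversion step, assuming the encoder4editing repo is on PYTHONPATH and its pretrained FFHQ checkpoint has been downloaded (the image and output file names here are illustrative):

import torch
from argparse import Namespace
from PIL import Image
from torchvision import transforms
from models.psp import pSp  # from the encoder4editing repo

# load the pretrained e4e FFHQ encoder
ckpt = torch.load("e4e_ffhq_encode.pt", map_location="cpu")
opts = Namespace(**ckpt["opts"])
opts.checkpoint_path = "e4e_ffhq_encode.pt"
net = pSp(opts).eval().cuda()

# e4e expects a normalized 256x256 input
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
])
img = transform(Image.open("my_face.jpg").convert("RGB")).unsqueeze(0).cuda()

with torch.no_grad():
    # return_latents=True yields the (1, 18, 512) W+ latent for the image
    _, latent = net(img, randomize_noise=False, return_latents=True)
torch.save(latent.cpu(), "my_face_latent.pt")  # upload this file to the colab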

loboere commented 2 years ago

I'm not talking about editing an image.

I mean, how do I create my own targets?

hila-chefer commented 2 years ago

@loboere ohh I see, sorry for the misunderstanding. There's a find_dirs script under the optimization folder; you can simply use that with the default hyperparams :) I recommend using inverted images for the training to get better results. Optimal results are achieved when the direction is initialized to the target's inversion. I'll add instructions to the readme about the training process soon.

hila-chefer commented 2 years ago

@loboere training instructions added, please feel free to ask any questions if anything is unclear :)

alexpeattie commented 2 years ago

Thanks for this amazing work @hila-chefer 🎉! In case it's helpful to others, here's a quick guide to training new directions (in a quick and dirty way) on Colab. First, change the cell that downloads StyleGAN's weights from:

!pip install ftfy regex tqdm
!pip install git+https://github.com/openai/CLIP.git

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# downloads StyleGAN's weights
ids = ['1EM87UquaoQmk17Q8d5kYIAHqu0dkYqdT']
for file_id in ids:
  downloaded = drive.CreateFile({'id':file_id})
  downloaded.FetchMetadata(fetch_all=True)
  downloaded.GetContentFile(downloaded.metadata['title'])

to:

!pip install ftfy regex tqdm
!pip install git+https://github.com/openai/CLIP.git

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# downloads StyleGAN's weights and inverted latents
ids = ['1EM87UquaoQmk17Q8d5kYIAHqu0dkYqdT', '1j7RIfmrCoisxx3t-r-KC02Qc8barBecr']
for file_id in ids:
  downloaded = drive.CreateFile({'id':file_id})
  downloaded.FetchMetadata(fetch_all=True)
  downloaded.GetContentFile(downloaded.metadata['title'])

Upload your target image to dirs/targets, then add and run a new cell with the following (replacing your_target.jpg as appropriate):

! PYTHONPATH=`pwd` python optimization/find_dirs.py --ckpt stylegan2-ffhq-config-f.pt --target_path dirs/targets/your_target.jpg --dir_name results_folder --weight_decay 3e-3 --lambda_consistency 0.6 --step 1000 --lr 0.2 --num_directions 6 --num_images 3 --data_path test_faces.pt

I found that dropping num_images from 8 to 3 allowed me to run it on Colab (with a P100 or V100) without CUDA out-of-memory errors, and still get good results. It took approximately 15 mins per direction, so --num_directions 6 takes 90 mins in total.

Then download results_folder. Inspect the img_gen_amp* images to decide which results look best, then keep the corresponding direction*.npy file. For example, if img_gen_amp_4_* looks best, you'd keep direction4.npy. You can then add the .npy file to the dirs directory and use it for essence transfer.
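As a rough sketch of that last step (not the repo's exact code; it assumes the rosinality StyleGAN2 generator the colab uses, and the file names and strength value are illustrative):

import numpy as np
import torch
from model import Generator  # rosinality stylegan2-pytorch

# load the FFHQ StyleGAN2 generator downloaded earlier
g_ema = Generator(1024, 512, 8).cuda()
g_ema.load_state_dict(torch.load("stylegan2-ffhq-config-f.pt")["g_ema"], strict=False)
g_ema.eval()

source = torch.load("my_face_latent.pt").cuda()  # (1, 18, 512) W+ latent
direction = torch.from_numpy(np.load("direction4.npy")).float().cuda()

alpha = 1.0  # essence strength; worth tuning per direction
edited = source + alpha * direction

with torch.no_grad():
    img, _ = g_ema([edited], input_is_latent=True, randomize_noise=False)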

Below is an example of a Thanos direction I trained:

[image: example result of the trained Thanos direction applied for essence transfer]

hila-chefer commented 2 years ago

Brilliant, thanks so much for sharing @alexpeattie! Your Thanos looks amazing 🎉🎉 With your permission, I'd be happy to share your direction in our repo as one of the pretrained directions :) I'm assuming you have Colab Pro, right? I never got a P100 or V100 on my free account...

loboere commented 2 years ago

Can you tell me if reducing num_images from 8 to 3 affects quality?

alexpeattie commented 2 years ago

> Brilliant, thanks so much for sharing @alexpeattie! Your Thanos looks amazing 🎉🎉 With your permission, I'd be happy to share your direction in our repo as one of the pretrained directions :) I'm assuming you have Colab Pro, right? I never got a P100 or V100 on my free account...

Yes, very happy for you to add it to the repo 😃. I do have Colab Pro, yes, but actually I think all Colab GPUs have 16GB+ VRAM, so 3 images should generally be fine? (Just with potentially slower training.)

> Can you tell me if reducing num_images from 8 to 3 affects quality?

I think @hila-chefer mentioned some examples in the paper were trained with 4 images, so you should still get comparable-ish results with 3 images, but of course I'd defer to her expertise ☺️

hila-chefer commented 2 years ago

> Yes, very happy for you to add it to the repo 😃. I do have Colab Pro, yes, but actually I think all Colab GPUs have 16GB+ VRAM, so 3 images should generally be fine? (Just with potentially slower training.)

Thank you, @alexpeattie, I will add your direction! 👍 I'm actually getting a GPU with only 11.2GB of VRAM, which isn't enough for even 2 training images, unfortunately...

> Can you tell me if reducing num_images from 8 to 3 affects quality?

@loboere The number of training images matters specifically for our consistency loss. The training images are meant to ensure that the direction doesn't depend on the identity of the person. A smaller number of images could add some artifacts to the essence vector (i.e. maybe a bit more identity loss than usual), but the overall results should still look good 😃 The target in this case (Thanos) is very challenging, so it's really nice to see results that make sense 😃
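Roughly, the idea is that the CLIP-space shift the direction causes should be the same for every training identity, so more identities means a stronger constraint. A schematic sketch of that idea (not the repo's exact implementation; clip_encode and generate stand in for the CLIP image encoder and the StyleGAN generator):

import itertools
import torch.nn.functional as F

def consistency_loss(clip_encode, generate, latents, direction):
    # CLIP-space shift each source image undergoes when the direction is added
    shifts = [clip_encode(generate(w + direction)) - clip_encode(generate(w))
              for w in latents]
    # penalize disagreement between every pair of shifts
    loss = 0.0
    for a, b in itertools.combinations(shifts, 2):
        loss = loss + (1 - F.cosine_similarity(a, b, dim=-1)).mean()
    return loss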

loboere commented 2 years ago

Hmm, and what if we use a batch of 3 images and then continue training with a batch of 3 other images? Is it possible to do that?

hila-chefer commented 2 years ago

@loboere it should work with batching, but we haven't tried that. I assume in this case the hyperparameters may require further tuning (to avoid constant jumping between the optimal vectors for each batch). I'll add this to my future extensions :-)
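A hypothetical sketch of what that mini-batch variant could look like (step_fn stands in for the existing loss computation; nothing here is from the repo):

import torch

def optimize_direction(step_fn, all_latents, direction, steps=1000, batch=3):
    # direction must be a leaf tensor with requires_grad=True
    opt = torch.optim.Adam([direction], lr=0.2)
    for _ in range(steps):
        idx = torch.randperm(len(all_latents))[:batch]  # fresh identities each step
        loss = step_fn(all_latents[idx], direction)     # essence + consistency losses
        opt.zero_grad()
        loss.backward()
        opt.step()
    return direction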

hila-chefer commented 2 years ago

@alexpeattie @loboere Just added some new directions, including Thanos 😃 I retrained the Thanos direction with 4 images instead of 3 to get better consistency. Thanks, @alexpeattie, for the idea and for your comment on training with Colab!

alexpeattie commented 2 years ago

No worries @hila-chefer, thanks again for this great project 😃!