snow1929 opened this issue 2 years ago
Hi @snow1929!
Sorry for the delayed response.
About 1): the models that you select are pre-trained stylegan models.
Regarding 2): I'm not sure I understood what you meant. Are you referring to the projection/inversion notebook?
CLIP isn't trained during this procedure. The role of CLIP is to assess whether the generated image is related to the input text, and, with this information, guide the StyleGAN's generation process to get a better relationship score between the text and said image.
For example, if we want to generate an image of an apple, the StyleGAN initially generates image X. CLIP receives both the image and the word "apple" as inputs and returns a score between 0 and 1, where 1 means unrelated and 0 means they are the same concept. Since we now have a number that represents how relevant the generated image is to the text, we can guide the generation process with optimization tricks to achieve a better score, and thus generate an image related to the prompt.
This means that each generation process is in itself an optimization, which I guess you could call a training process, but neither model (CLIP nor StyleGAN) is trained during it, since we search for the optimum inside the latent space of our StyleGAN.
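To make the idea above concrete, here is a toy sketch of latent-space optimization. Everything in it is a stand-in: `fake_clip_distance` replaces the real CLIP score (the real notebook calls CLIP on a rendered image), the 3-dimensional `latent` replaces the StyleGAN latent vector, and finite-difference gradient descent replaces PyTorch autograd. Only the structure of the loop matches the procedure described.

```python
# Toy sketch of the "optimize the latent until CLIP is happy" loop.
# Stand-ins (NOT from the notebook): fake_clip_distance, TARGET, optimize.
import random

TARGET = [0.3, -0.7, 0.5]  # pretend text embedding for "apple"

def fake_clip_distance(latent):
    # Lower is better: 0 means "same concept", mimicking the score convention above.
    return sum((a - b) ** 2 for a, b in zip(latent, TARGET))

def optimize(latent, steps=200, lr=0.1, eps=1e-4):
    for _ in range(steps):
        grad = []
        for i in range(len(latent)):
            bumped = latent.copy()
            bumped[i] += eps
            # Finite-difference gradient of the score w.r.t. the latent.
            grad.append((fake_clip_distance(bumped) - fake_clip_distance(latent)) / eps)
        # Nudge the latent in the direction that improves the score.
        latent = [x - lr * g for x, g in zip(latent, grad)]
    return latent

random.seed(0)
z = [random.uniform(-1, 1) for _ in range(3)]  # random starting latent
z = optimize(z)
print(round(fake_clip_distance(z), 6))  # prints 0.0
```

Note that the model producing the score is never updated, only the latent `z` is, which is exactly why neither CLIP nor StyleGAN gets trained.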
Hi @ouhenio, which part of this sample code imports the pretrained CLIP model? Do you use the pretrained model from OpenAI, or did you train a new one for your datasets?
Thank you so much
Hey @snow1929!
Ok, so the sections that deal with CLIP are the following:
```python
# From the "Install libraries 🏗️" section
!git clone https://github.com/openai/CLIP
sys.path.append('./CLIP')
import clip
```

```python
# From the "Define necessary functions 🛠️" section
def embed_image(image):
    n = image.shape[0]
    cutouts = make_cutouts(image)
    embeds = clip_model.embed_cutout(cutouts)
    embeds = rearrange(embeds, '(cc n) c -> cc n c', n=n)
    return embeds

class CLIP(object):
    def __init__(self):
        clip_model = "ViT-B/32"
        self.model, _ = clip.load(clip_model)
        self.model = self.model.requires_grad_(False)
        self.normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],
                                              std=[0.26862954, 0.26130258, 0.27577711])

    @torch.no_grad()
    def embed_text(self, prompt):
        "Normalized clip text embedding."
        return norm1(self.model.encode_text(clip.tokenize(prompt).to(device)).float())

    def embed_cutout(self, image):
        "Normalized clip image embedding."
        return norm1(self.model.encode_image(self.normalize(image)))

clip_model = CLIP()
```
During the "Run the model 🚀" section, CLIP is used (via the embed_image() function) on each iteration to get the generated image's CLIP score.
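As a rough illustration of what such a per-iteration score can look like (a sketch, not the notebook's exact loss), here is a stand-in using plain lists: one minus the cosine similarity of the two normalized embeddings, which is 0 for identical directions and grows as they diverge. The local `norm1` here mirrors the notebook's helper of the same name, assuming it does L2 normalization.

```python
# Minimal sketch of scoring a generated image's embedding against the
# prompt's embedding. Plain lists stand in for real CLIP embeddings.
import math

def norm1(v):
    # Assumed behavior of the notebook's norm1: L2-normalize the vector.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def clip_score(image_embed, text_embed):
    # 1 - cosine similarity: 0 when the directions match, larger when
    # they are unrelated.
    cos = sum(a * b for a, b in zip(norm1(image_embed), norm1(text_embed)))
    return 1.0 - cos

print(clip_score([1.0, 0.0], [2.0, 0.0]))  # same direction -> 0.0
print(clip_score([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 1.0
```

The optimization loop then simply tries to push this number down at every step.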
I'm indeed using OpenAI's version, with the weights they made public.
Feel free to ask me for clarification on something, or anything else about the code!
Good luck!
Hi, @ouhenio
1. StyleCLIP provides three methods:
A. Latent Optimization
B. Latent Mapper
C. Global Directions
Which of these methods does the StyleGAN3+CLIP model use?
2. If I train a CLIP model by myself, how do I import my CLIP model into StyleGAN3+CLIP?
Hi, I'm a little confused about the model. Would you please help me? Here is the question: 1. In the model selection section, is the model StyleGAN3 or CLIP?
Thank you so much