snow1929 opened this issue 2 years ago
Hi @snow1929!
Sorry for the delayed response.
About 1): the models that you select are pre-trained stylegan models.
Regarding 2): I'm not sure I understood what you meant. Are you referring to the projection/inversion notebook?
CLIP isn't trained during this procedure. The role of CLIP is to assess whether the generated image is related to the input text, and, with this information, guide the StyleGAN's generation process to get a better relationship score between the text and said image.
For example, if we want to generate an image of an apple, the StyleGAN initially generates image X. CLIP receives both the image and the word "apple" as inputs and returns a score between 0 and 1, where 1 means unrelated and 0 means they are the same concept. Since we now have a number that represents how relevant the generated image is to the text, we can guide the generation process with optimization tricks to achieve a better score, and thus generate an image related to the prompt.
This means that each generation process is in itself an optimization, which I guess you could call a training process, but neither model (CLIP nor StyleGAN) is trained during it, since we search for the optimum inside the latent space of our StyleGAN.
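To make the idea above concrete, here is a toy sketch of latent-space optimization. Everything in it is a stand-in: `fake_clip_distance` replaces the real CLIP score (the real notebook calls CLIP on a rendered image), the 3-dimensional `latent` replaces the StyleGAN latent vector, and finite-difference gradient descent replaces PyTorch autograd. Only the structure of the loop matches the procedure described.

```python
# Toy sketch of the "optimize the latent until CLIP is happy" loop.
# Stand-ins (NOT from the notebook): fake_clip_distance, TARGET, optimize.
import random

TARGET = [0.3, -0.7, 0.5]  # pretend text embedding for "apple"

def fake_clip_distance(latent):
    # Lower is better: 0 means "same concept", mimicking the score convention above.
    return sum((a - b) ** 2 for a, b in zip(latent, TARGET))

def optimize(latent, steps=200, lr=0.1, eps=1e-4):
    for _ in range(steps):
        grad = []
        for i in range(len(latent)):
            bumped = latent.copy()
            bumped[i] += eps
            # Finite-difference gradient of the score w.r.t. the latent.
            grad.append((fake_clip_distance(bumped) - fake_clip_distance(latent)) / eps)
        # Nudge the latent in the direction that improves the score.
        latent = [x - lr * g for x, g in zip(latent, grad)]
    return latent

random.seed(0)
z = [random.uniform(-1, 1) for _ in range(3)]  # random starting latent
z = optimize(z)
print(round(fake_clip_distance(z), 6))  # prints 0.0
```

Note that the model producing the score is never updated, only the latent `z` is, which is exactly why neither CLIP nor StyleGAN gets trained.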
Hi @ouhenio, which part of this sample code imports the pretrained CLIP model? Do you use the pretrained model from OpenAI, or did you train a new one for your datasets?
Thank you so much
Hey @snow1929!
Ok, so the sections that deal with CLIP are the following:
```python
# From the "Install libraries 🏗️" section
!git clone https://github.com/openai/CLIP
sys.path.append('./CLIP')
import clip
```

```python
# From the "Define necessary functions 🛠️" section
def embed_image(image):
    n = image.shape[0]
    cutouts = make_cutouts(image)
    embeds = clip_model.embed_cutout(cutouts)
    embeds = rearrange(embeds, '(cc n) c -> cc n c', n=n)
    return embeds

class CLIP(object):
    def __init__(self):
        clip_model = "ViT-B/32"
        self.model, _ = clip.load(clip_model)
        self.model = self.model.requires_grad_(False)
        self.normalize = transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],
                                              std=[0.26862954, 0.26130258, 0.27577711])

    @torch.no_grad()
    def embed_text(self, prompt):
        "Normalized clip text embedding."
        return norm1(self.model.encode_text(clip.tokenize(prompt).to(device)).float())

    def embed_cutout(self, image):
        "Normalized clip image embedding."
        return norm1(self.model.encode_image(self.normalize(image)))

clip_model = CLIP()
```
During the "Run the model 🚀" section, CLIP is used (via the embed_image() function) on each iteration to get the generated image's CLIP score.
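As a rough illustration of what such a per-iteration score can look like (a sketch, not the notebook's exact loss), here is a stand-in using plain lists: one minus the cosine similarity of the two normalized embeddings, which is 0 for identical directions and grows as they diverge. The local `norm1` here mirrors the notebook's helper of the same name, assuming it does L2 normalization.

```python
# Minimal sketch of scoring a generated image's embedding against the
# prompt's embedding. Plain lists stand in for real CLIP embeddings.
import math

def norm1(v):
    # Assumed behavior of the notebook's norm1: L2-normalize the vector.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def clip_score(image_embed, text_embed):
    # 1 - cosine similarity: 0 when the directions match, larger when
    # they are unrelated.
    cos = sum(a * b for a, b in zip(norm1(image_embed), norm1(text_embed)))
    return 1.0 - cos

print(clip_score([1.0, 0.0], [2.0, 0.0]))  # same direction -> 0.0
print(clip_score([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 1.0
```

The optimization loop then simply tries to push this number down at every step.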
I'm indeed using OpenAI's version, with the weights they made public.
Feel free to ask me for clarification on something, or anything else about the code!
Good luck!
Hi, @ouhenio
1. StyleCLIP provides three methods:
A. Latent Optimization
B. Latent Mapper
C. Global Directions
Which of these methods does the StyleGAN3+CLIP model use?
2. If I train a CLIP model by myself, how do I import my CLIP model into StyleGAN3+CLIP?
Hi, I'm a little confused about the model. Would you please help me? Here is the question: 1. In the model selection section, is the model StyleGAN3 or CLIP?
Thank you so much