Open tcflying opened 1 year ago
This method does not train a model in advance. At runtime it fine-tunes the CLIP text encoder so that your input text embedding moves toward the style embedding you created. So you just use https://github.com/vicgalle/stable-diffusion-aesthetic-gradients/blob/main/scripts/gen_aesthetic_embeddings.py to generate an image embedding with the pretrained CLIP image encoder.
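For intuition, the core of generating such an embedding is just averaging the CLIP image features of your style images. Below is a minimal sketch of that averaging step using NumPy with toy stand-in features; in the actual script each row would come from CLIP's `encode_image`, and normalization details may differ from the real implementation:

```python
import numpy as np

def aesthetic_embedding(image_features: np.ndarray) -> np.ndarray:
    """Average L2-normalized per-image features into one style embedding.

    `image_features` has shape (n_images, dim). In the real script each
    row would be a CLIP image-encoder output; here they are toy values.
    """
    normed = image_features / np.linalg.norm(image_features, axis=1, keepdims=True)
    return normed.mean(axis=0)

# Toy stand-in for CLIP features of 3 images (dim=4):
feats = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0, 0.0]])
emb = aesthetic_embedding(feats)
print(emb.shape)  # (4,)
```

The resulting vector is what gets saved as the `.pt` file and later pulled on the text embedding during fine-tuning.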
thanks a lot for the reply, I got it. Exactly, what I mean is: how many pictures, and what batch size, should I prepare to create the embedding?
According to the repo owner's example, 3 images should be sufficient.
thank you guys. As vicgalle mentions:

fantasy.pt: created from https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus by filtering only the images with the word "fantasy" in the caption. The top 2000 images by score are selected for the embedding.

flower_plant.pt: created from https://huggingface.co/datasets/ChristophSchuhmann/improved_aesthetics_6.5plus by filtering only the images with the word "plant", "flower", "floral", "vegetation" or "garden" in the caption. The top 2000 images by score are selected for the embedding.
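That selection step (filter captions by keyword, keep the top 2000 by aesthetic score) can be sketched like this; the `(caption, score)` pairs below are hypothetical stand-ins for the dataset rows, and the field layout is illustrative, not the dataset's actual schema:

```python
def select_for_embedding(rows, keywords, top_n=2000):
    """Keep rows whose caption contains any keyword, ranked by score.

    `rows` is a list of (caption, score) pairs standing in for entries of
    improved_aesthetics_6.5plus; returns the top_n highest-scoring matches.
    """
    matched = [r for r in rows if any(k in r[0].lower() for k in keywords)]
    matched.sort(key=lambda r: r[1], reverse=True)
    return matched[:top_n]

# Hypothetical rows for illustration:
rows = [("a fantasy castle", 7.1), ("a dog", 6.6), ("fantasy forest", 6.9)]
print(select_for_embedding(rows, ["fantasy"], top_n=2))
# [('a fantasy castle', 7.1), ('fantasy forest', 6.9)]
```

The selected images would then be fed to the image encoder and averaged into the `.pt` embedding.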
He used 2000 pics. Also, what batch size? Hmm...
Thank you for the great work first, but is there any guide for training our own .pt? E.g., how many sample pictures should be used for training, and what batch size? Thanks.