kaleidoscopical closed this 8 months ago
I think the CLIP embedding is probably just there to maintain the CFG...
The dimensions of SD and clip-vit-base-patch32 are aligned.
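To make the dimension point concrete, here is a minimal sketch that compares encoder output widths against the UNet cross-attention width. The numbers are assumptions recalled from the published model configs (and "SD" is assumed to mean Stable Diffusion v1.x), so please verify them against the actual config files. The names `ENCODER_WIDTHS` and `matching_outputs` are hypothetical helpers, not part of this repo.

```python
# Assumed widths from the public model configs; verify before relying on them.
ENCODER_WIDTHS = {
    "clip-vit-base-patch32": {"hidden_size": 768, "projection_dim": 512},
    "clip-vit-large-patch14": {"hidden_size": 1024, "projection_dim": 768},
}

# Assumed cross-attention width of the Stable Diffusion v1.x UNet.
SD_V1_CROSS_ATTENTION_DIM = 768


def matching_outputs(encoder_name, cross_attention_dim=SD_V1_CROSS_ATTENTION_DIM):
    """Return the encoder outputs whose width equals the UNet cross-attention width."""
    widths = ENCODER_WIDTHS[encoder_name]
    return [name for name, width in widths.items() if width == cross_attention_dim]


for name in ENCODER_WIDTHS:
    # An output listed here could be fed to cross-attention without an extra projection.
    print(name, matching_outputs(name))
```

Under these assumptions, the base model's hidden states already have the right width, whereas the large model's hidden states would need a projection layer first.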
Thanks! Really helpful advice!
...
@guoqincode Hi, I get normal results when I remove the CLIP feature, but weird results when I include it. Any idea why?
Nice work! I wonder whether including the CLIP image encoder really speeds up the entire training process, as claimed in the paper. Do you have any insight into this? Many thanks.
By the way, I see that clip-vit-base-patch32 is used instead of clip-vit-large-patch14 in the config file. Is there a particular reason for this?