Is CLIP image encoder really helpful?

guoqincode / Open-AnimateAnyone

Unofficial Implementation of Animate Anyone


Is CLIP image encoder really helpful? #97

Closed (kaleidoscopical closed this issue 8 months ago)

kaleidoscopical commented 8 months ago

Nice work! I wonder whether including the CLIP image encoder really speeds up the whole training process, as claimed in the paper. Do you have any insight into this? Many thanks.

Btw, I see the config file uses clip-vit-base-patch32 instead of clip-vit-large-patch14. Is there a reason for that choice?

guoqincode commented 8 months ago

I think the CLIP embedding is probably there just to keep classifier-free guidance (CFG) working...
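
To make the CFG point concrete, here is a minimal sketch of how a conditioning embedding typically supports classifier-free guidance. It is not taken from this repo: the function names `drop_condition` and `guided_noise`, the use of a zero vector as the "unconditional" embedding, and the plain-list representation are all assumptions made for illustration.

```python
import random


def drop_condition(image_embedding, drop_prob, rng):
    """Training-time conditioning dropout (hypothetical helper).

    With probability drop_prob the embedding is replaced by zeros, so the
    model also learns an unconditional prediction. Using zeros as the
    "empty" condition is an assumption, not the repo's confirmed choice.
    """
    if rng.random() < drop_prob:
        return [0.0] * len(image_embedding)
    return list(image_embedding)


def guided_noise(noise_uncond, noise_cond, guidance_scale):
    """Standard CFG combination: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    emb = [0.2, -0.5, 1.0]
    # During training, some fraction of samples see the zeroed embedding.
    print(drop_condition(emb, drop_prob=0.1, rng=rng))
    # At inference, the two predictions are blended with a guidance scale.
    print(guided_noise([0.0, 1.0], [1.0, 1.0], guidance_scale=3.5))
```

With a guidance scale of 1 the result equals the conditional prediction, and with a scale of 0 it equals the unconditional one. Without some conditioning signal to drop, there is nothing for the guidance term to contrast.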

guoqincode commented 8 months ago

The feature dimensions of SD and clip-vit-base-patch32 match.
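
A rough way to see what "aligned" likely means here, sketched under assumptions: the widths below are the commonly published sizes (768 for SD 1.x cross-attention, 768 and 1024 for the vision-tower hidden states of the base and large CLIP models), hard-coded rather than read from this repo's config. It also assumes the vision tower's hidden states are what gets fed to cross-attention. If the projected image embedding were used instead, the widths would differ (512 for base, 768 for large) and the conclusion would flip. `needs_projection` is a hypothetical helper.

```python
# Assumed widths, hard-coded for illustration (not read from the repo).
SD15_CROSS_ATTENTION_DIM = 768

CLIP_VISION_HIDDEN_SIZE = {
    "clip-vit-base-patch32": 768,
    "clip-vit-large-patch14": 1024,
}


def needs_projection(encoder_name, cross_attention_dim=SD15_CROSS_ATTENTION_DIM):
    """Return True if the encoder's hidden width differs from the UNet's
    cross-attention width, i.e. an extra linear layer would be needed."""
    return CLIP_VISION_HIDDEN_SIZE[encoder_name] != cross_attention_dim


if __name__ == "__main__":
    for name in CLIP_VISION_HIDDEN_SIZE:
        print(name, "needs projection:", needs_projection(name))
```

Under these assumptions the base model plugs in directly, while the large model would need an additional projection layer.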

kaleidoscopical commented 8 months ago

Thanks! Really helpful advice!

kaleidoscopical commented 8 months ago

...

garychan22 commented 8 months ago

@guoqincode Hi, I get normal results when I remove the CLIP feature, but weird results when I include it. Any idea why?