Mismatch error when loading CLIP-ViT-L/14 to train with SD 1.5

Exuan148 commented 4 months ago

Hello, I know you used CLIP-ViT-H/14 to encode the image prompt when training IP-Adapter with SD 1.5. How can I use CLIP-ViT-L/14 instead? When I loaded CLIP-ViT-L/14 with CLIPVisionModelWithProjection, I encountered a size mismatch error. Does CLIPVisionModelWithProjection only work with CLIP-ViT-H/14 here? Thank you.
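For readers hitting the same error, here is a small sketch of where a size mismatch between the two encoders could come from. The class `VisionDims` and the helper `mismatched_fields` are hypothetical names used only for illustration; the numbers are the published sizes of the two vision towers. It is an assumption, not something confirmed in this thread, that the error arises because a ViT-H/14 config (or a checkpoint trained against ViT-H/14) is paired with ViT-L/14 weights.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VisionDims:
    """Hypothetical container for the sizes that determine weight shapes."""
    hidden_size: int        # transformer width of the vision tower
    projection_dim: int     # size of image_embeds after the visual projection
    num_hidden_layers: int  # transformer depth


# Published sizes of the two CLIP vision towers.
VIT_L_14 = VisionDims(hidden_size=1024, projection_dim=768, num_hidden_layers=24)
VIT_H_14 = VisionDims(hidden_size=1280, projection_dim=1024, num_hidden_layers=32)


def mismatched_fields(config_dims, checkpoint_dims):
    """Return {field: (config value, checkpoint value)} for every differing size."""
    return {
        f.name: (getattr(config_dims, f.name), getattr(checkpoint_dims, f.name))
        for f in fields(config_dims)
        if getattr(config_dims, f.name) != getattr(checkpoint_dims, f.name)
    }


# Assumed failure mode: a ViT-H/14 config paired with ViT-L/14 weights.
print(mismatched_fields(VIT_H_14, VIT_L_14))
```

If this is the cause, loading the encoder together with its own config, for example `CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-large-patch14")`, should avoid the mismatch inside the encoder itself. Any layer that consumes `image_embeds` then has to expect 768 inputs rather than 1024.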

Foolbee commented 1 month ago

Hi, I am trying the same thing. I just modified the parameters of image_proj_model and it worked. One thing I am really concerned about, though: training with a new image encoder may also require billions of samples, which is very costly.
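The parameter change described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `ImageProjSketch` is a hypothetical stand-in for the projection module, and the value 768 is assumed for both the SD 1.5 cross-attention width and the CLIP-ViT-L/14 `projection_dim`. The key point is that the input width of the linear layer must match the size of the embeddings the new encoder produces.

```python
import torch
from torch import nn


class ImageProjSketch(nn.Module):
    """Hypothetical projection: one CLIP image embedding -> a few context tokens."""

    def __init__(self, cross_attention_dim=768, clip_embeddings_dim=768, num_tokens=4):
        super().__init__()
        self.cross_attention_dim = cross_attention_dim
        self.num_tokens = num_tokens
        # Input width must equal the image encoder's projection_dim.
        self.proj = nn.Linear(clip_embeddings_dim, num_tokens * cross_attention_dim)
        self.norm = nn.LayerNorm(cross_attention_dim)

    def forward(self, image_embeds):
        # (batch, clip_embeddings_dim) -> (batch, num_tokens, cross_attention_dim)
        tokens = self.proj(image_embeds)
        tokens = tokens.reshape(-1, self.num_tokens, self.cross_attention_dim)
        return self.norm(tokens)


# Assumed sizes: ViT-L/14 image_embeds are 768-dim; SD 1.5 cross-attention is 768-dim.
proj = ImageProjSketch(cross_attention_dim=768, clip_embeddings_dim=768, num_tokens=4)
fake_embeds = torch.randn(2, 768)  # stand-in for image_encoder(...).image_embeds
out = proj(fake_embeds)
print(out.shape)
```

A projection built this way starts from random weights. A pretrained projection trained against CLIP-ViT-H/14 expects 1024-dim inputs, so its weights would not load into this layer, and the module has to be trained for the new encoder.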

Exuan148 commented 1 month ago

Yes, you are right. image_proj_model is only for CLIP contrastive learning; it is not useful for IP-Adapter. What is your task? I think an adapter is easy to train when it focuses on a single domain, so there is no need for that much data.

tencent-ailab / IP-Adapter

Mismatch error when loading CLIP-ViT-L/14 to train with SD 1.5 #306