patrickjohncyh / fashion-clip

FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
MIT License
293 stars · 34 forks

How was the fine-tuning done exactly? #32

Closed — travellingsasa closed this 2 months ago

travellingsasa commented 2 months ago

Hey there,

I am wondering how you did the fine-tuning here. You do not describe it in the paper.

Did you

  1. Continue training starting from a pre-trained model like OpenCLIP
  2. Add a classification head and freeze all other layers
  3. Add a classification head and update both the image and text encoders

I don't think you did 2 or 3 since you used full sentences as captions.

How did you do it?

All the best

vinid commented 2 months ago

Hi!!

It's contrastive fine-tuning; we use the same task CLIP was trained on. All layers are unfrozen.

Let me know if you need more details!

travellingsasa commented 2 months ago

So when you say "same task CLIP was trained on" do I correctly assume you continued training without adding a classifier?

vinid commented 2 months ago

Yup, we keep the same contrastive pre-training objective.
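For anyone landing here later: the thread doesn't include the training code, but "same contrastive pre-training objective" refers to CLIP's symmetric InfoNCE loss over a batch of matched image-text pairs. Below is a minimal NumPy sketch of that loss (function name and temperature value are illustrative, not taken from the FashionCLIP codebase):

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss as used in CLIP pre-training.

    image_emb, text_emb: (batch, dim) arrays where row i of each
    is a matched image-caption pair.
    """
    # L2-normalize embeddings so the dot product is cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarity logits, scaled by temperature
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy(l):
        # Softmax cross-entropy where the target for row i is column i
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Fine-tuning then just means continuing to minimize this loss on fashion image-caption pairs with both encoders unfrozen, rather than attaching any classification head.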

travellingsasa commented 2 months ago

Thank you for the clarification and the super quick reply :)

vinid commented 2 months ago

Happy to help!!
