Thanks!
It played a pretty big role. We eventually managed to scale to 1024 examples per batch, and we're fairly sure that helped. We initially trained with smaller batch sizes, but we saw some degradation in generalization.
Cost and the availability of large GPUs were a big factor at the time of training.
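For context, the CLIP-style contrastive loss uses the rest of the batch as negatives, so batch size directly controls how many negatives each pair is contrasted against. A rough illustrative sketch of that loss (not the actual FashionCLIP training code):

```python
# Illustrative sketch of a CLIP-style contrastive (InfoNCE) loss: every other
# example in the batch acts as a negative, so batch size sets the number of negatives.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (batch_size, dim) L2-normalised embeddings."""
    logits = image_emb @ text_emb.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(image_emb.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)               # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)           # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# At batch_size=1024 each positive pair is contrasted against 1023 negatives,
# versus only 511 at batch_size=512 -- one intuition for the generalization gap.
```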
Got it, I understand.
The version in the paper is FashionCLIP 1; for 2.0 we used a larger machine.
The metrics are fine; the one thing I'd suggest is to evaluate on an external dataset, not the one you are training on.
Makes sense. When I started playing around a bit, I needed to add a lot of optimisations to fit a batch size as large as 512. Given your experience, it seems worth getting a larger instance with multiple GPUs to allow a big batch size.
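By optimisations I mean things like mixed precision and gradient accumulation; roughly something like the sketch below. `model`, `dataloader`, and `clip_contrastive_loss` are placeholders, not the actual FashionCLIP training code:

```python
# Hedged sketch: mixed precision + gradient accumulation to reach an effective
# optimiser batch of 512 on limited memory. Names here are assumptions.
import torch

accum_steps = 8                       # 8 micro-batches of 64 -> effective batch of 512
scaler = torch.cuda.amp.GradScaler()  # mixed precision to cut activation memory
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

optimizer.zero_grad()
for step, (images, texts) in enumerate(dataloader):   # dataloader yields micro-batches of 64
    with torch.cuda.amp.autocast():
        img_emb, txt_emb = model(images, texts)        # assumed to return normalised embeddings
        loss = clip_contrastive_loss(img_emb, txt_emb) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()

# Caveat: accumulation enlarges the batch seen by the optimiser, but each forward
# pass still only contrasts against its own micro-batch of negatives, so it is not
# a full substitute for a true 512/1024 batch on bigger hardware.
```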
Currently I am just using a standard train+validation split and was thinking of measuring just the loss there. Are you referring to a different dataset entirely, like the public ones (in my case)?
Hard to say. I think I'd probably start with the batch size you can get on a standard machine and see the quality of the final model.
I'd use external datasets. Even if you are training on domain-specific data, you can probably also use MSCOCO just to see how much generalization power you lose.
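A simple image–text retrieval check on MSCOCO would do. Rough sketch below, assuming a Hugging Face CLIP-style checkpoint and a handful of matched image/caption pairs; the checkpoint name and the way you load COCO are placeholders to adapt to your setup:

```python
# Out-of-domain sanity check: image -> text recall@k on a few MSCOCO pairs.
import torch
from transformers import CLIPModel, CLIPProcessor

ckpt = "your-finetuned-clip-checkpoint"   # hypothetical path/name
model = CLIPModel.from_pretrained(ckpt).eval()
processor = CLIPProcessor.from_pretrained(ckpt)

def recall_at_k(images, captions, k=5):
    """images: list of PIL images; captions: matching list of strings (same order)."""
    inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    sims = out.logits_per_image                    # (N_images, N_texts)
    ranks = sims.argsort(dim=-1, descending=True)  # caption ranking per image
    correct = torch.arange(sims.size(0)).unsqueeze(1)
    hits = (ranks[:, :k] == correct).any(dim=-1)
    return hits.float().mean().item()

# Run the same metric before and after fine-tuning: a big drop on COCO relative to
# the pretrained checkpoint suggests you are losing general-domain alignment.
```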
Thanks a lot for the amazing work! I wanted to understand more about the finetuning process.
Thanks!