Thanks!
It played a pretty big role. We eventually managed to scale to 1024 examples per batch, and we're fairly sure that helped. We initially trained with smaller batch sizes, but we saw some degradation in generalization.
Cost and the availability of large GPUs were a big factor at the time of training.
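For context, the CLIP-style contrastive loss uses the rest of the batch as negatives, so batch size directly controls how many negatives each pair is contrasted against. A rough illustrative sketch of that loss (not the actual FashionCLIP training code):

```python
# Illustrative sketch of a CLIP-style contrastive (InfoNCE) loss: every other
# example in the batch acts as a negative, so batch size sets the number of negatives.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (batch_size, dim) L2-normalised embeddings."""
    logits = image_emb @ text_emb.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(image_emb.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)               # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)           # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# At batch_size=1024 each positive pair is contrasted against 1023 negatives,
# versus only 511 at batch_size=512 -- one intuition for the generalization gap.
```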
Got it, I understand.
The version in the paper is FashionCLIP 1; for 2.0 we used a larger machine.
The metrics are fine; the one thing I'd suggest is to evaluate on an external dataset, not the one you are training on.
Makes sense. When I started playing around a bit, I needed to add a lot of optimisations to fit a batch size as large as 512. Given your experience, it seems worth getting a larger instance with multiple GPUs to allow a big batch size.
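By optimisations I mean things like mixed precision and gradient accumulation; roughly something like the sketch below. `model`, `dataloader`, and `clip_contrastive_loss` are placeholders, not the actual FashionCLIP training code:

```python
# Hedged sketch: mixed precision + gradient accumulation to reach an effective
# optimiser batch of 512 on limited memory. Names here are assumptions.
import torch

accum_steps = 8                       # 8 micro-batches of 64 -> effective batch of 512
scaler = torch.cuda.amp.GradScaler()  # mixed precision to cut activation memory
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

optimizer.zero_grad()
for step, (images, texts) in enumerate(dataloader):   # dataloader yields micro-batches of 64
    with torch.cuda.amp.autocast():
        img_emb, txt_emb = model(images, texts)        # assumed to return normalised embeddings
        loss = clip_contrastive_loss(img_emb, txt_emb) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()

# Caveat: accumulation enlarges the batch seen by the optimiser, but each forward
# pass still only contrasts against its own micro-batch of negatives, so it is not
# a full substitute for a true 512/1024 batch on bigger hardware.
```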
Currently I am just using a standard train+validation split and was thinking of measuring just the loss there. Are you referring to a different dataset entirely, like the public ones (in my case)?
Hard to say. I think I'd probably start with the batch size you can get on a standard machine and see the quality of the final model.
I'd use external datasets. Even if you are training on domain-specific data, you can probably also use MSCOCO just to see how much generalization power you lose.
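A simple image–text retrieval check on MSCOCO would do. Rough sketch below, assuming a Hugging Face CLIP-style checkpoint and a handful of matched image/caption pairs; the checkpoint name and the way you load COCO are placeholders to adapt to your setup:

```python
# Out-of-domain sanity check: image -> text recall@k on a few MSCOCO pairs.
import torch
from transformers import CLIPModel, CLIPProcessor

ckpt = "your-finetuned-clip-checkpoint"   # hypothetical path/name
model = CLIPModel.from_pretrained(ckpt).eval()
processor = CLIPProcessor.from_pretrained(ckpt)

def recall_at_k(images, captions, k=5):
    """images: list of PIL images; captions: matching list of strings (same order)."""
    inputs = processor(text=captions, images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    sims = out.logits_per_image                    # (N_images, N_texts)
    ranks = sims.argsort(dim=-1, descending=True)  # caption ranking per image
    correct = torch.arange(sims.size(0)).unsqueeze(1)
    hits = (ranks[:, :k] == correct).any(dim=-1)
    return hits.float().mean().item()

# Run the same metric before and after fine-tuning: a big drop on COCO relative to
# the pretrained checkpoint suggests you are losing general-domain alignment.
```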
Thanks a lot for the amazing work! I wanted to understand more about the finetuning process.
Thanks!