sail-sg / metaformer

MetaFormer Baselines for Vision (TPAMI 2024)
https://arxiv.org/abs/2210.13452
Apache License 2.0

CAFormer finetuning. Adapter/LoRA? #7

Open kacwin opened 1 year ago

kacwin commented 1 year ago

Hey, great job with this repo. CAFormer with 100M parameters is really powerful, though I am struggling with the finetuning due to hardware limitations. Did you already run experiments with something like adapter finetuning or LoRA? At first glance, it looks like one would need to rewrite a lot of the code for this.

yuweihao commented 1 year ago

Hi @kacwin ,

Many thanks for your interest. I did not conduct experiments with adapter finetuning or LoRA. As for hardware limitations: since none of the models in the paper use Batch Norm, you can set --grad-accum-steps and train with gradient accumulation to fit smaller batches in memory.
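
For reference, a minimal sketch of what gradient accumulation does conceptually (illustrative only, assuming a standard PyTorch training loop; the variable names here are not the repo's actual train.py):

```python
# Minimal gradient-accumulation sketch (illustrative, not the repo's train.py).
# `model`, `loader`, `criterion`, and `optimizer` are assumed to already exist.
accum_steps = 4  # effective batch size = loader batch size * accum_steps

optimizer.zero_grad()
for step, (images, targets) in enumerate(loader):
    loss = criterion(model(images), targets)
    (loss / accum_steps).backward()   # scale so the accumulated gradient matches a full batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()              # update weights only every accum_steps mini-batches
        optimizer.zero_grad()
```

Because the models use no Batch Norm, accumulating gradients over small mini-batches gives the same update as training with the larger batch directly.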

kacwin commented 1 year ago

Thanks for the info. Using gradient accumulation would be kind of a last resort; there is simply too much data. I will try some LoRA experiments in the near future and can give an update on that if you want.

yuweihao commented 1 year ago

Many thanks!

kacwin commented 1 year ago

Hello, we did some experiments with LoRA finetuning.

With a fraction of the trainable parameters, we achieved similar results. However, training time and GPU memory sadly did not decrease that much (maybe a factor of 0.66). I think there is some potential here :)
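
For anyone curious, here is a minimal sketch of the kind of LoRA wrapping this refers to (illustrative only; the target modules, rank, and scaling below are assumptions, not our exact configuration):

```python
# Minimal LoRA wrapper for nn.Linear layers (illustrative sketch, not the exact setup used).
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the pretrained weight
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # start as a zero update, so finetuning begins at the pretrained model
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


def add_lora(model: nn.Module, rank: int = 8):
    # Recursively replace every nn.Linear with a LoRA-augmented version; in
    # practice one would usually restrict this to the attention/MLP projections.
    for name, module in model.named_children():
        if isinstance(module, nn.Linear):
            setattr(model, name, LoRALinear(module, rank=rank))
        else:
            add_lora(module, rank=rank)
```

Only the small lora_a/lora_b matrices are trained, which is why the trainable parameter count drops so sharply even though the full frozen backbone still has to run forward and backward (hence the limited savings in time and memory).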

yuweihao commented 1 year ago

Thanks for sharing these valuable experiment results.