mlfoundations / wise-ft

Robust fine-tuning of zero-shot models
https://arxiv.org/abs/2109.01903

Model Interpolation of Models of Different Size (#layers, hidden_size, intermediate_size, attention_head) #7

Closed sanyalsunny111 closed 2 years ago

sanyalsunny111 commented 2 years ago

I am trying to interpolate two model checkpoints (ViT-B/16 transformers) that differ in the number of layers, hidden size, attention heads, and intermediate size, using the same model interpolation approach shown in the paper. Could you provide or suggest minimal sample code or pseudocode for this? The approach shown in the paper only works when the models have the same architecture.

mitchellnw commented 2 years ago

Unfortunately, in the paper we only interpolate models that have the same architecture. We currently don't know how to interpolate models with different architectures. If you end up trying out some ideas, we would be very curious to hear the results! If the models are both ViT-B/16s, why do they have different architectures?
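For reference, the same-architecture interpolation from the paper is just an element-wise blend of matching state-dict entries, theta = (1 - alpha) * theta_zeroshot + alpha * theta_finetuned. Below is a minimal PyTorch sketch of that idea, not the exact code in this repo; the checkpoint file names and the alpha value are placeholders, and it assumes both checkpoints are plain state dicts. It also shows why differently shaped tensors break the approach.

```python
import torch

def interpolate_state_dicts(theta_zeroshot, theta_finetuned, alpha=0.5):
    # Both state dicts must contain the same keys with identically shaped
    # tensors; different #layers, hidden size, or intermediate size means
    # the element-wise combination below is undefined.
    assert set(theta_zeroshot.keys()) == set(theta_finetuned.keys())
    return {
        key: (1 - alpha) * theta_zeroshot[key] + alpha * theta_finetuned[key]
        for key in theta_zeroshot
    }

# Hypothetical usage: load two checkpoints of the same ViT-B/16 architecture,
# interpolate the weights, and save the result.
zeroshot = torch.load("zeroshot.pt", map_location="cpu")
finetuned = torch.load("finetuned.pt", map_location="cpu")
torch.save(interpolate_state_dicts(zeroshot, finetuned, alpha=0.5), "wise_ft.pt")
```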

sanyalsunny111 commented 2 years ago

Thank you for replying. Sure, I am also curious to see how model interpolation works across different model configs, and I will share my results if I succeed. I meant that my zero-shot model is ViT-B/16, and I have two smaller models with different configs, namely ViTB-16-tiny and ViTB-16-xtra-tiny, obtained by changing the number of layers, hidden size, and intermediate size.

mitchellnw commented 2 years ago

Best of luck! I hope you make this work, as that would be very exciting.