mbzuai-oryx / LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
S2 finetuning #9

Closed xmu-xiaoma666 closed 6 months ago

xmu-xiaoma666 commented 6 months ago

When you use S2 finetuning, the channel dimension of the visual features increases threefold. How do you deal with the larger number of channels being passed through?

mmaaz60 commented 6 months ago

Hi @xmu-xiaoma666,

Thank you for your interest in our work. Yes, you are right: when using S2, the channel dimension increases 3x, and we have to adjust the MLP projector dimensions accordingly.

In summary, the MLP now projects from 1024*3 (3072) to 4096 instead of from 1024 to 4096. Also note that pretraining has to be performed again, since the projector changes in this case. I hope this helps. Thank you.
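
To make the dimension change concrete, here is a minimal sketch in plain Python that works out the projector input width and parameter count with and without S2. It assumes a two-layer MLP projector (Linear, GELU, Linear, both with biases); that layout is an assumption for illustration and is not stated in this thread. The helper names `projector_in_dim` and `mlp2x_param_count` are hypothetical and do not come from the repository.

```python
# Sketch only: assumes a 2-layer MLP projector (Linear -> GELU -> Linear, with biases).
# Helper names are hypothetical and not taken from the LLaVA-pp codebase.

def projector_in_dim(vision_hidden_size: int, num_scales: int = 1) -> int:
    """Input width of the projector when features from `num_scales`
    scales are concatenated along the channel dimension."""
    return vision_hidden_size * num_scales


def mlp2x_param_count(in_dim: int, out_dim: int) -> int:
    """Parameters of Linear(in_dim, out_dim) followed by Linear(out_dim, out_dim)."""
    first = in_dim * out_dim + out_dim    # weights + bias of the first layer
    second = out_dim * out_dim + out_dim  # weights + bias of the second layer
    return first + second


VISION_HIDDEN = 1024  # visual feature channels, as quoted in the thread
LLM_HIDDEN = 4096     # projector output width, as quoted in the thread

base_in = projector_in_dim(VISION_HIDDEN)              # without S2
s2_in = projector_in_dim(VISION_HIDDEN, num_scales=3)  # with S2 (3x channels)

base_params = mlp2x_param_count(base_in, LLM_HIDDEN)
s2_params = mlp2x_param_count(s2_in, LLM_HIDDEN)

print(f"projector input: {base_in} -> {s2_in}")
print(f"projector parameters: {base_params:,} -> {s2_params:,}")
```

Under this assumption only the first linear layer grows. Its weight matrix changes shape, so weights pretrained for a 1024-wide input no longer fit, which is consistent with the note above that pretraining has to be repeated.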