Closed xmu-xiaoma666 closed 6 months ago
Hi @xmu-xiaoma666,
Thank you for your interest in our work. Yes, you are right: when using S2, the channel dimension increases 3x, and the MLP projector dimensions have to be adjusted accordingly.
In summary, the MLP will now project from 1024*3 to 4096 instead of from 1024 to 4096. Further, note that pretraining has to be performed again, since the projector changes in this case. I hope this helps. Thank you.
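To make the dimension change concrete, here is a minimal sketch of the layer shapes involved. It assumes a two-layer MLP projector (linear, activation, linear), which this thread does not confirm. The helper names `mlp_projector_shapes` and `param_count` are hypothetical and not part of any codebase. Only the numbers 1024, 3 and 4096 come from the reply above.

```python
# Sketch only: assumes a two-layer MLP projector (Linear -> activation -> Linear).
# The helper names below are hypothetical, chosen for illustration.

def mlp_projector_shapes(vision_dim, llm_dim=4096, num_scales=1):
    """Return (in_features, out_features) for each linear layer of the projector."""
    in_dim = vision_dim * num_scales  # S2 concatenates features along the channel axis
    return [(in_dim, llm_dim), (llm_dim, llm_dim)]


def param_count(shapes):
    """Count weights plus biases over all linear layers."""
    return sum(i * o + o for i, o in shapes)


baseline = mlp_projector_shapes(1024)               # without S2
with_s2 = mlp_projector_shapes(1024, num_scales=3)  # with S2: 1024*3 input channels

print("first layer without S2:", baseline[0])
print("first layer with S2:   ", with_s2[0])
print("extra parameters:", param_count(with_s2) - param_count(baseline))
```

Under this assumption only the first layer changes shape. That mismatch is why a projector pretrained on 1024-channel input cannot be loaded as-is and has to be pretrained again.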
When you use S2 fine-tuning, the channel dimension of the visual features increases threefold. How do you handle the increased number of channels passed to the projector?