Closed · jong980812 closed this issue 9 months ago
Hi @jong980812 ,
Thank you for your interest in our work. I apologize for the delay in responding.
The adapter hidden dimension typically has to be chosen empirically, based on the pre-trained model, the task, and the data scale. The values in our paper were chosen primarily to enable a fair comparison with other approaches. If you have insights or have come across research on this question, I would be grateful if you could share it, as I am always eager to learn more.
Regarding the use of the GELU activation function: it was actually a minor oversight during implementation. In practice, we found that it makes almost no noticeable difference compared to using ReLU.
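For anyone curious why swapping the activation barely matters here: GELU and ReLU agree closely for inputs away from zero and differ only by a small smooth region around it. A minimal sketch of the two functions (pure Python, using the exact erf-based GELU definition; not code from this repo):

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def relu(x: float) -> float:
    return max(0.0, x)

# The two activations coincide for large |x| and differ only near zero.
for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  gelu={gelu(x):+.4f}")
```

For example, at x = 2.0 the outputs are 2.0000 (ReLU) vs roughly 1.9545 (GELU), and both are exactly 0 at x = 0, which is consistent with the two choices training almost identically in a small bottleneck adapter.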
Please feel free to reach out if you have further questions or if there's anything else I can assist you with.
Best regards,
Qiankun Gao
Thank you for your great work and for sharing the code. While going through it, I had a question. In vit_adapter.yaml, there is the following section:
Is down_sample: 5 an absolute value rather than a ratio? As far as I know, a typical Adapter reduces the dimension as hidden_dim -> hidden_dim / 4 -> hidden_dim. Your code instead has the structure Linear(768, 5), GELU(), Linear(5, 768). Is there a specific reason for this setup, and why is the value specifically 5?
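For context on how much this choice matters: an absolute bottleneck of 5 is far cheaper than the common hidden_dim / 4 ratio. A minimal sketch, assuming the adapter is a standard down-project / activation / up-project bottleneck with bias terms (adapter_params is a hypothetical helper, not part of the repo):

```python
def adapter_params(hidden_dim: int, bottleneck: int) -> int:
    # Down-projection (hidden_dim -> bottleneck) plus up-projection
    # (bottleneck -> hidden_dim), each with a bias vector.
    down = hidden_dim * bottleneck + bottleneck
    up = bottleneck * hidden_dim + hidden_dim
    return down + up

# Absolute bottleneck from the config (down_sample: 5)
print(adapter_params(768, 5))        # -> 8453 parameters per adapter

# Ratio-style bottleneck (768 // 4 = 192), the common Adapter convention
print(adapter_params(768, 768 // 4)) # -> 295872 parameters per adapter
```

So with down_sample interpreted as an absolute value of 5, each adapter adds roughly 8.5K parameters, versus about 296K with the 1/4 ratio: a difference of more than 30x per inserted adapter.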