gqk / LAE

A Unified Continual Learning Framework with General Parameter-Efficient Tuning, ICCV 2023 [PyTorch Code]
https://arxiv.org/abs/2303.10070
Apache License 2.0

Subject: Adapter dimension #5

Closed jong980812 closed 9 months ago

jong980812 commented 11 months ago

I appreciate your good work and thank you for sharing your excellent code. While going through it, I had a question. In vit_adapter.yaml there is the following section:

extends:
  - ./base/cifar100_order1.yaml
module:
  model:
    backbone: ViT-B_16
  adapt_blocks: [0, 1, 2, 3, 4]
  pet_cls: Adapter
  pet_kwargs:
    down_sample: 5
    mode: parallel
    scale: null

Is down_sample: 5 an absolute value rather than a ratio? As far as I know, a common adapter typically uses a dimension reduction like hidden_dim -> hidden_dim / 4 -> hidden_dim. Your code has the structure Linear(768, 5), GELU(), Linear(5, 768). Is there a specific reason for this setup, and why is the value specifically 5? A sketch of how I am reading the config is below.
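
To be concrete about how I am reading pet_kwargs, here is a minimal sketch; the class name BottleneckAdapter and the exact wiring are my own illustration of a parallel adapter with an absolute bottleneck width, not your implementation:

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # down_sample is read as an absolute bottleneck width (5), not a ratio,
    # so the layers are Linear(768, 5) -> GELU -> Linear(5, 768).
    def __init__(self, hidden_dim=768, down_sample=5, scale=None):
        super().__init__()
        self.down = nn.Linear(hidden_dim, down_sample)  # 768 -> 5
        self.act = nn.GELU()
        self.up = nn.Linear(down_sample, hidden_dim)    # 5 -> 768
        self.scale = 1.0 if scale is None else scale    # scale: null in the yaml -> 1.0 here

    def forward(self, x):
        return self.scale * self.up(self.act(self.down(x)))

# mode: parallel -- I understand this as the adapter branch being added
# alongside the frozen sub-layer's output rather than applied sequentially,
# roughly: out = frozen_sublayer(x) + adapter(x)

Please correct me if this reading of the config is wrong.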

gqk commented 10 months ago

Hi @jong980812 ,

Thank you for your interest in our work. I apologize for the delay in responding.

  1. The adapter hidden dimension generally has to be chosen with the pre-trained model, the task, and the data scale in mind. The values we chose in the paper were primarily meant to keep the comparison with other approaches fair. If you have insights or have come across research on this, I would be grateful if you could share them, as I am always eager to learn more.

  2. Regarding the GELU activation function, it was actually a minor oversight during implementation. In practice, we found it makes almost no noticeable difference compared to using ReLU.

Please feel free to reach out if you have further questions or if there's anything else I can assist you with.

Best regards,

Qiankun Gao