gqk / LAE

A Unified Continual Learning Framework with General Parameter-Efficient Tuning, ICCV 2023 [PyTorch Code]
https://arxiv.org/abs/2303.10070
Apache License 2.0

Subject: Adapter dimension #5

Closed jong980812 closed 9 months ago

jong980812 commented 11 months ago

I appreciate your good work and thank you for sharing your excellent code. While going through it, I had a question. In vit_adapter.yaml there is the following section:

extends:
  - ./base/cifar100_order1.yaml
module:
  model:
    backbone: ViT-B_16
  adapt_blocks: [0, 1, 2, 3, 4]
  pet_cls: Adapter
  pet_kwargs:
    down_sample: 5
    mode: parallel
    scale: null

Is down_sample: 5 an absolute value rather than a ratio? As far as I know, a common adapter typically uses a dimension reduction like hidden_dim -> hidden_dim / 4 -> hidden_dim. Your code has the structure Linear(768, 5), GELU(), Linear(5, 768). Is there a specific reason for this setup, and why is the value specifically 5? A sketch of how I am reading the config is below.
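
To be concrete about how I am reading pet_kwargs, here is a minimal sketch; the class name BottleneckAdapter and the exact wiring are my own illustration of a parallel adapter with an absolute bottleneck width, not your implementation:

import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    # down_sample is read as an absolute bottleneck width (5), not a ratio,
    # so the layers are Linear(768, 5) -> GELU -> Linear(5, 768).
    def __init__(self, hidden_dim=768, down_sample=5, scale=None):
        super().__init__()
        self.down = nn.Linear(hidden_dim, down_sample)  # 768 -> 5
        self.act = nn.GELU()
        self.up = nn.Linear(down_sample, hidden_dim)    # 5 -> 768
        self.scale = 1.0 if scale is None else scale    # scale: null in the yaml -> 1.0 here

    def forward(self, x):
        return self.scale * self.up(self.act(self.down(x)))

# mode: parallel -- I understand this as the adapter branch being added
# alongside the frozen sub-layer's output rather than applied sequentially,
# roughly: out = frozen_sublayer(x) + adapter(x)

Please correct me if this reading of the config is wrong.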

gqk commented 10 months ago

Hi @jong980812 ,

Thank you for your interest in our work. I apologize for the delay in responding.

  1. The adapter hidden dimension generally has to be chosen with the pre-trained model, the task, and the data scale in mind. The values we chose in the paper were primarily meant to keep the comparison with other approaches fair. If you have insights or have come across research on this, I would be grateful if you could share them, as I am always eager to learn more.

  2. Regarding the GELU activation function, it was actually a minor oversight during implementation. In practice, we found it makes almost no noticeable difference compared to using ReLU.

Please feel free to reach out if you have further questions or if there's anything else I can assist you with.

Best regards,

Qiankun Gao