Great work with an elegant and effective idea! Thanks for sharing. I do have one minor suggestion.
It is well known that, in the LLM finetuning paradigm, adapter-tuning [1] is a popular approach: lightweight modules are inserted between transformer layers, and only those modules are updated for downstream tasks. In this work, however, the “adapters” the authors refer to are not such modules, but rather a selection of layers from the pretrained model. The authors are clearly aware of this terminological overlap, as there are even combined experiments on offsite-tuning + adapter-tuning (Table 5).
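For concreteness, here is a minimal sketch of the kind of adapter module I mean in the sense of [1] (a bottleneck inserted alongside a transformer sublayer); the class name, dimensions, and hyperparameters are purely illustrative and not taken from the paper:

```python
import torch
import torch.nn as nn

class AdapterModule(nn.Module):
    """Houlsby-style bottleneck adapter (illustrative sketch, not the paper's method).

    Inserted after a transformer sublayer; during finetuning only these
    parameters are updated while the pretrained backbone stays frozen.
    """
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down to the bottleneck
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the adapter close to identity at initialization.
        return x + self.up(self.act(self.down(x)))
```

This is quite different from selecting a subset of pretrained layers, which is what the paper's “adapters” denote.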
Given that both approaches fall within the realm of parameter-efficient finetuning, I’d encourage the authors to find an alternative term for their “adapter” to avoid potential confusion and ambiguity.
A few preliminary examples that come to mind: “bridging/pluggable/relay/alignment/shared” + “layers/units/components.” Hope this helps!
[1] Houlsby et al., Parameter-efficient transfer learning for NLP. ICML 2019.