mit-han-lab / offsite-tuning

Offsite-Tuning: Transfer Learning without Full Model
https://arxiv.org/abs/2302.04870
MIT License
367 stars · 38 forks

The authors should consider changing the term “adapter” to avoid potential confusion with adapter-tuning. #2

Open henryzhongsc opened 1 year ago

henryzhongsc commented 1 year ago

Great work with an elegant yet effective idea! Thanks for sharing. That said, I have a minor suggestion.

It is well known that, within the LLM finetuning paradigm, adapter-tuning [1] is a popular approach: lightweight modules are inserted between transformer layers, and only those modules are updated for downstream tasks. In this work, however, the “adapters” the authors refer to are not such modules, but rather a selection of layers from the pretrained model itself. The authors are clearly aware of this terminology overlap, as there are even combined experiments on offsite-tuning + adapter-tuning (Table 5).
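To make the distinction concrete, here is a minimal PyTorch sketch (my own illustration, not code from this repo; the names `BottleneckAdapter` and `select_offsite_adapters` are hypothetical) of the two things currently both called an "adapter":

```python
import torch.nn as nn

# (1) Adapter-tuning in the Houlsby et al. sense: a small bottleneck module
#     inserted into each transformer layer; only these modules are updated on
#     the downstream task, while the pretrained weights stay frozen.
class BottleneckAdapter(nn.Module):
    def __init__(self, d_model: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, d_model)

    def forward(self, x):
        # residual bottleneck: x + up(act(down(x)))
        return x + self.up(self.act(self.down(x)))


# (2) The "adapter" in this paper (as I understand it): a subset of the
#     pretrained model's own transformer blocks, e.g. the first and last few
#     layers, which the data owner finetunes while the middle of the model is
#     replaced by a compressed emulator.
def select_offsite_adapters(blocks: nn.ModuleList, num_each_side: int = 2):
    """Hypothetical helper for illustration only."""
    bottom_adapter = blocks[:num_each_side]
    top_adapter = blocks[-num_each_side:]
    return bottom_adapter, top_adapter
```

The overlap is that both are "the small trainable part", but (1) adds new parameters while (2) reuses existing layers, which is why I think a distinct name for (2) would help.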

Given that both approaches fall within the realm of parameter-efficient finetuning, I’d encourage the authors to find an alternative term for their “adapter” to avoid potential confusion and ambiguity.

A few preliminary examples I can come up with are “bridging/pluggable/relay/alignment/shared” + “layers/units/components.” Hope this helps!

[1] Houlsby et al., Parameter-efficient transfer learning for NLP. ICML 2019.