Closed hank0316 closed 4 months ago
Hello Hank,
The representation $u_t$ is of a token, so the routing is done at the token level. I hope that clarifies any misunderstanding. Let me know if you have any questions.
The information you provided clarified things for me. Thanks for the response!
Dear Authors,
I am an MS student at National Taiwan University and have recently engaged with your paper. The concept of 'token-wise' routing within this framework has captured my interest, but I need further clarification to fully grasp its implementation.
The paper specifies that a gating function is applied to each input representation $u_t \in \mathbb R^n$ to the frozen LoRA, yielding the output representation $Wu_t + \sigma(v^{\text{T}}u_t)BAu_t$ during the training of the gating vector $v$. During inference, the affinity $\alpha_{t,z}$ between PEFT module $z$ and input $u_t$ is computed as $\bar{v}^{\text{T}}\bar{u}_t$. The top-$k$ experts are then selected based on their affinity scores, and a softmax over those scores determines the weight given to each selected expert's output.
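To make sure I follow the inference procedure, here is a minimal sketch of how I understand it, assuming L2-normalization for $\bar{v}$ and $\bar{u}_t$ (the function name `route_token` and all array shapes are my own illustration, not from your code):

```python
# Hypothetical sketch of the described inference-time routing for ONE token.
# Assumed shapes: u_t is (n,), V stacks the Z gating vectors as (Z, n),
# expert_outputs holds each expert's B_z A_z u_t as (Z, n).
import numpy as np

def route_token(u_t, V, expert_outputs, k=2):
    # Affinity alpha_{t,z} = v_bar_z^T u_bar_t with L2-normalized vectors.
    u_bar = u_t / np.linalg.norm(u_t)
    V_bar = V / np.linalg.norm(V, axis=1, keepdims=True)
    alpha = V_bar @ u_bar                      # (Z,) affinity per expert

    # Keep only the k experts with the highest affinity.
    top = np.argsort(alpha)[-k:]

    # Softmax over the selected affinities gives the mixing weights.
    logits = alpha[top]
    w = np.exp(logits - logits.max())
    w /= w.sum()

    # Weighted combination of the chosen experts' outputs.
    combined = (w[:, None] * expert_outputs[top]).sum(axis=0)
    return top, w, combined
```

If this sketch is right, then routing happens independently for every $u_t$, which is what I wanted to confirm.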
My interpretation of 'token-wise' routing was that each token is routed to different experts. However, the described process seems to suggest routing at the example level rather than the token level. Is $u_t$ the representation of a single token rather than of the whole input sequence? Could you please clarify whether my understanding is correct, or whether there is a nuance I'm missing?
Thank you for your time and assistance.
Sincerely, Hank