Closed by oezyurty 8 months ago
Hi @oezyurty, thank you for your interest in our work.
To clarify:
We appreciate your feedback and will aim to clarify this further in our final manuscript!
Hi @jzhoubu, thanks a lot for the swift reply!
Thanks for the clarification. Yes, I now see that in Sec. 3.3 (a) you state "we fully activate $V_p(p)$ while sparsify $V_q(q)$". Honestly, I hadn't connected this statement with the gating functions; my mistake :)
I think such clarification would definitely help your readers in the future.
Best regards,
Hi,
First of all, big congrats on the paper; it's really fun to read!
I have a minor question regarding how the similarity function is implemented in general.
Eq. (1) in your paper (Sec. 2.2) states that the gating function is applied to both sides, i.e. query and target. However, in Sec. 3.2 you write that the SCE loss is applied to $V_q(q) \odot G_q(q)$ and $V_p(p)$, i.e. only the query side is gated, which is confirmed by the pseudocode in Fig. 2.
May I kindly ask for your clarification on this discrepancy?
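To make the two readings concrete, here is a minimal NumPy sketch of how I understand them. It is only an illustration, not your implementation: the top-k binary gate (`top_k_gate`), the dot-product similarity, and the toy vectors are all my own assumptions, since I am not sure how $G_q$ and $G_p$ are actually defined.

```python
import numpy as np

def top_k_gate(v, k):
    """Hypothetical binary gate (my assumption): 1 for the k largest entries, 0 elsewhere."""
    gate = np.zeros_like(v)
    gate[np.argsort(v)[-k:]] = 1.0
    return gate

def sim_both_gated(v_q, v_p, k):
    # My reading of Eq. (1): the gate is applied to both query and target.
    return float((v_q * top_k_gate(v_q, k)) @ (v_p * top_k_gate(v_p, k)))

def sim_query_gated(v_q, v_p, k):
    # My reading of Sec. 3.2 / Fig. 2: only the query side is gated.
    return float((v_q * top_k_gate(v_q, k)) @ v_p)

# Toy vectors, chosen so that the top-2 entries of query and target do not overlap.
v_q = np.array([0.9, 0.1, 0.5, 0.0])
v_p = np.array([0.2, 0.8, 0.4, 0.6])

print(sim_both_gated(v_q, v_p, k=2))   # 0.0: the gated supports are disjoint
print(sim_query_gated(v_q, v_p, k=2))  # approximately 0.38
```

With these toy inputs the two variants give different scores, which is why I would like to know which one is intended.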
Thanks in advance!