Open TimS-ml opened 1 month ago
@TimS-ml hey Tim, yes that paper corroborates that technique, but i think it originated from an earlier work within google brain (could be wrong)
@lucidrains Thanks! Should we add the original paper to the README? Let me see if I can find it. BTW, I have another small question: right now the AttentionLayer takes more than 55 input parameters, and I'm guessing that number will only go up as we implement more papers. Are there any software design patterns we could use to make the codebase easier to maintain? For example, grouping the input params further?
@TimS-ml it actually won't go up as fast as you think
very few techniques made it, and if anything, i will probably start removing certain ideas in coming releases
re: citation, yes, let us cite that if you can find it
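On the parameter-grouping question above, one common pattern is to bundle related options into small config objects and pass those in, instead of a long flat argument list. The sketch below is only an illustration of that idea, not how this repo is organized: `RotaryConfig`, `GLUConfig`, `AttentionLayerConfig`, and every field except `mult_bias` are hypothetical names, and the `AttentionLayer` here is a bare stand-in class.

```python
from dataclasses import dataclass, field


# All class and field names below are hypothetical, for illustration only.
@dataclass(frozen=True)
class RotaryConfig:
    enabled: bool = False
    dim: int = 32


@dataclass(frozen=True)
class GLUConfig:
    enabled: bool = False
    mult_bias: bool = False  # the option discussed in this thread


@dataclass(frozen=True)
class AttentionLayerConfig:
    dim: int
    heads: int = 8
    # default_factory gives each config its own default sub-config instance
    rotary: RotaryConfig = field(default_factory=RotaryConfig)
    glu: GLUConfig = field(default_factory=GLUConfig)


class AttentionLayer:
    """Stand-in layer that takes one config object instead of dozens of kwargs."""

    def __init__(self, config: AttentionLayerConfig):
        self.config = config


# Usage: only the groups you care about need to be spelled out.
config = AttentionLayerConfig(dim=512, glu=GLUConfig(enabled=True, mult_bias=True))
layer = AttentionLayer(config)
```

The trade-off is that a flat keyword API is easier to discover from a single signature, while grouped configs keep each feature's options together and make it simpler to drop a feature later.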
Hi:
Is GLU's mult_bias originally from this paper? https://arxiv.org/pdf/2202.08906 It mentions Add Bias and Mult Bias on page 32. I could not find this in the README, so I am not sure.
Thanks! 😀