lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License

paper for GLU Mult Bias? #275

Open · TimS-ml opened this issue 1 month ago

TimS-ml commented 1 month ago

Hi,

Is GLU's mult_bias originally from this paper: https://arxiv.org/pdf/2202.08906? It mentions Add Bias and Mult Bias on page 32. I could not find this info in the README, and I am not sure of the origin.
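For context, here is a minimal sketch of what a GLU feedforward with a multiplicative bias looks like. The class and argument names below are illustrative, not necessarily the library's exact API:

```python
import torch
from torch import nn

class GLUMultBias(nn.Module):
    # Sketch of a GLU with a learned multiplicative bias, as described in
    # the paper linked above (p. 32). Names here are illustrative only.
    def __init__(self, dim_in, dim_out, mult_bias = True):
        super().__init__()
        # one projection produces both the value and the gate
        self.proj = nn.Linear(dim_in, dim_out * 2)
        self.act = nn.SiLU()
        # per-channel multiplicative bias, initialized to ones so the
        # module starts out as a plain (Swi)GLU
        self.mult_bias = nn.Parameter(torch.ones(dim_out)) if mult_bias else 1.

    def forward(self, x):
        x, gate = self.proj(x).chunk(2, dim = -1)
        return x * self.act(gate) * self.mult_bias

# usage: glu = GLUMultBias(512, 2048); out = glu(torch.randn(2, 16, 512))
```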

Thanks! 😀

lucidrains commented 2 weeks ago

@TimS-ml hey Tim, yes, that paper corroborates the technique, but i think it originated from an earlier work within Google Brain (could be wrong)

TimS-ml commented 2 weeks ago

@lucidrains Thanks! Should we add the original paper to the README? Let me see if I can find it. BTW, I have another small question: the AttentionLayer currently takes more than 55 input parameters, and I'm guessing that number will only grow as more papers are implemented. Are there any software design patterns we could use to make the codebase easier to maintain, such as grouping the input parameters further? (A hypothetical sketch of what I mean follows below.)
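One hypothetical illustration of that kind of grouping, using config dataclasses. None of these class or field names come from x-transformers itself; they are assumptions for the sake of the example:

```python
from dataclasses import dataclass, field

# Hypothetical grouping of related hyperparameters into small configs.
# All names here are illustrative, not x-transformers' actual API.

@dataclass
class AttnConfig:
    heads: int = 8
    dim_head: int = 64
    dropout: float = 0.0

@dataclass
class FFConfig:
    mult: int = 4
    glu: bool = False
    glu_mult_bias: bool = False  # the multiplicative bias discussed above

@dataclass
class LayerConfig:
    dim: int = 512
    attn: AttnConfig = field(default_factory = AttnConfig)
    ff: FFConfig = field(default_factory = FFConfig)

# a constructor could then take one config object instead of dozens of kwargs:
# layer = AttentionLayer(LayerConfig(dim = 512, ff = FFConfig(glu = True)))
```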

lucidrains commented 2 weeks ago

@TimS-ml it actually won't go up as fast as you think

very few techniques made it, and if anything, i will probably start removing certain ideas in coming releases

re: citation, yes, let us cite that if you can find it