lucidrains / performer-pytorch

An implementation of Performer, a linear attention-based transformer, in PyTorch
MIT License

why is bias true in `to_<q,k,v>`? #63

Closed · JamesDeAntonis closed 3 years ago

JamesDeAntonis commented 3 years ago

I am referring to these lines.

For reference, in huggingface, all of these projections have no bias.
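
For context, a minimal sketch of the pattern I mean (names and dimensions are illustrative, not the repo's exact code):

```python
import torch.nn as nn

dim, inner_dim = 512, 512

# the projections as they stand - each linear layer carries a bias term
to_q = nn.Linear(dim, inner_dim, bias = True)
to_k = nn.Linear(dim, inner_dim, bias = True)
to_v = nn.Linear(dim, inner_dim, bias = True)

# versus the bias-free variant i'd expect:
# to_q = nn.Linear(dim, inner_dim, bias = False)
```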

lucidrains commented 3 years ago

@JamesDeAntonis yup, i agree these projections usually should not have a bias - but another user was actually trying to fine-tune a BERT model with Performer, and upon closer examination of that public BERT model, they did have a bias on the qkv projections 🤷‍♂️

I think I should probably default it to False and allow people to turn it on. Regardless, it shouldn't hurt performance.
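
Something like this sketch, i.e. default-off with an opt-in flag (the parameter name here is just illustrative):

```python
import torch.nn as nn

class SelfAttention(nn.Module):
    # sketch only: qkv_bias defaults to False, users can opt back in via the flag
    def __init__(self, dim, heads = 8, dim_head = 64, qkv_bias = False):
        super().__init__()
        inner_dim = dim_head * heads
        self.to_q = nn.Linear(dim, inner_dim, bias = qkv_bias)
        self.to_k = nn.Linear(dim, inner_dim, bias = qkv_bias)
        self.to_v = nn.Linear(dim, inner_dim, bias = qkv_bias)
```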

JamesDeAntonis commented 3 years ago

Makes sense. Either way, could we at least make it a parameter? I could make the changes and submit a PR if that works for you

lucidrains commented 3 years ago

@JamesDeAntonis ok James! done in https://github.com/lucidrains/performer-pytorch/commit/b6c2bfdd4547bf658a310e91ffe0cb5fc0b71736
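
Usage, assuming the flag is exposed on the constructor (keyword name as in the sketch above):

```python
from performer_pytorch import Performer

# assuming a qkv_bias keyword argument, defaulting to False after the change
model = Performer(dim = 512, depth = 6, heads = 8, qkv_bias = True)
```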

JamesDeAntonis commented 3 years ago

awesome! thanks!