Closed JamesDeAntonis closed 3 years ago
@JamesDeAntonis yup, i agree the projections usually should not have a bias - but another user was actually trying to fine-tune a BERT implementation to use Performer, and upon closer examination of that public BERT model, it did have a bias on the qkv projections 🤷‍♂️
I think I should probably default it to False, and allow people to turn it on. Regardless, it shouldn't hurt performance
Makes sense. Either way, could we at least make it a parameter? I could make the changes and submit a PR if that works for you
@JamesDeAntonis ok James! done in https://github.com/lucidrains/performer-pytorch/commit/b6c2bfdd4547bf658a310e91ffe0cb5fc0b71736
awesome! thanks!
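For anyone landing here later, a minimal sketch of what a bias toggle on the q/k/v projections can look like in plain PyTorch. This is not the code from the linked commit: the class name `QKVProjection` and the argument name `qkv_bias` are assumptions made up for illustration, and the only library behavior relied on is the `bias` flag of `nn.Linear`.

```python
import torch
from torch import nn


class QKVProjection(nn.Module):
    """Hypothetical q/k/v projection block (illustration only, not performer-pytorch code)."""

    def __init__(self, dim, heads=8, qkv_bias=False):
        super().__init__()
        assert dim % heads == 0, "dim must be divisible by heads"
        self.heads = heads
        # qkv_bias defaults to False; set it to True when the weights being
        # loaded (e.g. a pretrained BERT checkpoint) carry q/k/v bias terms.
        self.to_q = nn.Linear(dim, dim, bias=qkv_bias)
        self.to_k = nn.Linear(dim, dim, bias=qkv_bias)
        self.to_v = nn.Linear(dim, dim, bias=qkv_bias)

    def forward(self, x):
        b, n, d = x.shape

        def split_heads(t):
            # (batch, seq, dim) -> (batch, heads, seq, dim_head)
            return t.reshape(b, n, self.heads, d // self.heads).transpose(1, 2)

        return (
            split_heads(self.to_q(x)),
            split_heads(self.to_k(x)),
            split_heads(self.to_v(x)),
        )
```

With the flag off, `nn.Linear` registers no bias at all (`module.to_q.bias is None`), so a checkpoint that includes q/k/v biases would need `qkv_bias=True` for those tensors to have somewhere to load.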
I am referring to these lines
For reference, in Hugging Face, none of these projections have a bias