Hi, thanks for this great implementation. I'm learning a lot from it :)
I noticed that commit https://github.com/lucidrains/enformer-pytorch/commit/0dc46e41de96bd739edba2cfaaa5e123990e9bc7 leaves the internal `AttentionPool` weight initialized with randomly sampled values rather than the 2 * Identity matrix specified in the Enformer paper. (If there's something I am missing, please let me know!)

Indeed, this will not affect the performance of Enformer loaded with pretrained parameters, but I think it may lead to slightly worse performance (according to the paper) when trained from scratch.
Perhaps some simple manual weight initialization like
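(a rough sketch, assuming the pooling layer is a 1x1 `nn.Conv2d` named `to_attn_logits`, as in the repo)

```python
import torch
import torch.nn as nn

# Illustrative channel count; in Enformer this would be the model dim
dim = 4

# The attention-pooling logits projection, assumed to be a 1x1 conv
to_attn_logits = nn.Conv2d(dim, dim, 1, bias=False)

with torch.no_grad():
    # dirac_ sets a 1x1 conv weight to an identity mapping per channel
    nn.init.dirac_(to_attn_logits.weight)
    # scale by 2 to match the paper's 2 * Identity initialization
    to_attn_logits.weight.mul_(2)
```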
will do.
If you think it'll be okay, please let me know then I'll open a PR right away.
Thanks again for this great repo!
Best, Dohoon