Initializing AttentionPool weights with 2 * Identity matrix, again!

lucidrains / enformer-pytorch

Implementation of Enformer, DeepMind's attention network for predicting gene expression, in PyTorch

MIT License

Issue #22, closed by dohlee 1 year ago

dohlee commented 1 year ago

Thank you for the quick fix for my recent issue (https://github.com/lucidrains/enformer-pytorch/issues/21)! nn.init.dirac_ is a great solution.

However, there's one more thing: we have to initialize the weight with an identity matrix multiplied by 2!

Of note, the official DeepMind Sonnet implementation of Enformer uses snt.initializers.Identity with gain=2. (Please refer to the code lines here and here in the DeepMind implementation.)
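For illustration, here is a minimal sketch of what a 2 * identity initialization could look like in PyTorch. It assumes the attention-logit projection is a bias-free 1x1 `nn.Conv2d`; the names `dim` and `to_attn_logits` are placeholders chosen for this example and may not match the repository's actual code.

```python
import torch
from torch import nn

dim = 4  # hypothetical channel count, chosen only for this example

# Assumed layer shape: a bias-free 1x1 convolution mapping dim -> dim channels.
to_attn_logits = nn.Conv2d(dim, dim, kernel_size=1, bias=False)

# dirac_ fills the kernel so the convolution acts as an identity map...
nn.init.dirac_(to_attn_logits.weight)

# ...and scaling in place by 2 mirrors an identity initializer with gain=2.
with torch.no_grad():
    to_attn_logits.weight.mul_(2)

# Sanity checks: the 1x1 kernel, viewed as a (dim, dim) matrix, equals 2 * I,
# and applying the layer doubles its input.
weight_matrix = to_attn_logits.weight.detach().squeeze(-1).squeeze(-1)
is_scaled_identity = torch.equal(weight_matrix, 2 * torch.eye(dim))

x = torch.randn(1, dim, 3, 2)
doubles_input = torch.allclose(to_attn_logits(x), 2 * x)
```

With this initialization the layer starts out returning twice its input, so the pooling logits are initially just the scaled features themselves.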

Thanks, Dohoon

lucidrains commented 1 year ago

@dohlee oh yup, i later updated it here

probably should have just gone with your solution 😅

dohlee commented 1 year ago

Oh sorry I didn't notice that commit. No problem at all :)

I'll keep on reviewing the code. Really informative!

lucidrains commented 1 year ago

@dohlee ok, do send me an email if you get any great results with fine tuning!