ZhiyuanChen closed this issue 2 weeks ago
Hello, thank you for your kind words!
The placement of the layer norm you mentioned is indeed a slight oversight. However, I would advise against moving the layer norm if you decide to use our pretrained weights, as we trained the model with the current implementation. I cannot guarantee how well (or how poorly) the weights would perform with a slightly different architecture.
Thank you for your clarification.
Yes, I kept your original implementation for consistency.
Hey,
Thank you for your great work, it looks awesome.
When I tried to reproduce your work, I noticed a tiny issue:
In lines 122-128 of the module, the layer-norm output of the hidden states is used as the residual path of the attention block, whereas in the standard transformer implementation, the hidden states before layer norm are used as the residual path.
This shouldn't be a big problem, but if anyone is unable to reproduce the claimed results, this may be the cause.
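To make the difference concrete, here is a minimal sketch of the two wirings. All names here are hypothetical and the attention sub-layer is replaced by a stub; this is not the repo's actual code, just an illustration of where the residual branch is taken from:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last (feature) dimension, no learned scale/shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def attention_stub(x):
    # Stand-in for the attention sub-layer; its internals don't matter here.
    return 0.5 * x

def block_norm_residual(hidden):
    # Wiring described in the issue: the residual is the LAYER-NORMED states.
    normed = layer_norm(hidden)
    return normed + attention_stub(normed)

def block_pre_ln_residual(hidden):
    # Standard pre-LN transformer: the residual is the RAW hidden states.
    normed = layer_norm(hidden)
    return hidden + attention_stub(normed)

x = np.random.default_rng(0).normal(size=(2, 4, 8))
# The two variants generally produce different outputs, which is why
# pretrained weights are tied to whichever wiring they were trained with.
print(np.allclose(block_norm_residual(x), block_pre_ln_residual(x)))
```

Since the outputs differ whenever the hidden states are not already normalized, weights trained under one wiring are not interchangeable with the other, which matches the author's advice above.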