microsoft / Graphormer

Graphormer is a general-purpose deep learning backbone for molecular modeling.
MIT License

The weight of embedding padding_idx=0 is not zero #41

Closed. lkfo415579 closed this issue 2 years ago.

lkfo415579 commented 2 years ago

https://github.com/microsoft/Graphormer/blob/740e6ff09a5de29d61def5ea6af7dfd04cee719e/graphormer/model.py#L20

When the embedding weights are re-initialized, the row at index 0 (the padding index) is also drawn from a normal distribution, so the padding vector in the feature input becomes non-zero. This seems wrong.
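
For illustration, a minimal standalone reproduction of the behavior in plain PyTorch (not the Graphormer code itself; the `normal_` call mirrors the pattern in the linked init function):

```python
import torch.nn as nn

# nn.Embedding zeroes the padding row when it is constructed.
emb = nn.Embedding(10, 4, padding_idx=0)
print(emb.weight.data[0])  # tensor([0., 0., 0., 0.])

# Re-initializing all weights afterwards overwrites that row,
# so the padding embedding is no longer zero.
emb.weight.data.normal_(mean=0.0, std=0.02)
print(emb.weight.data[0])  # now non-zero
```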

zhengsx commented 2 years ago

Good point. The padding token embedding will be non-zero, but it won't affect the self-attention calculation, since padding positions are masked out by the padding attention bias (see here). If you are concerned about the potential influence on future model usage, you could modify the initialization (refer to here).
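
One way the initialization could be adjusted is sketched below; this is only an illustration of re-zeroing the padding row after the normal re-init, and may differ from the code referenced above:

```python
import torch.nn as nn

def init_embedding(module):
    """Re-initialize embedding weights but keep the padding row at zero."""
    if isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=0.02)
        if module.padding_idx is not None:
            # restore the all-zero padding vector that normal_ overwrote
            module.weight.data[module.padding_idx].zero_()
```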

lkfo415579 commented 2 years ago

Yes. I am working on extending the model, which is how I noticed this bug. When I modify the edge features, an embedding index of 0 can legitimately appear in the edge features before the masked range (e.g., when there is no covalent bond between two atoms).

zhengsx commented 2 years ago

If I understand correctly, it might be solved by adding an index-shift.
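
For illustration, a sketch of what such an index-shift could look like; the names `offset` and `real_edge_features` below are illustrative only, not Graphormer's actual API:

```python
import torch
import torch.nn as nn

# Shift every real edge-feature index up by one *before* padding, so index 0
# is used only for padding and never collides with a real category such as
# "no covalent bond".
offset = 1
real_edge_features = torch.tensor([0, 2, 5, 3])        # 0 here means "no covalent bond"
shifted = real_edge_features + offset                  # real categories become 1..6
padded = torch.cat([shifted, torch.zeros(2, dtype=torch.long)])  # 0 marks padding only

num_edge_types = 6                                     # illustrative vocabulary size
emb = nn.Embedding(num_edge_types + offset, 8, padding_idx=0)
edge_emb = emb(padded)                                 # padded positions embed to zero rows
```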

lkfo415579 commented 2 years ago

> If I understand correctly, it might be solved by adding an index-shift.

I don't understand what an index-shift is, haha.

> Good point. The padding token embedding will be non-zero, but it won't affect the self-attention calculation, since padding positions are masked out by the padding attention bias (see here). If you are concerned about the potential influence on future model usage, you could modify the initialization (refer to here).

This approach solved the problem anyway.

zhengsx commented 2 years ago

Closing this issue due to inactivity. Feel free to raise a new one or reopen this one for any further questions.