lucidrains / h-transformer-1d

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
MIT License

Algorithm Mismatch #13

Closed jinmang2 closed 3 years ago

jinmang2 commented 3 years ago

Paper Implementation

In the implementation, the blocked Q, K, V tensors for each level are obtained with the code below.

https://github.com/lucidrains/h-transformer-1d/blob/110cab0038898560d72d460bfef8ca8b7f17f0a5/h_transformer_1d/h_transformer_1d.py#L164-L179
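For reference, here is a minimal sketch of what this per-level blocking could look like, assuming `(batch, heads, seq_len, dim)` inputs and coarsening by averaging adjacent pairs; the function name and shapes are illustrative, not the repository's exact code:

```python
import torch

def blocked_qkv_per_level(q, k, v, block_size, num_levels):
    """Sketch: build blocked Q, K, V per hierarchy level.

    At level 0 the sequence is split into blocks of `block_size`; each
    subsequent level halves the sequence resolution by averaging adjacent
    pairs, so the entries get coarser as the level rises. Assumes the
    sequence length is divisible by `block_size` and by 2 at every level.
    """
    levels = []
    for level in range(num_levels):
        bsz, heads, n, d = q.shape
        # split the sequence into blocks: (b, h, n_blocks, block_size, d)
        q_b = q.reshape(bsz, heads, n // block_size, block_size, d)
        k_b = k.reshape(bsz, heads, n // block_size, block_size, d)
        v_b = v.reshape(bsz, heads, n // block_size, block_size, d)
        levels.append((q_b, k_b, v_b))
        # coarsen for the next level: average adjacent pairs,
        # halving the effective sequence length
        q = q.reshape(bsz, heads, n // 2, 2, d).mean(dim=-2)
        k = k.reshape(bsz, heads, n // 2, 2, d).mean(dim=-2)
        v = v.reshape(bsz, heads, n // 2, 2, d).mean(dim=-2)
    return levels
```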

The final result of the matrix-matrix products from Equation 29 (or 69) is then assembled with the for loop below.

https://github.com/lucidrains/h-transformer-1d/blob/110cab0038898560d72d460bfef8ca8b7f17f0a5/h_transformer_1d/h_transformer_1d.py#L234-L247
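To make the structure of that loop concrete, here is a hedged sketch of the combination step: per-level un-normalized outputs and softmax denominators are upsampled back to the finest resolution and summed, in the spirit of Equation 29 / 69. The names and shapes below are assumptions, not the actual code:

```python
def combine_levels(Ys, As):
    """Sketch: combine per-level attention results into the final output.

    Ys[l]: un-normalized output at level l, torch tensor (b, h, n_l, d),
           with level 0 the finest resolution.
    As[l]: matching softmax denominator, torch tensor (b, h, n_l).
    Coarser levels are upsampled (repeat-interleaved) back to the finest
    resolution before summing, then the sum is normalized once.
    """
    Y, A = 0, 0
    for level, (Y_l, A_l) in enumerate(zip(Ys, As)):
        # each coarsening halved the resolution, so expand back by 2**level
        expand = 2 ** level
        Y_l = Y_l.repeat_interleave(expand, dim=2)
        A_l = A_l.repeat_interleave(expand, dim=2)
        Y = Y + Y_l
        A = A + A_l
    return Y / A.unsqueeze(-1).clamp(min=1e-9)
```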

What is the problem?

However, the current code cannot incorporate the information from the level-0 white blocks shown in the figure below. (Equation 70 of the paper does include the corresponding attention matrix entries.)

[fig2: hierarchical attention matrix figure highlighting the level-0 white (off-diagonal) blocks]

I think you should also add an off-diagonal near-interaction term (level 0) to match Equation 69!
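For illustration, the missing term could look something like the sketch below: at the finest level, each block additionally attends to its immediate neighbour block (the super- and sub-diagonal blocks). This is a hypothetical sketch, not the patch that went into 0.1.6:

```python
import torch

def near_field_off_diagonal(q_b, k_b, v_b, scale):
    """Sketch: level-0 near-interaction between each block and its
    immediate neighbour block (the 'white' off-diagonal blocks next to
    the diagonal).

    q_b, k_b, v_b: (batch, heads, n_blocks, block_size, dim)
    Returns un-normalized outputs and softmax denominators for the
    super- and sub-diagonal blocks, to be merged with the other levels.
    Exponentials are shown without the usual max-subtraction for brevity.
    """
    # queries in block i attend to keys/values in block i+1 (super-diagonal)
    sim_super = torch.einsum('bhnid,bhnjd->bhnij',
                             q_b[:, :, :-1], k_b[:, :, 1:]) * scale
    # queries in block i+1 attend to keys/values in block i (sub-diagonal)
    sim_sub = torch.einsum('bhnid,bhnjd->bhnij',
                           q_b[:, :, 1:], k_b[:, :, :-1]) * scale

    exp_super, exp_sub = sim_super.exp(), sim_sub.exp()
    y_super = torch.einsum('bhnij,bhnjd->bhnid', exp_super, v_b[:, :, 1:])
    y_sub = torch.einsum('bhnij,bhnjd->bhnid', exp_sub, v_b[:, :, :-1])

    # denominators to be added to the running softmax normalizer
    a_super = exp_super.sum(dim=-1)
    a_sub = exp_sub.sum(dim=-1)
    return (y_super, a_super), (y_sub, a_sub)
```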

lucidrains commented 3 years ago

@jinmang2 Hi MyungHoon! I think you are right, thank you for catching this bug - I've released the changes in 0.1.6 https://github.com/lucidrains/h-transformer-1d/releases/tag/0.1.6 , do you want to review this and see if it resolves the issue you described?

jinmang2 commented 3 years ago

Thanks for solving it so quickly 😄

Thank you for the opportunity to review; I will check the 0.1.6 code and leave a comment!

lucidrains commented 3 years ago

closing, since i think it's fine now