Closed HaisongDing closed 2 years ago
@HaisongDing Hi Haisong, many thanks for your attention. Please find the implementation of the attention modules here, and the retrieval function here. This Python code implements the equations in the paper exactly, organized for efficiency, but your version looks simpler and cleverer. Also, we suggest implementing training and inference (argmax + indexing) differently, since this benefits inference performance. Best, Hongfei
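To illustrate the "argmax + indexing" inference path mentioned above, here is a minimal PyTorch sketch (not the authors' code; `hard_attention_inference` and its score normalization are assumptions): at inference time, each query simply selects the single highest-scoring key and indexes the corresponding value row, so no weighted sum over all positions is needed.

```python
import torch

def hard_attention_inference(q, k, v):
    # Sketch of inference-time hard retrieval attention (hypothetical helper,
    # not the repository's implementation).
    # q: (..., Lq, D), k: (..., Lk, D), v: (..., Lk, Dv)
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5  # (..., Lq, Lk)
    idx = scores.argmax(dim=-1)                           # (..., Lq) hard choice
    # Index the chosen value row per query instead of a softmax-weighted sum.
    idx = idx.unsqueeze(-1).expand(*idx.shape, v.size(-1))
    return torch.gather(v, -2, idx)                       # (..., Lq, Dv)
```

Because the forward pass reduces to an argmax followed by a gather, this path avoids the softmax and the full attention-weighted average, which is where the inference speedup comes from.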
Thanks for your quick response.
Hi Hongfei, does this repo also contain the implementation of your "Learning Hard Retrieval Decoder Attention for Transformers" paper? If not, will it be released? Based on my understanding, the "hard retrieval" is achieved by replacing P with P' = MultinomialSampling(P), then P = (P' - P).detach() + P. Please kindly correct me if I am wrong.
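The trick described above, (P' - P).detach() + P, is a straight-through estimator: the forward pass uses the sampled one-hot weights, while gradients flow through the soft probabilities. A minimal PyTorch sketch of that reading (a hypothetical `hard_retrieval_weights` helper, not the paper's code):

```python
import torch

def hard_retrieval_weights(scores):
    # Sketch of training-time hard retrieval via a straight-through estimator
    # (assumed interpretation of the question above, not the authors' code).
    P = torch.softmax(scores, dim=-1)              # soft attention probabilities
    flat = P.reshape(-1, P.size(-1))
    idx = torch.multinomial(flat, 1)               # sample one key per query
    P_hard = torch.zeros_like(flat).scatter_(-1, idx, 1.0).reshape_as(P)
    # Forward value is the one-hot sample; backward gradient is that of soft P.
    return (P_hard - P).detach() + P
```

Numerically the returned weights equal the one-hot sample, so each query attends to exactly one key, while backpropagation treats the expression as if it were the softmax P.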