Closed: dixiematt8 closed this issue 6 years ago
I think the code most closely replicating the paper is the oldest available release (this is the implementation by the Google Brain team). If you want a more readable (for study purposes) but still working implementation of the Transformer model, see e.g. https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/encoders/transformer.py (this one is not by the Google Brain team).
Indeed, this is the code the original paper was based on. You only need to read the transformer.py file (https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py) and the common_attention module it references. Beyond that, you can step through training by hand yourself, as illustrated in the colab (goo.gl/wkHexj).
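For readers who want the core idea before diving into those files, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of what common_attention implements. The function name, shapes, and masking convention below are illustrative assumptions, not taken from the T2T code.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    q: [batch, heads, q_len, d_k]
    k: [batch, heads, k_len, d_k]
    v: [batch, heads, k_len, d_v]
    mask: optional [batch, 1, q_len, k_len]; 1 = attend, 0 = block (illustrative convention).
    """
    d_k = q.shape[-1]
    # Similarity scores between every query and every key, scaled by sqrt(d_k).
    scores = np.matmul(q, np.swapaxes(k, -1, -2)) / np.sqrt(d_k)
    if mask is not None:
        # Blocked positions get a large negative score before the softmax.
        scores = np.where(mask == 0, -1e9, scores)
    # Softmax over the key axis turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted sum of the values.
    return np.matmul(weights, v)
```

For example, calling it with `q = k = v = np.random.rand(1, 8, 10, 64)` returns a `[1, 8, 10, 64]` array, i.e. one attended representation per head and position.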
We're trying to make T2T easier to understand and hack by hand: if you have suggestions on how to do that, e.g. what would help you understand the Transformer code, please reopen this issue and let us know!
Recently I have been studying BERT and went back to learn "Attention Is All You Need", but I can't find the implementation by the Google team. That's a pity; I hope Google can release the code for this great work!
I think you did not understand: this is the original implementation and we are the Google team :)
@lukaszkaiser Have there been any efforts to make t2t-based translation models work with initialization provided by BERT on both the encoder and decoder sides?
Yes, but we found that just multi-training LMs (like OpenAI) leads to better transfer learning. The paper has been submitted to NAACL, which unfortunately doesn't allow us to put it on arXiv; sorry about that.
I am having a hard time extracting the code from tensor2tensor.
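For anyone in the same position, a self-contained sketch of one encoder layer may be easier to start from than untangling the full library. It builds on the `scaled_dot_product_attention` sketch above, follows the post-layer-norm layout of the original paper, and all names, parameter keys, and shapes are illustrative assumptions rather than T2T's.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the feature dimension (no learned scale/bias in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def split_heads(x, num_heads):
    # [batch, length, d_model] -> [batch, heads, length, d_model // heads]
    batch, length, d_model = x.shape
    x = x.reshape(batch, length, num_heads, d_model // num_heads)
    return x.transpose(0, 2, 1, 3)

def merge_heads(x):
    # [batch, heads, length, d_head] -> [batch, length, heads * d_head]
    batch, heads, length, d_head = x.shape
    return x.transpose(0, 2, 1, 3).reshape(batch, length, heads * d_head)

def encoder_layer(x, params, num_heads=8):
    """One Transformer encoder layer: self-attention and feed-forward,
    each followed by a residual connection and layer normalization."""
    # Multi-head self-attention sublayer.
    q = split_heads(x @ params["wq"], num_heads)
    k = split_heads(x @ params["wk"], num_heads)
    v = split_heads(x @ params["wv"], num_heads)
    attended = merge_heads(scaled_dot_product_attention(q, k, v)) @ params["wo"]
    x = layer_norm(x + attended)
    # Position-wise feed-forward sublayer (ReLU between two linear maps).
    hidden = np.maximum(0, x @ params["w1"] + params["b1"])
    return layer_norm(x + hidden @ params["w2"] + params["b2"])
```

With the paper's base configuration (d_model = 512, d_ff = 2048), `params` would hold `wq`/`wk`/`wv`/`wo` of shape [512, 512], `w1` [512, 2048], `b1` [2048], `w2` [2048, 512], and `b2` [512]; stacking six such layers (plus embeddings and positional encodings) gives the base encoder.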