Closed: dixiematt8 closed this issue 6 years ago
I think the code most closely replicating the paper is the oldest available release (this is the implementation by the Google Brain team). If you want a more readable (for study purposes) but still working implementation of the Transformer model, see e.g. https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/encoders/transformer.py (this one is not by the Google Brain team).
Indeed, this is the code the original paper was based on. You only need to read the transformer.py file (https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py) and the common_attention module it references. Beyond that, you can step through training by hand yourself, as illustrated in the colab (goo.gl/wkHexj).
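For readers who want the core idea before diving into those files, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of what common_attention implements. The function name, shapes, and masking convention below are illustrative assumptions, not taken from the T2T code.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    q: [batch, heads, q_len, d_k]
    k: [batch, heads, k_len, d_k]
    v: [batch, heads, k_len, d_v]
    mask: optional [batch, 1, q_len, k_len]; 1 = attend, 0 = block (illustrative convention).
    """
    d_k = q.shape[-1]
    # Similarity scores between every query and every key, scaled by sqrt(d_k).
    scores = np.matmul(q, np.swapaxes(k, -1, -2)) / np.sqrt(d_k)
    if mask is not None:
        # Blocked positions get a large negative score before the softmax.
        scores = np.where(mask == 0, -1e9, scores)
    # Softmax over the key axis turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted sum of the values.
    return np.matmul(weights, v)
```

For example, calling it with `q = k = v = np.random.rand(1, 8, 10, 64)` returns a `[1, 8, 10, 64]` array, i.e. one attended representation per head and position.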
We're trying to make T2T easier to understand and hack by hand: if you have suggestions on how to do that, e.g. what would help you understand the Transformer code, please reopen this issue and let us know!
Recently I have been studying BERT and went back to learn "Attention Is All You Need", but I can't find the implementation by the Google team. That's a pity; I hope Google can release the code for this great work!
I think you did not understand: this is the original implementation and we are the Google team :)
@lukaszkaiser Have there been any efforts to make t2t-based translation models work with initialization provided by BERT on both the encoder and decoder sides?
Yes, but we found that just multi-training LMs (like OpenAI) leads to better transfer learning. The paper has been submitted to NAACL, which unfortunately doesn't allow us to put it on arXiv; sorry about that.
I am having a hard time extracting the code from tensor2tensor.
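For anyone in the same position, a self-contained sketch of one encoder layer may be easier to start from than untangling the full library. It builds on the `scaled_dot_product_attention` sketch above, follows the post-layer-norm layout of the original paper, and all names, parameter keys, and shapes are illustrative assumptions rather than T2T's.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the feature dimension (no learned scale/bias in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def split_heads(x, num_heads):
    # [batch, length, d_model] -> [batch, heads, length, d_model // heads]
    batch, length, d_model = x.shape
    x = x.reshape(batch, length, num_heads, d_model // num_heads)
    return x.transpose(0, 2, 1, 3)

def merge_heads(x):
    # [batch, heads, length, d_head] -> [batch, length, heads * d_head]
    batch, heads, length, d_head = x.shape
    return x.transpose(0, 2, 1, 3).reshape(batch, length, heads * d_head)

def encoder_layer(x, params, num_heads=8):
    """One Transformer encoder layer: self-attention and feed-forward,
    each followed by a residual connection and layer normalization."""
    # Multi-head self-attention sublayer.
    q = split_heads(x @ params["wq"], num_heads)
    k = split_heads(x @ params["wk"], num_heads)
    v = split_heads(x @ params["wv"], num_heads)
    attended = merge_heads(scaled_dot_product_attention(q, k, v)) @ params["wo"]
    x = layer_norm(x + attended)
    # Position-wise feed-forward sublayer (ReLU between two linear maps).
    hidden = np.maximum(0, x @ params["w1"] + params["b1"])
    return layer_norm(x + hidden @ params["w2"] + params["b2"])
```

With the paper's base configuration (d_model = 512, d_ff = 2048), `params` would hold `wq`/`wk`/`wv`/`wo` of shape [512, 512], `w1` [512, 2048], `b1` [2048], `w2` [2048, 512], and `b2` [512]; stacking six such layers (plus embeddings and positional encodings) gives the base encoder.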