Open · JohannesTK opened this issue 5 years ago

Thank you for such easy-to-read code and a well-organized repo; it's clear that a lot of hard work has gone into it! Secondly, I found your work through Sebastian Ruder's NLP newsletter, where he put it: "Peer review is an imprecise process and gems may sometimes fall through the cracks." Your work was listed among those gems, and I totally agree!

Now to the specifics: I tried using the wt103 checkpoint in Tensor2Tensor and I'm getting an error. I suppose it comes from the wrong hparams I am using?

Tensor2Tensor transformer hparams
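For context, a minimal sketch of the kind of hparams override one might attempt in Tensor2Tensor; the layer and size values below are assumptions based on the wt103 "large" configuration reported in the Transformer-XL paper, not a verified working setup:

```python
# Hypothetical sketch: overriding Tensor2Tensor's base transformer hparams
# to approximate the Transformer-XL wt103 "large" configuration.
# The specific values are assumptions, not a verified setup.
from tensor2tensor.models import transformer

def transformer_xl_wt103_guess():
    hparams = transformer.transformer_base()
    hparams.num_hidden_layers = 18   # wt103 large: 18 layers
    hparams.hidden_size = 1024       # d_model
    hparams.num_heads = 16           # n_head
    hparams.filter_size = 4096       # d_inner (feed-forward size)
    return hparams

# Even with matching sizes, loading the checkpoint would still fail:
# Tensor2Tensor's graph has neither the recurrence memory nor the
# relative positional encoding variables that Transformer-XL stores.
```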
Glad that you like our work.
Our codebase is not compatible with Tensor2Tensor at this point, for two reasons: 1) the computational graph we built in Transformer-XL contains components that are not part of the standard Transformer in Tensor2Tensor, namely the recurrence mechanism and the new relative positional encodings; 2) the scope names used in Tensor2Tensor are different from ours.
Therefore, it is not possible to load a Transformer-XL checkpoint simply by modifying hyperparameters in Tensor2Tensor. PRs adding Tensor2Tensor compatibility are more than welcome.
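To see the scope-name mismatch concretely, here is a minimal sketch (the checkpoint paths are hypothetical placeholders) that lists the variable names stored in each checkpoint:

```python
# Minimal sketch: inspect the variable names stored in two checkpoints.
# The checkpoint paths are hypothetical placeholders.
import tensorflow as tf

def print_checkpoint_variables(checkpoint_path):
    # tf.train.list_variables yields (name, shape) pairs for every
    # variable saved in the checkpoint, without building any graph.
    for name, shape in tf.train.list_variables(checkpoint_path):
        print(name, shape)

# A Transformer-XL checkpoint contains parameters with no counterpart
# in Tensor2Tensor's standard Transformer, e.g. the relative-attention
# biases r_w_bias and r_r_bias from the relative positional encodings.
print_checkpoint_variables("/path/to/transformer_xl/model.ckpt")

# A Tensor2Tensor checkpoint stores its weights under different scope
# names (e.g. transformer/body/...), so even shared components would
# not line up by name.
print_checkpoint_variables("/path/to/t2t/model.ckpt")
```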
Thanks for clearing that up. Any plans to add compatibility in the future?