Universal Transformer as base architecture

openai / finetune-transformer-lm

Code and model for the paper "Improving Language Understanding by Generative Pre-Training"

https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf

MIT License

Universal Transformer as base architecture #23

Open rodgzilla opened 6 years ago

rodgzilla commented 6 years ago

Hello,

First, I would like to thank the authors of this paper for releasing their source code.

Is there a plan to apply the same approach with a Universal Transformer as the base architecture? Would its adaptive computation time (ACT) mechanism transfer to other tasks?

More importantly, if the Universal Transformer can be used, do you think the gain would be noticeable?
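
To make the ACT question concrete, here is a minimal sketch of the halting rule as I understand it from Graves (2016), written for a single position in plain Python. The function names `act_weights` and `act_combine`, the `eps` default, and the list-based state representation are my own assumptions for illustration; none of this comes from this repository or from the Universal Transformer code.

```python
def act_weights(halting_probs, eps=0.01):
    """Per-step mixing weights for one position under ACT-style halting.

    halting_probs: halting probabilities in [0, 1], one per refinement step
    (hypothetical input; in a real model they would come from a small
    sigmoid unit applied to the state at each step).
    """
    weights = []
    cumulative = 0.0
    last = len(halting_probs) - 1
    for step, h in enumerate(halting_probs):
        # Halt once the accumulated probability reaches 1 - eps,
        # or when the step budget is exhausted.
        if cumulative + h >= 1.0 - eps or step == last:
            weights.append(1.0 - cumulative)  # remainder, so weights sum to 1
            break
        weights.append(h)
        cumulative += h
    return weights


def act_combine(states, halting_probs, eps=0.01):
    """Weighted average of the per-step states, plus the number of steps used."""
    weights = act_weights(halting_probs, eps)
    combined = [0.0] * len(states[0])
    for w, state in zip(weights, states):
        for i, value in enumerate(state):
            combined[i] += w * value
    return combined, len(weights)


if __name__ == "__main__":
    # Toy example: four candidate steps, each with a 2-dimensional state.
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
    probs = [0.3, 0.5, 0.4, 0.9]
    output, steps_used = act_combine(states, probs)
    print(steps_used, output)
```

In this toy run the position halts at the third step, so the fourth state never contributes. Whether a pre-trained model would learn useful halting behaviour that carries over to the fine-tuning tasks is exactly what I am unsure about.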