kimiyoung / transformer-xl


Penn Treebank and WikiText-2 architectures #35

Open AlexGrinch opened 5 years ago

AlexGrinch commented 5 years ago

Hello!

Could you please provide the hyperparameters for training models with close-to-SOTA perplexity on PTB and WT2 (if you experimented with the latter, since it has a corresponding option in the data utils)? Am I right that the two changes I need to make to the released code are adding variational dropout and the ASGD optimizer? If you have code that implements these changes, that would be great.

Thanks
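
For reference, here is a minimal PyTorch sketch of variational (locked) dropout in the AWD-LSTM style, which samples one dropout mask per sequence and reuses it across all time steps. The module name, tensor layout, and where to apply it are assumptions, not code from this repo:

```python
import torch
import torch.nn as nn

class LockedDropout(nn.Module):
    """Variational dropout: sample one mask per sequence and reuse it
    across every time step, instead of resampling at each step."""
    def forward(self, x, p=0.5):
        # x is assumed to be (seq_len, batch, hidden)
        if not self.training or p == 0.0:
            return x
        # Mask of shape (1, batch, hidden) broadcasts over the time axis
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - p)
        return x * mask / (1 - p)
```

In AWD-LSTM this is applied to the embeddings and between recurrent layers; where it helps most inside Transformer-XL (e.g. on each layer's output) would need to be determined experimentally.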

SaoYear commented 4 years ago

Did you find hyperparameters for PTB? I only reached a test perplexity of 68 without variational dropout or weight averaging, but with only 14M parameters.
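
On the weight-averaging side: the usual AWD-LSTM recipe is the NT-ASGD trigger from Merity et al., i.e. train with plain SGD and switch to `torch.optim.ASGD` (which maintains an averaged iterate) once validation loss stops improving. A rough sketch, where `train_one_epoch` and `evaluate` are hypothetical stand-ins for the repo's training and eval loops:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=30.0)
val_history, nonmono = [], 5  # non-monotone trigger window

for epoch in range(max_epochs):
    train_one_epoch(model, optimizer)
    val_loss = evaluate(model)
    # NT-ASGD trigger: switch to averaged SGD once validation loss
    # fails to beat the best loss seen `nonmono` checkpoints ago
    if (isinstance(optimizer, torch.optim.SGD)
            and len(val_history) > nonmono
            and val_loss > min(val_history[:-nonmono])):
        optimizer = torch.optim.ASGD(model.parameters(), lr=30.0,
                                     t0=0, lambd=0.0)
    val_history.append(val_loss)
```

After the switch, evaluation should use the averaged weights, which PyTorch's ASGD keeps in `optimizer.state[p]['ax']` for each parameter `p`; copying them into the model before eval (and restoring afterwards) is the standard trick.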