lucidrains / h-transformer-1d

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
MIT License

Billion Word Benchmark - Reproducibility #24

Open DavidHerel opened 1 year ago

DavidHerel commented 1 year ago

Hi there,

How can I reproduce the results from the paper on the Billion Word Benchmark? Could you please provide a pretrained model with an evaluation script, or a script showing how to train it?

Thanks