n-waves / multifit

Code to reproduce the results from the paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" https://arxiv.org/abs/1909.04761
MIT License

Added an option to set the minimal limit of tokens per article #45

Closed (cahya-wirawan closed 5 years ago)

cahya-wirawan commented 5 years ago

Hi Piotr, as I understand it, the create_wikitext.py script only uses articles that have at least 100 tokens. This short patch makes that limit configurable through a command-line argument, -t or --tokens_min. The default is still 100, as in the original. Thanks.
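For readers who have not opened the diff, here is a minimal sketch of how such an option could be wired up with argparse. It is not the actual patch: only the flag names -t / --tokens_min and the default of 100 come from the description above. The helper names build_parser and keep_long_articles, and the whitespace-based token count, are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser; only the flag names and the default of 100
    # are taken from the pull-request description.
    parser = argparse.ArgumentParser(
        description="Sketch of a minimum-token filter option"
    )
    parser.add_argument(
        "-t",
        "--tokens_min",
        type=int,
        default=100,
        help="minimum number of tokens an article must have to be kept",
    )
    return parser


def keep_long_articles(articles, tokens_min):
    # Assumption: tokens are counted by splitting on whitespace.
    # The real script may tokenize differently.
    return [text for text in articles if len(text.split()) >= tokens_min]


# Passing an explicit argument list keeps the example independent of sys.argv.
args = build_parser().parse_args(["--tokens_min", "3"])
articles = ["one two three four", "too short"]
kept = keep_long_articles(articles, args.tokens_min)
print(kept)
```

Omitting the flag keeps the previous behaviour, since the parsed value falls back to 100.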