neulab / awesome-align

A neural word aligner based on multilingual BERT
https://arxiv.org/abs/2101.08231
BSD 3-Clause "New" or "Revised" License
325 stars 47 forks

added multiprocessing to LineByLineTextDataset class #35

Closed: vigneshmj1997 closed this 3 months ago

vigneshmj1997 commented 2 years ago

Added multiprocessing to the LineByLineTextDataset class, since tokenizer.prepare_for_model takes a lot of time to process large datasets.
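For readers who want a feel for the idea, here is a minimal sketch of parallelizing per-line preprocessing with Python's standard `multiprocessing.Pool`. It is not the code from this pull request: `encode_line` and `encode_lines` are hypothetical names, and the whitespace split is only a stand-in for the real tokenizer work (such as `tokenizer.prepare_for_model`), which is assumed to be the slow step.

```python
from multiprocessing import Pool
from typing import List


def encode_line(line: str) -> List[str]:
    # Hypothetical stand-in for the expensive per-line step.
    # In the real dataset class this is where the tokenizer would be called.
    return line.strip().split()


def encode_lines(
    lines: List[str], num_workers: int = 1, chunksize: int = 1000
) -> List[List[str]]:
    # With a single worker, avoid process start-up cost and run serially.
    if num_workers <= 1:
        return [encode_line(line) for line in lines]
    # Pool.map preserves input order, so examples stay aligned with the file.
    # A larger chunksize reduces inter-process overhead for many short lines.
    with Pool(processes=num_workers) as pool:
        return pool.map(encode_line, lines, chunksize=chunksize)
```

When using more than one worker, the call is best made from under an `if __name__ == "__main__":` guard, since platforms that start workers with `spawn` re-import the main module. The worker function also has to be defined at module level so it can be pickled.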