rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
MIT License

learn_bpe.py error #102

Open unwritten opened 3 years ago

unwritten commented 3 years ago

I am running learn_bpe.py on a very large file (about 150M lines, 60 GB on disk) with --num-workers 10. The line 'vocab += pickle.load(f)' in learn_bpe.py then fails with: EOFError: Ran out of input.

Tested on Windows 10. I assume the 'tmp = tempfile.NamedTemporaryFile' call introduces this? Has anyone else run into this?

thx
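
To help narrow this down, here is a minimal sketch (not the actual learn_bpe.py code) showing that pickle.load on an empty file raises the same EOFError, which would be consistent with a worker never writing its partial vocabulary to its temp file. The helper name load_partial_vocab and the sample counts are hypothetical, and the guess that an empty temp file is involved is an assumption, not a confirmed diagnosis.

```python
import os
import pickle
import tempfile
from collections import Counter


def load_partial_vocab(path):
    """Load one pickled Counter, failing loudly if the file is empty.

    Hypothetical helper for illustration; not part of subword-nmt.
    """
    if os.path.getsize(path) == 0:
        raise RuntimeError(
            "temp file %s is empty; the writer may have failed before dumping" % path
        )
    with open(path, "rb") as f:
        return pickle.load(f)


def demo():
    # delete=False plus an explicit close() means the file is only ever
    # reopened by name after the original handle is gone. Whether a named
    # temp file can be reopened while still open varies across platforms.
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.close()
    try:
        # Step 1: an empty file reproduces the reported exception type.
        try:
            with open(tmp.name, "rb") as f:
                pickle.load(f)
            reproduced = False
        except EOFError:
            reproduced = True

        # Step 2: simulate a writer that finishes successfully, then merge
        # its counts the same way the failing line does.
        with open(tmp.name, "wb") as f:
            pickle.dump(Counter({"low": 5, "lower": 2}), f)
        vocab = Counter()
        vocab += load_partial_vocab(tmp.name)
        return reproduced, vocab
    finally:
        os.remove(tmp.name)


if __name__ == "__main__":
    print(demo())
```

If a size check like this reports an empty file in the real run, the next thing to look at would be whether a worker process exited early (for example, from running out of memory on a 60 GB input) rather than the tempfile call itself.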

rsennrich commented 3 years ago

Thanks for reporting this issue.

@yimmon, this is related to the parallel support you contributed; could you have a look?