manueltonneau closed this issue 4 years ago.
Well, your system just ran out of memory. 115 GB is pretty large for a text file.
Does your text file contain real text (with spaces and sentences)? It will use less memory if your text file is fairly redundant in terms of words and pairs. Just guessing here, though; I'm not from Hugging Face.
Exactly, it includes one normal sentence per line. You're right, I was running another process on the side, which didn't help. I'm rerunning it now and it seems to work. Thanks a lot :)
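To illustrate the redundancy point above, here is a minimal Python sketch of BPE/WordPiece-style pair counting. It is an assumption for illustration only, not the Hugging Face tokenizers implementation, and `count_words` and `count_pairs` are hypothetical helpers. It shows that the counting tables grow with the number of *distinct* words and pairs rather than with the raw size of the file, so a repetitive corpus stays small while one full of unique tokens does not.

```python
from collections import Counter
from typing import Iterable


def count_words(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated words; memory grows with the number of distinct words."""
    words: Counter = Counter()
    for line in lines:  # streaming line by line, so the whole file is never held at once
        words.update(line.split())
    return words


def count_pairs(words: Counter) -> Counter:
    """Count adjacent character pairs inside each distinct word, weighted by word frequency.

    Assumption: each word starts out split into single characters, as in the first
    merge step of a BPE-like trainer. Illustration only, not library code.
    """
    pairs: Counter = Counter()
    for word, freq in words.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs


if __name__ == "__main__":
    # Hypothetical toy corpora with the same number of lines.
    redundant = ["the cat sat on the mat"] * 10_000
    varied = [f"token{i} appears once" for i in range(10_000)]

    for name, corpus in [("redundant", redundant), ("varied", varied)]:
        words = count_words(corpus)
        pairs = count_pairs(words)
        print(f"{name}: {len(words)} distinct words, {len(pairs)} distinct pairs")
```

With 10,000 lines each, the repetitive corpus produces only 5 distinct words, while the varied one produces 10,002, and the tables a real trainer keeps scale accordingly.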
Out of curiosity, what kind of RAM did you have on that machine?
Hi all,
Thanks for this great contribution :)
I was using the module to build a WordPiece vocab, using a very big .txt file as input (115 GB). Loading the data and tokenizing the words worked fine, but while the pair counting was running, I got this error:
memory allocation of 150994960 bytes failed
Aborted

Any idea why this could happen? Thanks in advance!
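As a rough sanity check on the error message above, converting the failed request to mebibytes shows it was only about 144 MiB. That suggests the process (together with whatever else was running on the machine) had already used nearly all available RAM, rather than a single allocation being enormous on its own. The constant and helper below are hypothetical names used only for this arithmetic.

```python
# Size of the failed request, copied from the "memory allocation of ... bytes failed" message.
FAILED_ALLOCATION_BYTES = 150_994_960


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


if __name__ == "__main__":
    print(f"{bytes_to_mib(FAILED_ALLOCATION_BYTES):.2f} MiB")  # prints "144.00 MiB"
```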