stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0
6.87k stars 1.52k forks

Merging cooccurrence files: processed 0 lines. Unable to open file overflow_1021.bin. #155

Open summerZXH opened 5 years ago

summerZXH commented 5 years ago

$ build/cooccur -memory 4.0 -vocab-file vocab.txt -verbose 2 -window-size 15 < ./train_data/data.txt > cooccurrence.bin
COUNTING COOCCURRENCES
window size: 15
context: symmetric
max product: 13752509
overflow length: 38028356
Reading vocab from file "vocab.txt"...loaded 7459765 words.
Building lookup table...table contains 215584655 elements.
Processed 7713292825 tokens.
Writing cooccurrences to disk............1141 files in total.
Merging cooccurrence files: processed 0 lines.Unable to open file overflow_1021.bin.

I ran into this problem when training GloVe on a large dataset (46 GB). Does anyone know why?

ousou commented 4 years ago

This seems to be the same error as the one mentioned in pull request #138. The issue is probably that too many files are open at the same time; it can be fixed by raising the limit on open files, for instance with ulimit -n 2048. See the PR for more info.
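As a minimal sketch of that suggestion (not taken from the PR), the snippet below compares the shell's soft limit on open files with the number of overflow files reported in the log above ("1141 files in total") and tries to raise the limit if it is too low. The headroom of 16 descriptors is an assumption chosen for illustration. The failing index, overflow_1021.bin, would be consistent with a default limit of 1024 descriptors, three of which are already used by stdin, stdout, and stderr, though the thread does not confirm the actual limit on the reporter's machine.

```shell
# Number of temporary overflow files reported by cooccur in the log above.
needed=1141
# Assumed headroom for stdin, stdout, stderr and other open files.
margin=16

# Current soft limit on open file descriptors for this shell.
soft=$(ulimit -n)
echo "current soft limit on open files: $soft"

if [ "$soft" != "unlimited" ] && [ "$soft" -lt $((needed + margin)) ]; then
    # The soft limit can only be raised up to the hard limit (see: ulimit -Hn).
    ulimit -n 2048 || echo "could not raise the limit; check 'ulimit -Hn'" >&2
fi

echo "soft limit now: $(ulimit -n)"
```

The new limit applies only to the current shell and the processes started from it, so cooccur would need to be run from the same session afterwards.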

jairajrouth commented 4 years ago

Hello guys, I am a bit new to GloVe and interested in seeing how this project works. I am using an online editor to run it. After running the demo.sh script, I can see that the build folder was created, along with the text8 and vocab.txt files. Next I wanted to try the cooccur tool. I ran the command below and got the message shown. I was expecting a file with the cooccurrences of the words from the vocab.txt file that I passed. Am I doing anything wrong with the command? Please let me know. All suggestions are welcome.

@ousou & @summerZXH, as you were also working on this, do you have any idea what I am doing wrong?

Thanks and Regards. Jai

./build/cooccur -vocab-file vocab.txt <------- Command
COUNTING COOCCURRENCES <------- Message
window size: 15
context: symmetric
max product: 10485784
overflow length: 28521267
Reading vocab from file "vocab.txt"...loaded 71290 words.
Building lookup table...table contains 75253375 elements.
Processing token: 0