stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Improve error logging when file loading fails #138

Closed ousou closed 4 years ago

ousou commented 5 years ago

After this change, whenever a file load fails, the process logs the errno value set by fopen together with the corresponding error description.

As an example, instead of the error message:

Writing cooccurrences to disk............1523 files in total. Merging cooccurrence files: processed 0 lines.Unable to open file overflow_1021.bin.

the process now outputs the following:

Merging cooccurrence files: processed 0 lines.Unable to open file overflow_1021.bin. Errno: 24 Error description: Too many open files
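
For readers who want to see the idea in code, here is a minimal sketch of this kind of logging. It is not the code merged into GloVe: `open_or_report` is a hypothetical helper name, and the message format is simply copied from the example output above. Only standard C calls (`fopen`, `strerror`, `fprintf`) are used.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not GloVe's actual function): opens a file and,
 * if that fails, reports errno and its description on stderr. */
FILE *open_or_report(const char *path, const char *mode) {
    assert(path != NULL && mode != NULL);

    errno = 0;
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        /* Save errno right away: later library calls may overwrite it. */
        int saved_errno = errno;
        fprintf(stderr,
                "Unable to open file %s. Errno: %d Error description: %s\n",
                path, saved_errno, strerror(saved_errno));
        errno = saved_errno; /* restore so the caller can still inspect it */
    }
    return fp;
}
```

A caller would check the return value for NULL as before; the only difference is that the reason for the failure now appears in the log.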

Some background to this PR: we encountered the error above when creating vectors for a large corpus (about 80 billion tokens). The cooccur process tried to open too many files at once during the merge_files phase and crashed. We solved the issue by raising the number of allowed open files with the following command:

ulimit -n 2048

This solved the issue for us since we had about 1500 overflow files. The default per-process limit on open files in Ubuntu seems to be 1024.
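
The same limit can also be inspected and raised from inside a program. The sketch below is an illustration only and is not part of this PR or of GloVe: `ensure_open_file_limit` is a hypothetical helper, and it assumes a POSIX system that provides `getrlimit` and `setrlimit` with `RLIMIT_NOFILE`.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: tries to make sure the soft limit on open files is
 * at least `needed`. Returns 0 if the limit is (or becomes) sufficient,
 * -1 otherwise. */
int ensure_open_file_limit(rlim_t needed) {
    assert(needed > 0);

    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }

    /* Nothing to do if the current soft limit is already high enough. */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= needed) {
        return 0;
    }

    /* An unprivileged process cannot raise the soft limit past the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_max < needed) {
        fprintf(stderr, "Need %llu open files, but the hard limit is %llu\n",
                (unsigned long long)needed, (unsigned long long)rl.rlim_max);
        return -1;
    }

    rl.rlim_cur = needed;
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

When choosing a value, leave some headroom above the number of overflow files, since stdin, stdout, stderr, and any output file also count against the limit.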

AngledLuffa commented 4 years ago

Thank you for sending this. I will merge the change which refactors common code into a single file later today. If you can rebase off that, that would be excellent. If not, I can redo this change to use the common code files myself.

AngledLuffa commented 4 years ago

I refactored this as desired and merged it, giving credit to you. Thanks!

ousou commented 4 years ago

Great, thanks for taking care of the refactoring!