stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

fix(shuffle): malloc array_size try-retry #140

Closed by tmkontra 4 years ago

tmkontra commented 5 years ago

Changes:

Purpose: While developing a Cython wrapper for this library, I noticed segmentation faults when requesting too much memory via the -memory argument of the shuffle binary.

After much debugging to find the root cause, I implemented a "self-healing" array_size in this commit: a try-retry around the malloc. More advanced users can easily work around the problem themselves (by adjusting their -memory argument), but I recognize that a sizable segment of scientific users may not be able to debug such an error.
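To illustrate the idea, here is a minimal sketch of a malloc try-retry. It is not the code from this commit: the helper name `alloc_with_retry`, the halving strategy, and the `min_count` floor are all assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not the code from this commit): try to allocate
 * `*count` elements of `elem_size` bytes each. After every failed attempt,
 * halve `*count` and try again, giving up once it drops below `min_count`.
 * On success, `*count` holds the element count that was actually allocated. */
static void *alloc_with_retry(size_t elem_size, size_t *count, size_t min_count) {
    /* Reject degenerate inputs; min_count == 0 could otherwise loop forever. */
    if (elem_size == 0 || min_count == 0) return NULL;

    while (*count >= min_count) {
        /* Only call malloc when the byte count does not overflow size_t. */
        if (*count <= (size_t)-1 / elem_size) {
            void *buf = malloc(*count * elem_size);
            if (buf != NULL) return buf;
            fprintf(stderr, "malloc of %zu elements failed, retrying with %zu\n",
                    *count, *count / 2);
        }
        *count /= 2;
    }
    return NULL; /* Caller must handle the case where no size could be allocated. */
}
```

A caller would pass the requested array_size by pointer and use the possibly reduced value afterwards. Note that on systems with memory overcommit, malloc can return a non-NULL pointer for memory that is not actually available, so a retry like this only helps when malloc itself reports the failure.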

I hope this is useful. Please let me know whether this is something you would consider merging, or what changes should be made (I'm rather new to C, so any feedback is appreciated). Cheers!