stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Fixed misleading README.md about inserting "dummy" words between documents in corpus #110

jbojar closed this 6 years ago

jbojar commented 6 years ago

GloVe supports documents separated by newline characters (\n) in the corpus, so inserting multiple "dummy" words to separate documents is not necessary.

Line feeds (\n) are detected in the get_word function in cooccur.c.
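To illustrate the idea, here is a minimal sketch of how a tokenizer can treat a line feed as a document boundary so that context never spans two documents. It is not the code from cooccur.c: the names next_token, count_adjacent_pairs, and MAX_WORD are hypothetical, the input is an in-memory string rather than a file, and it assumes spaces, tabs, and carriage returns only separate words.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORD 64 /* hypothetical cap on token length */

/* Hypothetical helper (not GloVe's get_word): reads the next token at *cursor.
 * Returns 1 when a line feed or the end of input is reached (document
 * boundary), 0 when a word was copied into `word`. */
int next_token(const char **cursor, char *word, size_t max_len) {
    size_t i = 0;
    assert(max_len > 0);
    for (;;) {
        char ch = **cursor;
        if (ch == '\0' || ch == '\n') {
            if (i > 0) break;            /* finish the word; report the boundary on the next call */
            if (ch == '\n') (*cursor)++; /* consume the line feed */
            word[0] = '\0';
            return 1;
        }
        (*cursor)++;
        if (ch == ' ' || ch == '\t' || ch == '\r') {
            if (i > 0) break;            /* whitespace ends the current word */
            continue;                    /* skip leading whitespace */
        }
        if (i < max_len - 1) word[i++] = ch; /* silently truncate overlong words */
    }
    word[i] = '\0';
    return 0;
}

/* Counts adjacent word pairs that lie within the same document (line). */
long count_adjacent_pairs(const char *corpus) {
    char word[MAX_WORD];
    const char *cursor = corpus;
    long pairs = 0;
    long position = 0; /* index of the next word within the current document */
    for (;;) {
        if (next_token(&cursor, word, sizeof word)) {
            if (*cursor == '\0') break; /* end of input */
            position = 0;               /* new document: drop the previous context */
            continue;
        }
        if (position > 0) pairs++;      /* pair with the previous word in this document */
        position++;
    }
    return pairs;
}
```

With this sketch, count_adjacent_pairs("a b c d") gives 3, while count_adjacent_pairs("a b\nc d") gives 2, because the pair (b, c) would cross the line feed and is never counted. No "dummy" padding words are needed to get that effect.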

manning commented 6 years ago

Thanks. Will merge, but the description should still perhaps be a little clearer. E.g., tab works too.