stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

fix input and division errors, add optional vocabulary file generation #141

Closed crystal-butler closed 4 years ago

crystal-butler commented 5 years ago

Running distance.py under Python 3.6 raises a divide-by-zero error, plus an error on the raw_input() call, because raw_input() was renamed to input() in Python 3 under PEP 3111 (https://docs.python.org/3/whatsnew/3.0.html). The proposed changes fix both issues.
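For illustration, the two kinds of fix described above can be sketched as follows. This is not the code from this pull request: the helper names `read_line` and `normalize` are hypothetical, and the sketch assumes that the divide-by-zero comes from length-normalizing an all-zero vector, which the description does not confirm.

```python
import math


def read_line(prompt=""):
    """Read one line of text on both Python 2 and Python 3 (hypothetical helper)."""
    try:
        reader = raw_input  # exists only on Python 2
    except NameError:
        reader = input      # Python 3 name, per PEP 3111
    return reader(prompt)


def normalize(vector):
    """Scale a vector to unit length (hypothetical helper).

    Assumption: an all-zero vector is returned unchanged instead of
    being divided by its zero norm.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]
```

Returning a zero vector unchanged is only one possible policy; skipping such rows or substituting a small epsilon for the norm would also avoid the division error.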

Because the pre-trained GloVe vector sets don't come with vocabulary files, I also added an option to generate the vocabulary on the fly from the input vectors. The generated vocabulary is not saved to disk; it is only used while the script is running.
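A minimal sketch of the idea, assuming the usual GloVe text format of one token per line followed by its space-separated vector components. It is not the implementation proposed in this pull request, and the function name `load_vectors_with_vocab` is hypothetical.

```python
def load_vectors_with_vocab(lines):
    """Build an in-memory vocabulary while reading vector lines.

    Assumption: each line is "<token> <v1> <v2> ... <vd>", separated by
    single spaces. Tokens that themselves contain spaces are not handled.
    """
    vocab = {}    # token -> row index
    ivocab = {}   # row index -> token
    vectors = []  # list of float lists, one per token
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        index = len(vectors)
        vocab[parts[0]] = index
        ivocab[index] = parts[0]
        vectors.append([float(x) for x in parts[1:]])
    return vocab, ivocab, vectors
```

Because the function accepts any iterable of lines, it can be called on an open file handle for a vectors file, and nothing is written back to disk.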

AngledLuffa commented 4 years ago

I merged the raw_input change. Thanks.

I'm pretty sure the vocab file generation is not working correctly, as per my earlier comment. If you want to fix that, please feel free to do so and resubmit the pull request.