stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

Added the option to not weight cooccurrence counts by word distance #89

Closed: shreyshahi closed this 6 years ago

shreyshahi commented 7 years ago

I am using GloVe in a context where the order of the tokens, and the distances between tokens within the context window, do not carry the same meaning as in most language-modeling tasks. In some of these contexts, counting unweighted cooccurrences produces more meaningful vectors. This change adds a flag to cooccur.c that makes it ignore the distance between words when counting cooccurrences.
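For readers comparing the two modes: the GloVe paper weights a word pair that occurs d words apart by 1/d, and the proposed flag replaces that with a constant weight of 1. Below is a minimal C sketch of symmetric window counting under both modes. It assumes integer token ids that are already mapped and a dense vocab_size × vocab_size matrix. The names `count_cooccurrences` and `distance_weighting` are hypothetical. This is not the actual cooccur.c code, which streams a corpus using memory-bounded storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the cooccur.c implementation.
 * Accumulates symmetric-window cooccurrence counts for a sequence of token ids.
 * counts: dense vocab_size x vocab_size row-major matrix, zeroed by the caller.
 * distance_weighting != 0: a pair d positions apart adds 1.0 / d.
 * distance_weighting == 0: every pair inside the window adds 1.0. */
void count_cooccurrences(const int *tokens, size_t n_tokens,
                         size_t window_size, int distance_weighting,
                         double *counts, size_t vocab_size) {
    assert(window_size > 0);
    for (size_t j = 0; j < n_tokens; j++) {
        /* Left context: up to window_size tokens before position j. */
        size_t start = j > window_size ? j - window_size : 0;
        for (size_t k = start; k < j; k++) {
            size_t d = j - k; /* distance between the two tokens, always >= 1 */
            double w = distance_weighting ? 1.0 / (double)d : 1.0;
            size_t a = (size_t)tokens[j];
            size_t b = (size_t)tokens[k];
            /* Symmetric window: record the pair in both orders. */
            counts[a * vocab_size + b] += w;
            counts[b * vocab_size + a] += w;
        }
    }
}
```

With tokens {0, 1, 2} and a window of 2, the pair (0, 2) is two positions apart, so it contributes 0.5 when weighted and 1.0 when unweighted. Adjacent pairs contribute 1.0 in either mode.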

The default behavior is unchanged: counts are still weighted by distance unless the new flag is passed explicitly.
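One way to keep the old behavior as the default is to initialize the setting to "weighted" and override it only when the option appears on the command line. Below is a minimal sketch in C. The flag spelling `-distance-weighting` and the helper names `find_flag` and `parse_distance_weighting` are assumptions for illustration, and the real option name in cooccur.c may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the index of flag in argv, or -1 if it is absent
 * or is the last argument (so it has no value after it). */
static int find_flag(const char *flag, int argc, char **argv) {
    assert(flag != NULL && argv != NULL);
    for (int i = 1; i < argc - 1; i++) {
        if (strcmp(argv[i], flag) == 0) return i;
    }
    return -1;
}

/* Default is 1 (weight by inverse distance), so existing invocations behave
 * exactly as before. Only an explicit "-distance-weighting 0" (assumed flag
 * name) switches to unweighted counting. */
int parse_distance_weighting(int argc, char **argv) {
    int distance_weighting = 1;
    int i = find_flag("-distance-weighting", argc, argv);
    if (i > 0) distance_weighting = atoi(argv[i + 1]);
    return distance_weighting;
}
```

Because the variable starts at 1, any command line that omits the option keeps the original inverse-distance weighting.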

manning commented 6 years ago

Looks good!