stanfordnlp / GloVe

Software in C and data files for the popular GloVe model for distributed word representations, a.k.a. word vectors or embeddings
Apache License 2.0

vectors in W being updated (python) #47

Closed andreiamatuni closed 7 years ago

andreiamatuni commented 7 years ago

Are the vectors in W supposed to be updated depending on inputs passed to the distance function?

For example, if I pass in "car", the cosine similarity with "stroller" is 0.1729. If I then pass in "car stroller", and then just "car" again, then the cosine similarity with "stroller" is now 0.765.

Running the python code through a debugger, it looks like the vectors for "car" and "stroller" in W are updated during the distance function call with the input containing multiple words. Is this supposed to be happening?

from eval/python/distance.py:

    for idx, term in enumerate(input_term.split(' ')):
        if term in vocab:
            print('Word: %s  Position in vocabulary: %i' % (term, vocab[term]))
            if idx == 0:
                vec_result = W[vocab[term], :]
            else:
                vec_result += W[vocab[term], :]

When you initialize vec_result on the first term and then add to it in the else: branch, you're updating that first term's row in the W ndarray itself, because vec_result is a reference to that row (a NumPy view), not a copy.
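The aliasing can be reproduced in isolation. The following is only a sketch: the two-word `vocab`, the 2x2 `W`, and the function name `sum_vectors_buggy` are toy stand-ins invented for illustration, not the real GloVe data or code. Only the loop body follows the snippet quoted above.

```python
import numpy as np

# Hypothetical toy data standing in for the real vocabulary and embedding matrix.
vocab = {"car": 0, "stroller": 1}
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def sum_vectors_buggy(input_term):
    # Same pattern as the quoted loop: the first assignment takes no copy.
    for idx, term in enumerate(input_term.split(' ')):
        if term in vocab:
            if idx == 0:
                vec_result = W[vocab[term], :]    # basic indexing returns a view into W
            else:
                vec_result += W[vocab[term], :]   # in-place add writes through to W
    return vec_result

before = W[vocab["car"]].copy()          # snapshot of the "car" row
result = sum_vectors_buggy("car stroller")

print(np.shares_memory(result, W))       # True: result aliases a row of W
print(before, W[vocab["car"]])           # [1. 0.] [1. 1.]: the "car" row changed
```

After the call, the row for "car" holds the sum of both vectors, which is consistent with the similarity to "stroller" jumping on the next single-word query.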

ghost commented 7 years ago

In the code snippet you have pasted, W is not being updated. Rather, vec_result is being set to the sum of its word vectors. Would you agree?

andreiamatuni commented 7 years ago

vec_result is a reference to an element within W. When you update vec_result by summing the vectors, you're updating the first vector within W (at index vocab[term]), since that's what the reference points to. If vec_result were assigned as a copy, then what you said would be true, but as is, it's a basic reference assignment.

ghost commented 7 years ago

Ah, good catch! This was a user contribution, and unfortunately we didn't catch that bug. Fix should be here: https://github.com/stanfordnlp/GloVe/commit/c0d838f86c4d14c7ea9af647ca869291058ba8c0

Does that work?
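The contents of the linked commit are not reproduced in this thread. As a minimal sketch of one way to avoid the aliasing, assuming the same loop structure, the first assignment can take an explicit copy with `np.copy`. The toy `vocab`, `W`, and the function name `sum_vectors` are hypothetical and exist only for this illustration.

```python
import numpy as np

# Hypothetical toy data standing in for the real vocabulary and embedding matrix.
vocab = {"car": 0, "stroller": 1}
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def sum_vectors(input_term):
    for idx, term in enumerate(input_term.split(' ')):
        if term in vocab:
            if idx == 0:
                # Copy the row so later in-place adds cannot touch W.
                vec_result = np.copy(W[vocab[term], :])
            else:
                vec_result += W[vocab[term], :]
    return vec_result

W_before = W.copy()
result = sum_vectors("car stroller")

print(np.shares_memory(result, W))     # False: result owns its own buffer
print(np.array_equal(W, W_before))     # True: W is unchanged
```

Writing the update as `vec_result = vec_result + W[vocab[term], :]` would also leave W untouched, since that form allocates a new array instead of adding in place.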

andreiamatuni commented 7 years ago

yup, thanks!