mkusner / wmd

Word Mover's Distance from Matthew J Kusner's paper "From Word Embeddings to Document Distances"

problem with WCD #26

Open 08s011003 opened 5 years ago

08s011003 commented 5 years ago

@mkusner I read your paper and want to use your WCD+RWMD method to compute document similarity in my document recommendation project. I found the MATLAB code for RWMD, but couldn't find the code for WCD. Is it the file named distance.m?

mkusner commented 5 years ago

Ah, sorry for being unclear: the WCD is the distance you get by first representing each document as the weighted average of its word vectors, where the weights are the normalized BOW weights, and then computing the Euclidean distance between the two averages. For instance, to compute the WCD in Python you would do:

    import pickle
    import numpy as np

    # load data (binary mode is required for pickle in Python 3)
    with open(load_file, "rb") as f:
        [X, BOW_X, y, C, words] = pickle.load(f)

    # compute WCD between documents i and j
    # --------------------------------------------------------
    # normalize BOW for document i, j
    bow_i = BOW_X[i]
    bow_i = bow_i / np.sum(bow_i)
    bow_j = BOW_X[j]
    bow_j = bow_j / np.sum(bow_j)

    # I haven't debugged the code below, may need to add an additional
    # dimension to bow_i, bow_j (i.e., make (n,1) instead of (n,))
    v_i = np.dot(X[i].T, bow_i)
    v_j = np.dot(X[j].T, bow_j)

    # Euclidean distance (can use distance.m to parallelize this in matlab,
    # similar functions exist in python as well)
    wcd_ij = np.sqrt(np.sum((v_i - v_j)**2))

Does this make sense?
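For anyone who wants to try the recipe above without the pickled dataset, here is a minimal self-contained sketch on toy data. It is not the repository's code: the helper name `wcd` and the toy arrays are made up for illustration, and it assumes each document's word vectors are stored with shape (n_words, dim). If your X[i] is stored as (dim, n_words) instead, drop the `.T`.

```python
import numpy as np

def wcd(X_i, bow_i, X_j, bow_j):
    """Word Centroid Distance between two documents (illustrative helper).

    Assumes X_* has shape (n_words, dim), one row per unique word,
    and bow_* holds the raw counts of those words.
    """
    # Flatten so that counts stored as (n,), (n, 1) or (1, n) all work.
    w_i = np.asarray(bow_i, dtype=float).ravel()
    w_j = np.asarray(bow_j, dtype=float).ravel()

    # Normalize the BOW counts so the weights sum to 1.
    w_i = w_i / w_i.sum()
    w_j = w_j / w_j.sum()

    # Weighted average of the word vectors: one centroid per document.
    c_i = X_i.T @ w_i
    c_j = X_j.T @ w_j

    # Euclidean distance between the two centroids.
    return np.linalg.norm(c_i - c_j)

# Toy documents with 2-dimensional "embeddings" (made-up numbers).
X_a = np.array([[0.0, 0.0], [2.0, 0.0]])
bow_a = np.array([1, 1])            # centroid is (1, 0)
X_b = np.array([[1.0, 3.0], [1.0, 5.0], [1.0, 4.0]])
bow_b = np.array([2, 2, 4])         # centroid is (1, 4)

print(wcd(X_a, bow_a, X_b, bow_b))  # 4.0
```

Flattening the counts with `ravel()` is one way to sidestep the (n,) versus (n,1) shape question mentioned in the snippet above.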


08s011003 commented 5 years ago

@mkusner Thank you very much. I have another question about "v_i = np.dot(X[i].T, bow_i)" and "v_j = np.dot(X[j].T, bow_j)". You didn't save a variable T in load_file, so are you sure the operation "X[j].T" is needed here?

mkusner commented 5 years ago

Yes, X[i].T is the transpose of X[i], not another variable.
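A quick way to see this for yourself, assuming X[i] is a NumPy array: `.T` is an attribute of every NumPy array that returns its transpose, so nothing named T needs to be loaded from the file. The array `A` below is a made-up stand-in for X[i].

```python
import numpy as np

# Made-up 2x3 array standing in for X[i].
A = np.arange(6).reshape(2, 3)

print(A.shape)    # (2, 3)
print(A.T.shape)  # (3, 2)

# A.T gives the same result as calling np.transpose(A).
print(np.array_equal(A.T, np.transpose(A)))  # True
```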
