uhh-lt / sensegram

Making sense embedding out of word embeddings using graph-based word sense induction
http://uhh-lt.github.io/sensegram

Why use dists * np.log(freq) in the previous version of sensegram? #16

Closed AsmaZbt closed 6 years ago

AsmaZbt commented 6 years ago

I'm not sure whether I can ask here about the previous version of sensegram, so excuse me if this is not the right place. The new version is too advanced for me, so I prefer starting from the beginning.

In the function similar_top_opt3(...), after computing the array of distances (dists = np.dot(vecs, vec.syn0norm.T)), you multiply it by the log of the array of frequencies, like this:

vecs = vec.syn0norm[indices]
dists = np.dot(vecs, vec.syn0norm.T)

if freq is not None:
    dists = dists * np.log(freq)

I do not understand why you multiply the distances by the log of the frequencies. Could you please explain?
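The numerical effect of that line can be seen on a toy example. The sketch below uses made-up vectors and made-up corpus counts (the names `vocab`, `syn0norm` and `freq` here are illustrative stand-ins, not data from sensegram). It only shows what the multiplication does to the ranking; the thread does not state why the contributor added it.

```python
import numpy as np

# Hypothetical toy data: 4 vocabulary words with unit-length 2-d vectors.
vocab = ["cat", "dog", "the", "zyzzyva"]
syn0norm = np.array([
    [1.00, 0.00],
    [0.80, 0.60],
    [0.60, 0.80],
    [0.96, 0.28],
])
# Hypothetical corpus counts, one per vocabulary word, in the same order.
freq = np.array([1000.0, 1000.0, 1000000.0, 2.0])

indices = [0]                       # query word: "cat"
vecs = syn0norm[indices]            # shape (1, 2)
dists = np.dot(vecs, syn0norm.T)    # cosine similarities, shape (1, 4)

# np.log(freq) has shape (4,), so it broadcasts across every row of dists:
# column j (candidate word j) is scaled by log(freq[j]).
weighted = dists * np.log(freq)

plain_order = [vocab[i] for i in np.argsort(-dists[0])]
weighted_order = [vocab[i] for i in np.argsort(-weighted[0])]

print(plain_order)     # ranking by cosine similarity only
print(weighted_order)  # ranking after the log-frequency weighting
```

In this toy case the rare word drops to the bottom of the ranking and the very frequent word moves to the top, so the weighting pushes neighbours toward frequent words. Note also that a count of 1 gives log(1) = 0, which zeroes the score, and a count of 0 gives -inf.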

alexanderpanchenko commented 6 years ago

Sorry for the late reply. Could you please provide a URL to the line containing this code?

AsmaZbt commented 6 years ago

Hello! No problem, thank you for replying ^_^ This is the function from the previous version:

def similar_top_opt3(vec, words, topn=200, nthreads=12, freq=None):
    vec.init_sims()

    indices = [vec.vocab[w].index for w in words if w in vec.vocab]
    vecs = vec.syn0norm[indices]
    dists = np.dot(vecs, vec.syn0norm.T)

    if freq is not None:
        dists = dists * np.log(freq)

    if nthreads==1:
        res = dists2neighbours(vec, dists, indices, topn)
    else:
        batchsize = int(ceil(1. * len(indices) / nthreads))
        print >> stderr, "dists2neighbours for %d words in %d threads, batchsize=%d" % (len(indices), nthreads, batchsize)
        def ppp(i):
            return dists2neighbours(vec, dists[i:i+batchsize], indices[i:i+batchsize], topn)
        lres = parallel_map(ppp, range(0,len(indices),batchsize), threads=nthreads)
        res = OrderedDict()
        for lr in lres:
            res.update(lr)

    return res

Thank you so much (y)

alexanderpanchenko commented 6 years ago

Sorry for such a late answer. This part of the code was provided by a contributor, and I was not sure about it. In fact, this code has since been removed from the repository, because we now use Facebook's FAISS library to compute nearest neighbors instead of this numpy code.
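For anyone wondering what a FAISS-based replacement for the numpy dot product might look like, here is a minimal sketch. It is not the code from the sensegram repository: the function name `top_neighbours` is hypothetical, and the choice of `faiss.IndexFlatIP` (an exact inner-product index) is an assumption, since the thread does not say which index type the project uses. If `faiss` is not installed, the sketch falls back to brute-force numpy so that it still runs.

```python
import numpy as np

try:
    import faiss  # optional; assumed to be installed as faiss-cpu or faiss-gpu
except ImportError:
    faiss = None


def top_neighbours(vectors, query_ids, topn=3):
    """Return (scores, ids) of the topn inner-product neighbours per query row.

    Hypothetical helper for illustration only.
    """
    x = np.ascontiguousarray(vectors, dtype=np.float32)
    # Normalise rows to unit length so that inner product equals cosine similarity.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    q = x[query_ids]

    if faiss is not None:
        index = faiss.IndexFlatIP(x.shape[1])  # exact (non-approximate) search
        index.add(x)
        return index.search(q, topn)

    # Fallback: the same computation done by brute force in numpy.
    sims = q @ x.T
    ids = np.argsort(-sims, axis=1)[:, :topn]
    return np.take_along_axis(sims, ids, axis=1), ids


# Toy vectors (made up); row 0 is the query.
vectors = np.array([[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.96, 0.28]])
scores, ids = top_neighbours(vectors, query_ids=[0], topn=3)
print(ids[0], scores[0])
```

Note that this sketch applies no frequency weighting: it ranks neighbours by cosine similarity alone.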