monarch-initiative / embiggen

🍇 Embiggen is the Python Graph Representation learning, Prediction and Evaluation submodule of the GRAPE library.
BSD 3-Clause "New" or "Revised" License

Utility functions? #105

Closed · pnrobinson closed this issue 4 years ago

pnrobinson commented 4 years ago

Should we add a file with various utility functions for working with the embeddings? For instance, something like this to get the most similar words?

import numpy as np

def get_cosine_sim(emb, valid_words, top_k):
    """Return, for each query row, the indices of the top_k most similar rows by cosine similarity."""
    # L2-normalise every embedding row so that dot products become cosine similarities.
    norm = np.sqrt(np.sum(emb ** 2, axis=1, keepdims=True))
    norm_emb = emb / norm
    # Compute the similarity of each query row against every row in the matrix.
    in_emb = norm_emb[valid_words, :]
    similarity = np.dot(in_emb, np.transpose(norm_emb))
    # Sort descending and skip column 0, which is each query's similarity to itself.
    sorted_ind = np.argsort(-similarity, axis=1)[:, 1:top_k + 1]
    return sorted_ind, valid_words
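
For illustration, usage might look like this (the matrix and indices below are made up purely as an example):

# Illustrative usage: find the 5 nearest neighbours (by cosine similarity) of a few nodes.
emb = np.random.rand(100, 64)        # 100 node embeddings, 64 dimensions
query_ids = np.array([0, 17, 42])    # row indices of the query nodes
top_ids, queries = get_cosine_sim(emb, query_ids, top_k=5)
print(top_ids.shape)                 # (3, 5): the 5 most similar node ids per query
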
callahantiff commented 4 years ago

@pnrobinson - great idea! I'm happy to work on this. I know there are some embedding loading and writing functions in the scripts that we could also pull out and add to an embedding utility script.

Will add a new branch for this off of develop.

justaddcoffee commented 4 years ago

+1 @callahantiff !

vidarmehr commented 4 years ago

@pnrobinson @callahantiff I think we may need to add Common Neighbors, Jaccard's Coefficient, Adamic-Adar Score, and Preferential Attachment scores, too. We don't need embeddings to calculate these scores, but I think we will eventually need to calculate them and compare them with the node2vec results.
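
A minimal sketch of how these classical scores could be computed (purely illustrative; it uses NetworkX rather than anything in embiggen, and the function name and toy graph are made up):

import networkx as nx

def classical_link_scores(graph, u, v):
    """Return the four classical link-prediction scores for a candidate edge (u, v)."""
    # Common Neighbors: how many neighbours u and v share.
    common = len(list(nx.common_neighbors(graph, u, v)))
    # NetworkX yields the remaining scores as (u, v, score) triples for the
    # requested candidate edge, so we unpack the single result from each generator.
    _, _, jaccard = next(nx.jaccard_coefficient(graph, [(u, v)]))
    _, _, adamic_adar = next(nx.adamic_adar_index(graph, [(u, v)]))
    _, _, pref_attach = next(nx.preferential_attachment(graph, [(u, v)]))
    return {
        "common_neighbors": common,
        "jaccard_coefficient": jaccard,
        "adamic_adar": adamic_adar,
        "preferential_attachment": pref_attach,
    }

# Toy example on the Zachary karate club graph.
g = nx.karate_club_graph()
print(classical_link_scores(g, 0, 33))

These scores only need the graph topology, so they could live in the same utility module and serve as a baseline to compare against node2vec-based link prediction.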

callahantiff commented 4 years ago

> I think we may need to add Common Neighbors, Jaccard's Coefficient, Adamic-Adar Score, and Preferential Attachment scores, too. We don't need embeddings to calculate these scores, but I think we will eventually need to calculate them.

@vidarmehr, great point! Maybe we should create a utility function for link prediction methods as well, then?

vidarmehr commented 4 years ago

> > I think we may need to add Common Neighbors, Jaccard's Coefficient, Adamic-Adar Score, and Preferential Attachment scores, too. We don't need embeddings to calculate these scores, but I think we will eventually need to calculate them.
>
> @vidarmehr, great point! Maybe we should create a utility function for link prediction methods as well, then?

Yes, I think it is a good idea to have a separate utility function for link prediction methods.

pnrobinson commented 4 years ago

Awesome! We should make these functions so that they are easily testable!

callahantiff commented 4 years ago

> Awesome! We should make these functions so that they are easily testable!

Absolutely! I will also open a second ticket for link prediction metric utils.

callahantiff commented 4 years ago

@pnrobinson - perhaps we just want to use the tf built-in cosine similarity function?

https://www.tensorflow.org/api_docs/python/tf/keras/metrics/CosineSimilarity
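
For reference, a sketch of how that metric could be used (the tensors below are invented for illustration; ranking the most similar nodes would still need an explicit top-k step):

import tensorflow as tf

# Cosine similarity between two batches of embedding vectors, row by row.
metric = tf.keras.metrics.CosineSimilarity(axis=1)
metric.update_state(tf.constant([[1.0, 0.0, 0.0]]),
                    tf.constant([[0.5, 0.5, 0.0]]))
print(metric.result().numpy())  # ~0.707

# For nearest-neighbour lookups over a whole embedding matrix, normalising and
# taking a matrix product is closer in spirit to get_cosine_sim above.
norm_emb = tf.math.l2_normalize(tf.random.normal((100, 64)), axis=1)
sims = tf.matmul(norm_emb, norm_emb, transpose_b=True)
top = tf.math.top_k(sims, k=6)  # the first hit of each row is the node itself
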

vidarmehr commented 4 years ago

> @pnrobinson - perhaps we just want to use the tf built-in cosine similarity function?
>
> https://www.tensorflow.org/api_docs/python/tf/keras/metrics/CosineSimilarity

I wrote some simple code to calculate cosine similarity, but maybe it is better to use the tf implementation, right?

pnrobinson commented 4 years ago

As a rule it is almost always better to use library functions for things like this.

pnrobinson commented 4 years ago

@vidarmehr can we close this now?

vidarmehr commented 4 years ago

> @vidarmehr can we close this now?

Yes, Peter. I will close it.