murrayds / sci-mobility-emb

Embedding of scientific mobility across institutions, cities, regions, and countries

UMAP parameter search #14

Closed murrayds closed 5 years ago

murrayds commented 5 years ago

Plot UMAP with several parameter combinations, mostly focusing on:

jisungyoon commented 5 years ago

I think we can use the 'correlation' metric for UMAP, because node2vec often measures the similarity between two nodes as just their dot product (especially in link prediction tasks).
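For context on the suggestion above, here is a minimal pure-Python sketch of how a correlation-based similarity relates to the dot product. It assumes the usual definition (Pearson correlation, i.e. cosine similarity of mean-centered vectors, with the distance taken as one minus that value). The helper names and the toy vectors `u` and `v` are illustrative only and are not taken from this repository.

```python
import math

def dot(u, v):
    # Plain dot product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    # Dot product divided by the product of the L2 norms.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def center(u):
    # Subtract the vector's own mean from every component.
    mean = sum(u) / len(u)
    return [a - mean for a in u]

def correlation(u, v):
    # Pearson correlation = cosine similarity of the mean-centered vectors.
    return cosine_similarity(center(u), center(v))

# Hypothetical toy embeddings, for illustration only.
u = [1.0, 2.0, 3.0, 4.0]
v = [2.0, 1.0, 4.0, 3.0]

raw_cosine = cosine_similarity(u, v)
corr = correlation(u, v)
corr_distance = 1.0 - corr

# Adding a constant to every component leaves the correlation unchanged,
# whereas the raw dot product and cosine similarity would both change.
corr_shifted = correlation([a + 10.0 for a in u], v)
```

So a correlation-based metric behaves like a dot-product-based one only after centering and length normalization. Whether that matches the geometry of a given embedding is something to check empirically.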

murrayds commented 5 years ago

> I think we can use the 'correlation' metric for UMAP, because node2vec often measures the similarity between two nodes as just their dot product (especially in link prediction tasks).

I plan to run UMAP with both to see the results. But could you elaborate? At the moment I am not working with network-level data or node2vec, only with word2vec. Does it still make sense to use a dot product?

jisungyoon commented 5 years ago

According to their node2vec paper,

> Overall, the Hadamard operator when used with node2vec is highly stable and gives the best performance on average across all networks.

If the resulting vectors are normalized, the dot product and cosine similarity are the same. I am not sure whether they normalized their vectors, but there is no normalization step in their paper or in their code.
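The point about normalization can be checked with a small pure-Python sketch. It assumes L2 (unit-length) normalization. The helper names and the example vectors are hypothetical and are not drawn from the node2vec code or from this project's embeddings.

```python
import math

def dot(u, v):
    # Plain dot product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(u):
    # Scale the vector to unit Euclidean length.
    norm = math.sqrt(dot(u, u))
    return [a / norm for a in u]

def cosine_similarity(u, v):
    # Dot product divided by the product of the L2 norms.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical toy embeddings, for illustration only.
u = [3.0, 4.0]
v = [4.0, 3.0]

raw_dot = dot(u, v)                                     # depends on vector lengths
cos = cosine_similarity(u, v)                           # length-independent
normalized_dot = dot(l2_normalize(u), l2_normalize(v))  # matches cos

print(raw_dot, cos, normalized_dot)
```

Without the normalization step the raw dot product also reflects vector length, so it can rank pairs differently from cosine similarity.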