murrayds / sci-mobility-emb

Embedding of scientific mobility across institutions, cities, regions, and countries

Mathematical explanation of why gravity model holds? #29

Closed jisungyoon closed 4 years ago

jisungyoon commented 4 years ago

I found a very weak clue as to why our model works. I am not good at mathematics, so please share your thoughts on these results @yy @murrayds

I think it is related to the objective function of the word2vec model, but it is a little tricky, because word2vec increases the dot product between words, not exactly the cosine similarity. With cosine similarity, however, these equations hold:

[image: KakaoTalk_Photo_2019-11-26-13-58-13]

The equations say that there is an inverse relation between M_i and the sum of its neighbors' sums of exponential cosine similarity.

jisungyoon commented 4 years ago

Maybe this comes from the hub nodes of the network, so I took a brief look at the real embedding. Given node i, I collected its 100 most similar nodes and, for each of those 100 nodes, calculated the sum of its neighbors' sums of exponential cosine similarity. I then took the average of those 100 values as y and the number of appearances of node i as x.
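One possible reading of this procedure, as a minimal NumPy sketch. The function name `neighbor_partition_average`, the symbol Z_j = sum_k exp(cos(j, k)), and the choice to exclude each node's similarity to itself are assumptions made for illustration; they are not taken from the repository code.

```python
import numpy as np

def neighbor_partition_average(emb, top_k=100):
    """For each node i, average Z_j over the top_k nodes most similar to i,
    where Z_j = sum over k != j of exp(cos(j, k)).

    Assumes emb has at least 2 rows and no all-zero rows.
    """
    emb = np.asarray(emb, dtype=float)
    # Normalize rows so that dot products equal cosine similarities.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    cos = unit @ unit.T
    # Assumption: a node is not its own neighbor; exp(-inf) contributes 0.
    np.fill_diagonal(cos, -np.inf)
    z = np.exp(cos).sum(axis=1)  # Z_j for every node j
    k = min(top_k, emb.shape[0] - 1)
    # Column indices of the k largest cosine similarities in each row.
    top = np.argpartition(-cos, k - 1, axis=1)[:, :k]
    return z[top].mean(axis=1)  # y value for every node i
```

The returned array would be plotted as y against the number of appearances of each node as x, which has to come from the mobility data itself.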

jisungyoon commented 4 years ago

[image: with_std] With standard deviation

jisungyoon commented 4 years ago

[image: with_sem] With standard error

jisungyoon commented 4 years ago

I think this effect is what makes the gravity model hold, but I am not sure about this equation. What are your thoughts?

jisungyoon commented 4 years ago

Also, this approximation only holds for a window size of 1.

yy commented 4 years ago

Sorry, I can't follow... could you define all symbols and provide more explanations?

jisungyoon commented 4 years ago

F_ij is the flow between nodes i and j, and M_i is the sum of the flows from node i. Cos(i,j) is the cosine similarity between the embedding vectors of i and j.
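As a small illustration of these definitions, a sketch in NumPy. The helper names `total_outflow` and `cosine_similarity` are hypothetical, and the flow matrix used in the note below is made-up toy data, not data from the project.

```python
import numpy as np

def total_outflow(flow):
    """M_i = sum_j F_ij: row sums of the flow matrix F."""
    return np.asarray(flow, dtype=float).sum(axis=1)

def cosine_similarity(u, v):
    """Cos(i, j) for two embedding vectors u and v (both assumed non-zero)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

For a toy symmetric flow matrix `[[0, 2, 1], [2, 0, 3], [1, 3, 0]]`, `total_outflow` gives one value of M_i per row.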

jisungyoon commented 4 years ago

I wrote down the condition under which the gravity rule holds, starting from the objective function of word2vec.

jisungyoon commented 4 years ago

Or do you have time for a Zoom call? It is a complicated problem to discuss here.

yy commented 4 years ago

Yeah maybe tomorrow sometime. But how does the in- and out-vector work here?

jisungyoon commented 4 years ago

> Yeah maybe tomorrow sometime. But how does the in- and out-vector work here?

Actually, I assumed that w_in == w_out, because keeping two separate matrices makes the calculation difficult. If the embeddings are good enough, w_in ≈ w_out. When can you have a meeting on Zoom?
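For reference, a sketch of where this assumption enters, written for the idealized full-softmax form of the skip-gram objective (in practice word2vec is usually trained with negative sampling or hierarchical softmax). The unit-norm condition is an extra assumption added here so that the dot product equals the cosine similarity; it is not stated in the thread.

```latex
\begin{align}
P(j \mid i)
  &= \frac{\exp\!\left(w^{\mathrm{in}}_i \cdot w^{\mathrm{out}}_j\right)}
          {\sum_{k} \exp\!\left(w^{\mathrm{in}}_i \cdot w^{\mathrm{out}}_k\right)} \\
  &\approx \frac{\exp\!\left(\cos(i, j)\right)}
          {\sum_{k} \exp\!\left(\cos(i, k)\right)}
  \qquad \text{if } w^{\mathrm{in}} \approx w^{\mathrm{out}} = w
         \text{ and } \lVert w_i \rVert = 1 \text{ for all } i .
\end{align}
```

Under these assumptions, the denominator is the sum of exponential cosine similarities discussed earlier in this thread.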

jisungyoon commented 4 years ago

let_s_do_math.pdf: please check this PDF, and let's discuss this issue.

murrayds commented 4 years ago

I've read through it but I think I need more time to think about it. Could you upload the LaTeX source code to GitHub? (Make it a subfolder in the paper directory.) I will flesh it out as I work through the math.

jisungyoon commented 4 years ago

> I've read through it but I think I need more time to think about it. Could you upload the LaTeX source code to GitHub? (Make it a subfolder in the paper directory.) I will flesh it out as I work through the math.

Or can we use Overleaf for this LaTeX file?

murrayds commented 4 years ago

Yeah, Overleaf is good for me!

Either share a link or add me as a collaborator with dakota.s.murray@gmail.com

murrayds commented 4 years ago

Since this has now been put into the draft, let's close this issue.