Closed murrayds closed 3 years ago
I think @jisungyoon tried it? Just wanted to know where it ended up.
I believe we have already tested two versions of direct optimization of the gravity model, namely Levy's matrix factorization and word2vec with the negative sampling exponent = 1. Both methods optimize the gravity model with the distance and mass being the dot similarity and stationary distribution, respectively. The difference is that one does so using stochastic gradient descent while the other uses SVD.
@jisungyoon showed that both approaches perform comparably to or worse than word2vec with the original parameter setting. These points are not clearly written in the manuscript, so it would be worthwhile to write a paragraph on this in the math (or discussion?). In addition to this, should we show something besides performance, like the same visualization of the embedding? I feel that repeating the whole analysis is optional.
@skojaku I think what YY means is that, at present, we are learning gravity law relationships using word2vec, which is true because word2vec is a gravity model. Similarly, we argue that Levy's factorization is equivalent to a gravity law, because Levy's factorization is equivalent to word2vec, which in turn is equivalent to a gravity model.
So currently, we have two embeddings:
I think what YY is asking is whether we could get rid of word2vec and Levy's factorization entirely. That is, why should we approximate something that is equivalent to a gravity law, when we can just optimize for the gravity law directly using something like CMDS? I don't know how feasible this is, however.
@yy is this closer to what you were thinking?
Yeah. Essentially like a force-directed layout where the lengths of the springs are determined by the flow according to the gravity law. Does it make sense?
Let me check my understanding of the question.
The gravity model is a quite general model, and we showed the equivalence between word2vec and a specific gravity model with distance being dot similarity and mass being a stationary distribution. The question we want to answer is: if we fit another gravity model that has a different definition of distance and mass, does it explain the trajectory better than the gravity model equivalent to word2vec?
No, it's a simpler question. Assuming a gravity equation, given a measured flux between two places and their masses, we can calculate the expected distance. We assign this distance to every pair of locations, obtaining the distance matrix. Then we can simply try to optimize their embedding (MDS). What would this produce?
> we can calculate the expected distance
If we calculate an expected distance, why would we do the embedding? We'd be using an expected distance to learn an embedding that approximates the original expected distance?
Yeah, that's what I'm trying to get at here. If we think naïvely, why not fix the gravity law first? We can measure the flux and masses, then we have the expected pairwise distance. Why not try to embed everything based on this distance? Why should we use word2vec or any other approach? Can we easily answer this question?
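The "expected pairwise distance" idea can be sketched as follows. This is only an illustration, assuming a gravity law of the form T_ij = C * M_i * M_j * f(d_ij) with either a power-law or an exponential decay f. The function name `expected_distance` and the parameters `alpha`, `beta`, and `C` are hypothetical and do not come from the project code.

```python
import numpy as np

def expected_distance(flux, mass, decay="power", alpha=1.0, beta=1.0, C=1.0):
    """Invert an assumed gravity law T_ij = C * M_i * M_j * f(d_ij) for d_ij.

    decay="power": f(d) = d ** (-alpha)   ->  d = ratio ** (-1 / alpha)
    decay="exp":   f(d) = exp(-d / beta)  ->  d = -beta * log(ratio)
    where ratio = T_ij / (C * M_i * M_j).

    Pairs with zero flux get an infinite distance. With the exponential
    decay, a ratio above 1 yields a negative value, so C may need rescaling.
    """
    flux = np.asarray(flux, dtype=float)
    mass = np.asarray(mass, dtype=float)
    ratio = flux / (C * np.outer(mass, mass))
    with np.errstate(divide="ignore"):
        if decay == "power":
            dist = ratio ** (-1.0 / alpha)
        elif decay == "exp":
            dist = -beta * np.log(ratio)
        else:
            raise ValueError(f"unknown decay: {decay!r}")
    np.fill_diagonal(dist, 0.0)  # a location is at distance 0 from itself
    return dist
```

The resulting matrix is what an MDS-style method would then try to preserve. How zero-flux pairs (infinite distance) should be handled before embedding is left open here.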
First, we tried two matrix-factorization approaches to embedding.
More precisely, neither method is a "word2vec" approach; both are just matrix factorization.
Essentially, we could also factorize the gravity matrix (x_ij = t_ij/(M_i*M_j)); would that then be an embedding that fixes the gravity law first?
Here is the result. First, I constructed the gravity matrix, which is A_ij = T_ij / (N_i * N_j).
With this matrix, I tried two approaches.
Truncated SVD on the matrix with d=300: the best result is 0.11 with cosine similarity.
MDS with d=300: the best result is 0.35 with Euclidean distance.
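For reference, a rough sketch of what these two approaches could look like in NumPy. It is not the code that produced the numbers above. The function names are hypothetical, the sqrt(S) scaling of the SVD embedding is one common choice among several, and classical (eigendecomposition-based) MDS is assumed because the thread does not say which MDS variant was used. Classical MDS expects a distance matrix, so A has to be converted to distances first under some decay function.

```python
import numpy as np

def gravity_matrix(T, N):
    """Gravity matrix A_ij = T_ij / (N_i * N_j) from flux T and masses N."""
    T = np.asarray(T, dtype=float)
    N = np.asarray(N, dtype=float)
    return T / np.outer(N, N)

def truncated_svd_embedding(A, d):
    """Rank-d embedding U_d * sqrt(S_d) from the SVD of A (scaling is an assumption)."""
    U, S, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :d] * np.sqrt(S[:d])

def classical_mds(D, d):
    """Classical MDS: find d-dimensional points whose Euclidean distances approximate D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J            # double-centered Gram matrix
    w, V = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:d]          # indices of the d largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)     # drop negative (non-Euclidean) parts
    return V[:, idx] * np.sqrt(w_top)
```

With a power-law decay of exponent 1, for example, the distance matrix passed to `classical_mds` would be 1 / A off the diagonal and 0 on it. Infinite distances from zero-flux pairs would need to be capped or imputed before the call.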
So, the conclusion is that even if we directly optimize the gravity law, the result is not that good. I think the main issue is the optimization algorithms.
Thoughts? @yy @murrayds @skojaku
btw, 0.35 is the highest performance among the baselines.
Cool!
Yeah, it's surprising that MDS can be that good but still not as good as word2vec. Word2vec is more advanced in terms of the optimization process, and there may be some engineering gap between word2vec and MDS.
Yeah, I finally stabilized the code!
Truncated SVD on the matrix with d=300: the best result is 0.11 with cosine similarity and the exponential decay function.
MDS with d=300: the best result is 0.35 with Euclidean distance and the power-law decay function.
Calling this one closed, thanks!
YY wants to see what the embedding would be like if we learned an embedding that directly optimized the gravity law. That is, something like CMDS, except we try to learn an embedding that preserves gravity-like distances between locations.
Is this something that could be easily implemented and tested?