murrayds closed this issue 4 years ago.
Thinking about this some more and having some doubts. Imagine the following trajectory:
t1(A) -> t2(B) -> t3(A) -> t4(B)
There is no way of knowing whether the author moved from A to B at t2 or at t4, or whether they simply maintained both affiliations throughout but didn't publish much. Would we include two instances of traj(A, B) and one of traj(B, A) when counting flows?
Another one:
t1(A, B, C, D) -> t2(A)
Here, would we include trajectories traj(B, A), traj(C, A), and traj(D, A), even though A was already present in t1?
Another edge case:
t1(A, B) -> t2(A) -> t3(B)...
Under a directional model, would we include traj(A, B) even though they appeared in the same (first) time period?
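To make the ambiguity concrete, here is a minimal sketch of a naive directional count, assuming each time period is represented as a set of affiliations and every affiliation in one period is paired with every affiliation in the next. The function name `naive_directional_flows` is hypothetical, and self-transitions are deliberately kept so the edge cases stay visible.

```python
from collections import Counter
from itertools import product


def naive_directional_flows(trajectory):
    """Count traj(src, dst) for every affiliation pair in consecutive periods.

    `trajectory` is a list of sets, one set of affiliations per time period
    (hypothetical representation). Self-transitions (src == dst) are kept.
    """
    flows = Counter()
    for prev, curr in zip(trajectory, trajectory[1:]):
        # Pair every affiliation in the earlier period with every one in the later period.
        for src, dst in product(sorted(prev), sorted(curr)):
            flows[(src, dst)] += 1
    return flows


# The three edge cases from the discussion above.
case1 = [{"A"}, {"B"}, {"A"}, {"B"}]
case2 = [{"A", "B", "C", "D"}, {"A"}]
case3 = [{"A", "B"}, {"A"}, {"B"}]

for case in (case1, case2, case3):
    print(naive_directional_flows(case))
```

Under these assumptions, the first trajectory yields traj(A, B) twice and traj(B, A) once; the second yields traj(B, A), traj(C, A), and traj(D, A) plus a self-transition traj(A, A); and the third yields traj(A, B) from t2 to t3 even though A and B already co-occurred in t1. Each of those outcomes is one of the choices questioned above.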
I am uncomfortable with the number of choices we might have to make in order to impose directionality on the data. Maybe the simple co-occurrence model is sufficient? Or perhaps we could modify the model to make more sense given the nature of the data? Or do you all have ideas for adding directionality in a way that is theoretically sound?
@yy @jisungyoon, I'm interested in your thoughts on this.
I have also been thinking about this issue. It is a very tricky problem in terms of
Also, I asked a friend about the gravity model. In general, people use directional flows, which makes it difficult to apply the original gravity model directly to our problem. After playing around with the data, I also realized that the approach suggested before (fractional counts) is not a good method, because word2vec does not work like that.
I think co-occurrence is good enough, but we need to specify how we define a flow between two institutes in the paper.
Is it crucial to think about directionality? Given that we currently don't have a good idea/understanding of the directional embedding, I don't think it's worthwhile (atm) to dig into this. The simplest approach is just to set the time window in terms of # of papers and then use all pairs of affiliations to run the skip-gram model, I think?
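A rough sketch of the "all pairs within a paper-count window" idea, assuming a researcher's papers are in chronological order, the window splits them into non-overlapping blocks of `window` papers, and every ordered pair of distinct affiliations in a block becomes a (target, context) training pair. `skipgram_pairs` and its `window` parameter are hypothetical names, not an existing API; whether the blocks should overlap is left open.

```python
from itertools import permutations


def skipgram_pairs(papers, window):
    """Emit (target, context) pairs for all affiliation pairs within a window.

    papers: chronologically ordered list of affiliation lists, one per paper
    window: number of consecutive papers grouped into one block (hypothetical)
    """
    pairs = []
    for start in range(0, len(papers), window):
        # Distinct affiliations appearing anywhere in this block of papers.
        block = sorted({aff for paper in papers[start:start + window] for aff in paper})
        # Every ordered pair of distinct affiliations becomes a training pair.
        pairs.extend(permutations(block, 2))
    return pairs


print(skipgram_pairs([["A"], ["A", "B"], ["C"]], window=2))
```

Emitting both (A, B) and (B, A) keeps the pairs symmetric, which matches the co-occurrence view rather than a directional one.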
Do you mean setting the window size to the maximum length of the sentences?
The reason I tried to define a direction is that another mobility model, the radiation model, has asymmetric flows. Sorry, let's focus on the gravity model for now.
(Replying to: "Do you mean setting the window size to the maximum length of the sentences?")
Nope.
But even if you create directional trajectories, it's totally unclear how you get to the asymmetric distance?
In the radiation model, we don't need an asymmetric distance. Asymmetric flows come from the topology of the neighborhood.
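A small sketch of why the radiation model can produce asymmetric flows from a symmetric distance, using the standard radiation-model formula T_ij = T_i * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij)), where s_ij is the population within distance r_ij of the source i, excluding i and j. The locations, populations, and the choice T_i = population are made-up assumptions for illustration; `radiation_flow` is a hypothetical helper, and using a strict `<` at the radius boundary is also an assumption.

```python
import math


def radiation_flow(i, j, pos, pop, outflow):
    """Expected flow T_ij from i to j under the radiation model.

    pos: dict name -> (x, y) coordinates
    pop: dict name -> population
    outflow: dict name -> total outgoing trips T_i (assumed given)
    """
    r_ij = math.dist(pos[i], pos[j])  # symmetric: r_ij == r_ji
    # s_ij: population strictly within r_ij of the *source* i, excluding i and j.
    s_ij = sum(pop[k] for k in pos
               if k not in (i, j) and math.dist(pos[i], pos[k]) < r_ij)
    m, n = pop[i], pop[j]
    return outflow[i] * m * n / ((m + s_ij) * (m + n + s_ij))


# Hypothetical toy layout on a line: B sits just beyond C, far from A.
pos = {"A": (0.0, 0.0), "C": (2.0, 0.0), "B": (3.0, 0.0)}
pop = {"A": 100, "C": 50, "B": 10}
outflow = dict(pop)  # assumption: T_i equals the population of i

t_ac = radiation_flow("A", "C", pos, pop, outflow)
t_ca = radiation_flow("C", "A", pos, pop, outflow)
print(t_ac, t_ca)
```

Here d(A, C) = d(C, A) = 2, but B lies within that radius of C and not of A, so s_CA = 10 while s_AC = 0, and the two flows differ. The asymmetry comes from who sits in each neighborhood, not from the distance itself.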
I think in this case we are talking only about calculating flows between organizations, i.e., F_ij / (P_i * P_j). Calculating embedding distances with symmetric trajectories will remain the same. However, we were concerned because the traditional gravity and radiation models assume a directional flow (i.e., a person moves to Boston from Bloomington).
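A minimal sketch of that normalization, assuming F_ij is an undirected co-occurrence flow and P_i is some measure of organization size (for example, number of researchers; which size measure to use is not settled in this thread). The data and the helper name `normalized_flows` are hypothetical.

```python
def normalized_flows(flows, sizes):
    """Return F_ij / (P_i * P_j) for each undirected organization pair.

    flows: dict frozenset({i, j}) -> co-occurrence flow F_ij
    sizes: dict org -> size P_i (assumed measure, e.g. number of researchers)
    """
    out = {}
    for pair, f in flows.items():
        i, j = sorted(pair)  # canonical (i, j) key for an undirected pair
        out[(i, j)] = f / (sizes[i] * sizes[j])
    return out


# Hypothetical toy numbers.
flows = {frozenset({"A", "B"}): 12, frozenset({"A", "C"}): 3}
sizes = {"A": 40, "B": 10, "C": 5}
print(normalized_flows(flows, sizes))
```

Using `frozenset` keys makes (A, B) and (B, A) the same entry, which is one way to keep the flows symmetric as discussed.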
I think focusing on co-occurrence is fine for the gravity model. While the data doesn't fit the standard format for the model, it still tells us something about co-affiliations within a time period.
I am not sure how this would impact the radiation model as I haven't spent as much time looking into it, so Jisun is right that we should focus on the gravity model for the time being.
This direction is no longer being pursued, so the issue will be closed.
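For each pair of consecutive mobility events, define all combinations of affiliations from time period one (t1) to time period two (t2). In a full counting scheme, every combination is weighted equally (weight 1). In a fractional scheme, each combination is weighted by its share of the total number of combinations, i.e., 1 / (|t1| * |t2|), where |t| is the number of affiliations in period t.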
For example:
t1(A) -> t2(B)
t1(A) -> t2(A, B, C)
t1(A, B, C) -> t2(D)
t1(A, B) -> t2(B, C, D)
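A minimal sketch of the full and fractional counting schemes applied to these examples. Assumptions not settled in this thread: pairs are kept as ordered (origin, destination) tuples, the fractional weight is 1 divided by all combinations including self-pairs, and self-pairs such as (A, A) or (B, B) are then dropped. `count_flows` is a hypothetical name.

```python
from collections import defaultdict
from itertools import product


def count_flows(t1, t2, fractional=False):
    """Weight every (origin, destination) combination between two periods.

    Full counting: each combination gets weight 1.
    Fractional counting: each gets 1 / (|t1| * |t2|), i.e. its share of all
    combinations. Self-pairs are dropped after weighting (an assumption).
    """
    combos = list(product(sorted(t1), sorted(t2)))
    weight = 1 / len(combos) if fractional else 1.0
    flows = defaultdict(float)
    for src, dst in combos:
        if src != dst:  # drop self-pairs such as (A, A)
            flows[(src, dst)] += weight
    return dict(flows)


examples = [
    ({"A"}, {"B"}),
    ({"A"}, {"A", "B", "C"}),
    ({"A", "B", "C"}, {"D"}),
    ({"A", "B"}, {"B", "C", "D"}),
]
for t1, t2 in examples:
    print(count_flows(t1, t2), count_flows(t1, t2, fractional=True))
```

For t1(A) -> t2(A, B, C) this gives (A, B) and (A, C) with weight 1 each under full counting and 1/3 each under fractional counting; for t1(A, B) -> t2(B, C, D), the (B, B) combination is dropped and the remaining five pairs get 1/6 each under fractional counting.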
These will be used to calculate organization flows for the gravity model, which will be compared against distances in the embedding space.
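One possible way to do that comparison, sketched under assumptions: compute cosine distances between organization embeddings and take a Spearman rank correlation with the normalized flows F_ij / (P_i * P_j). The vectors and flow values below are made up, and the choice of cosine distance and Spearman correlation is an illustration, not a decision from this thread. The rank correlation is implemented by hand to keep the sketch dependency-free.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1 - dot / norm


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            r[k] = avg
        pos = end + 1
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


emb = {"A": (1.0, 0.0), "B": (0.9, 0.1), "C": (0.0, 1.0)}  # hypothetical vectors
norm_flow = {("A", "B"): 0.03, ("A", "C"): 0.015, ("B", "C"): 0.001}  # hypothetical
pairs = sorted(norm_flow)
rho = spearman([norm_flow[p] for p in pairs],
               [cosine_distance(emb[i], emb[j]) for i, j in pairs])
print(rho)
```

With these toy numbers rho comes out at about -0.5; under the gravity-model intuition one would expect a negative correlation, i.e., more flow between organizations that are closer in the embedding space.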