Closed jisungyoon closed 4 years ago
I love it! These groups make sense to me, at first glance, though I'd like to see it with more countries. Maybe it makes sense to lower the threshold to ~50 institutions?
OK, I will expand the number of countries using the threshold you suggest. Also, the figure above was clustered with the single linkage method; I will try another linkage method.
Threshold with 50
Korea-India-Egypt? a bit weird?
I agree. This is the result with the Ward linkage method.
I will look more deeply next week.
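To make the linkage comparison concrete, here is a minimal pure-Python sketch of agglomerative clustering with a pluggable linkage rule. It is not the code used for the figures: the labels and distances are hypothetical toy values, and complete linkage stands in for Ward, because Ward is defined in terms of Euclidean variance and is not strictly meaningful for cosine distances.

```python
from itertools import combinations

def cluster_distance(c1, c2, dist, linkage):
    """Distance between two clusters under the chosen linkage rule."""
    pair_dists = [dist[frozenset((a, b))] for a in c1 for b in c2]
    if linkage == "single":
        return min(pair_dists)
    if linkage == "complete":
        return max(pair_dists)
    if linkage == "average":
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(labels, dist, n_clusters, linkage="single"):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [frozenset([label]) for label in labels]
    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], dist, linkage),
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)

# Hypothetical toy distances: "hub" is close to everyone,
# while a1/a2 and b1/b2 form two genuine groups.
toy = {
    ("hub", "a1"): 0.20, ("hub", "a2"): 0.22,
    ("hub", "b1"): 0.24, ("hub", "b2"): 0.26,
    ("a1", "a2"): 0.30, ("b1", "b2"): 0.28,
    ("a1", "b1"): 0.60, ("a1", "b2"): 0.60,
    ("a2", "b1"): 0.60, ("a2", "b2"): 0.60,
}
dist = {frozenset(pair): d for pair, d in toy.items()}
labels = ["hub", "a1", "a2", "b1", "b2"]

single = agglomerate(labels, dist, 2, "single")
complete = agglomerate(labels, dist, 2, "complete")
no_hub = agglomerate(["a1", "a2", "b1", "b2"], dist, 2, "single")
```

In this toy case single linkage chains almost everything onto the hub and leaves one point alone, while complete linkage, or single linkage without the hub, recovers the two groups.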
I think this is a problem with the hierarchical clustering algorithm. This is the list of cosine distances from Korea:
[('Korea, Republic of', 0.0), ('United States', 0.32237375), ('India', 0.34673756), ('Japan', 0.39938033), ('China', 0.4347207), ('Thailand', 0.4453507), ('Turkey', 0.44664097), ('Egypt', 0.4478628), ('Taiwan, Province of China', 0.45242792), ('Iran, Islamic Republic of', 0.45678532), ('Germany', 0.46245217), ('Czech Republic', 0.47405958), ('Finland', 0.47886485), ('Poland', 0.47896522), ('Hungary', 0.47987974), ('Canada', 0.49002063), ('United Kingdom', 0.49369115), ('Australia', 0.50154287), ('Greece', 0.5104309), ('France', 0.5228225), ('Spain', 0.5243138), ('Norway', 0.5276358), ('Austria', 0.5495703), ('Denmark', 0.5518034), ('Russian Federation', 0.55325913), ('Ireland', 0.5662434), ('Israel', 0.5666102), ('Portugal', 0.5671894), ('Italy', 0.57064867), ('Mexico', 0.5711175), ('Brazil', 0.5753279), ('Netherlands', 0.5785767), ('Romania', 0.57941747), ('South Africa', 0.5849732), ('Sweden', 0.59312737), ('Belgium', 0.59552336), ('Switzerland', 0.5967138)]
This is the same list for China:
[('China', 0.0), ('United States', 0.3076265), ('Canada', 0.3742674), ('Australia', 0.39273417), ('Japan', 0.4018225), ('France', 0.4061414), ('Taiwan, Province of China', 0.40767324), ('United Kingdom', 0.41371167), ('Germany', 0.42083406), ('Korea, Republic of', 0.4347207), ('Thailand', 0.45633352), ('Belgium', 0.4671927), ('Norway', 0.47100806), ('Netherlands', 0.47470915), ('Italy', 0.47490293), ('Egypt', 0.47492677), ('Sweden', 0.47747213), ('Spain', 0.47868967), ('Austria', 0.48570627), ('Russian Federation', 0.48862463), ('Czech Republic', 0.500834), ('India', 0.5012982), ('Israel', 0.5022876), ('Brazil', 0.50393915), ('Denmark', 0.50441635), ('Finland', 0.50582707), ('Poland', 0.50899315), ('Iran, Islamic Republic of', 0.509736), ('Romania', 0.5151825), ('Hungary', 0.51786983), ('Switzerland', 0.5245992), ('Ireland', 0.5246782), ('Turkey', 0.52916855), ('South Africa', 0.5489218), ('Greece', 0.5578825), ('Portugal', 0.5614338), ('Mexico', 0.5638409)]
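For reference, ranked lists like the two above can be produced from embedding vectors roughly as follows. This is a sketch under assumptions: the function names are made up for illustration, and the two-dimensional vectors are hypothetical stand-ins for the real country embeddings.

```python
import math

def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def ranked_neighbours(source, vectors):
    """All entries (including the source itself), sorted by distance to source."""
    return sorted(
        ((name, cosine_distance(vectors[source], vec)) for name, vec in vectors.items()),
        key=lambda item: item[1],
    )

# Hypothetical toy vectors, not real country embeddings.
vectors = {"X": [1.0, 0.0], "Y": [1.0, 1.0], "Z": [0.0, 1.0]}
ranking = ranked_neighbours("X", vectors)
names = [name for name, _ in ranking]
```

The source always comes first with distance 0, as in the lists above.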
I think this problem arises because so many countries are so close to the US. Is there a good method for getting robust clusters from a similarity or distance matrix? One feasible approach I have in mind is to construct a network by thresholding the distances and then run community detection. Or to look for a core-periphery structure? @yy
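A rough sketch of the thresholded-network idea, using hypothetical toy distances and connected components as the simplest possible stand-in for community detection. One caveat: connected components of a thresholded graph are the same as cutting a single-linkage dendrogram at that threshold, so a hub node glues everything together here too. A modularity-based method would be needed to get something genuinely different.

```python
def threshold_graph(nodes, dist, threshold):
    """Undirected graph with an edge wherever distance <= threshold."""
    graph = {node: set() for node in nodes}
    for (a, b), d in dist.items():
        if a in graph and b in graph and d <= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connected_components(graph):
    """Connected components found by depth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return sorted(components)

# Hypothetical toy distances: "hub" is close to every other node.
dist = {
    ("hub", "a1"): 0.20, ("hub", "a2"): 0.22,
    ("hub", "b1"): 0.24, ("hub", "b2"): 0.26,
    ("a1", "a2"): 0.30, ("b1", "b2"): 0.28,
    ("a1", "b1"): 0.60, ("a1", "b2"): 0.60,
    ("a2", "b1"): 0.60, ("a2", "b2"): 0.60,
}

with_hub = connected_components(
    threshold_graph(["hub", "a1", "a2", "b1", "b2"], dist, 0.35)
)
without_hub = connected_components(
    threshold_graph(["a1", "a2", "b1", "b2"], dist, 0.35)
)
```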
maybe just cluster without US?
Also, an interesting fact: there are 1,180 international trajectories for India, and 686 trajectories contain Korea. So India actually is close to Korea, I think?
This is the raw data showing the most similar country for each country, in the format: source_country (target_country, cosine_distance)
Egypt ('Canada', 0.36116463)
Mexico ('Spain', 0.2994622)
Ireland ('United Kingdom', 0.26680315)
Thailand ('Japan', 0.34891373)
South Africa ('Norway', 0.37570113)
Denmark ('Norway', 0.279351)
Hungary ('Romania', 0.35187584)
Romania ('Hungary', 0.35187584)
Israel ('United States', 0.33373773)
Austria ('Germany', 0.20575869)
Finland ('Sweden', 0.29471815)
Greece ('United Kingdom', 0.30813986)
Belgium ('Netherlands', 0.2875682)
Portugal ('Spain', 0.32566804)
Switzerland ('Germany', 0.26431)
Czech Republic ('Poland', 0.37411135)
Sweden ('Norway', 0.28661156)
Taiwan, Province of China ('United States', 0.37452072)
Iran, Islamic Republic of ('Canada', 0.3869642)
Norway ('Denmark', 0.279351)
Poland ('Germany', 0.36685592)
Russian Federation ('Germany', 0.36757004)
Australia ('United Kingdom', 0.29364806)
Netherlands ('Belgium', 0.2875682)
India ('Korea, Republic of', 0.34673756)
Turkey ('Greece', 0.36510265)
Korea, Republic of ('United States', 0.32237375)
Canada ('United States', 0.30755192)
Japan ('Thailand', 0.34891373)
Brazil ('Portugal', 0.3536932)
Italy ('United Kingdom', 0.3333258)
Spain ('Mexico', 0.2994622)
Germany ('Austria', 0.20575869)
United Kingdom ('Ireland', 0.26680315)
China ('United States', 0.3076265)
France ('Belgium', 0.32789463)
United States ('Canada', 0.30755192)
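A nearest-neighbour lookup like this, with the option of leaving a country out, could be sketched as below. The helper name `nearest` is made up for illustration. The distances are the first few entries of the Korea and China lists posted earlier in this thread, so the rows are deliberately incomplete.

```python
def nearest(row, exclude=()):
    """Closest country in a {country: cosine_distance} row, skipping excluded names."""
    candidates = {c: d for c, d in row.items() if c not in exclude}
    return min(candidates.items(), key=lambda item: item[1])

# First few entries of the lists posted in this thread (self-distance omitted).
korea = {
    "United States": 0.32237375,
    "India": 0.34673756,
    "Japan": 0.39938033,
    "China": 0.4347207,
}
china = {
    "United States": 0.3076265,
    "Canada": 0.3742674,
    "Australia": 0.39273417,
    "Japan": 0.4018225,
}

korea_nearest = nearest(korea)
korea_nearest_no_us = nearest(korea, exclude={"United States"})
china_nearest_no_us = nearest(china, exclude={"United States"})
```

With the US included, both Korea and China point at the US. Excluding it, Korea's nearest neighbour becomes India and China's becomes Canada.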
I think the clusters become clearer without the USA.
Fig update
Without USA
The clusters become clearer without the USA, I think.
Could you share how this was calculated? I'm getting some different numbers.
I am finding that there are 14,827 Indian researchers that have an affiliation in at least one other country.
Of these, 1,383, or about 9%, have a Korean affiliation.
In contrast, about 35% of Indian international researchers have a US affiliation.
Looking into the data, a large proportion of the India <-> Korea flow seems to be from the major IITs and CSIR in India to major Korean universities, about what one would expect.
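To make the counting rule explicit, here is a minimal sketch of how such shares could be computed. It assumes each researcher is reduced to the set of countries they have been affiliated with. The data layout, the function name, and the toy records are all hypothetical. Counting trajectories rather than researchers, or defining "Indian researcher" differently, would give different numbers.

```python
def share_with_affiliation(researchers, home, other):
    """Among researchers affiliated with `home` and at least one other country,
    count those who also have an affiliation in `other`."""
    international = [r for r in researchers if home in r and len(r) > 1]
    matching = sum(1 for r in international if other in r)
    share = matching / len(international) if international else 0.0
    return matching, len(international), share

# Hypothetical toy records: one set of affiliation countries per researcher.
researchers = [
    {"India", "Korea"},
    {"India", "United States"},
    {"India"},                              # never left India: not international
    {"India", "United States", "Germany"},
    {"Japan", "Korea"},                     # no Indian affiliation: ignored
    {"India", "Germany"},
]

korea_stats = share_with_affiliation(researchers, "India", "Korea")
us_stats = share_with_affiliation(researchers, "India", "United States")

# Arithmetic check on the counts quoted in this thread.
quoted_share_pct = round(100 * 1383 / 14827, 1)
```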
Dakota and I talked about the data issue yesterday and found that there are some errors in my dataset.
Here are new results:)
with USA
without USA
The results have changed, but I think they still make sense.
cool. I think it makes more sense now. it's super interesting to see how Israel is grouped with/without US.
Amazing—with the new data these better fit my priors. I also love the "commonwealth" group of the UK + S. Africa + AUS + Ireland
Do you mean learning the embedding without the trajectories that include the USA?
Another idea I have in mind is to show the temporal movement of clusters by drawing an alluvial diagram: https://en.wikipedia.org/wiki/Alluvial_diagram
No, I was not suggesting anything.
I'd suggest thinking about what's the scope of the paper: i.e. what is the one-sentence summary of the paper? Are we talking about multiple papers or one? What's the message of the figure and how does it serve the message of the paper?
Yeah, I think we need to focus on establishing mobility itself (for the paper?). I will check the mobility rule on the discipline-split embedding, or test the radiation model on the data.
Let's keep these results in mind for papers and presentations. But is the issue ready to close? @jisungyoon
I think it is ready to close.
I tried hierarchical clustering of countries. Note that this result comes from the discipline-split embedding.
The procedure is summarized as follows.
I think it makes sense. Japan is the most isolated country, and Korea, China, and the US are in the same cluster. What do you think about this result? @yy @murrayds