murrayds / sci-mobility-emb

Embedding of scientific mobility across institutions, cities, regions, and countries

Attempt institutional splitting by discipline #10

Closed: murrayds closed this issue 4 years ago.

murrayds commented 4 years ago

À la persona2vec.

jisungyoon commented 4 years ago

I tried node2vec with institutions split by discipline, but I think the effect of discipline is very weak.
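A minimal sketch of what discipline splitting can look like, assuming each trajectory is a time-ordered list of (institution, discipline) pairs. Each institution is replaced by one token per discipline, so discipline labels play the role that personas play in persona2vec. The helper names, the `|` separator, and the toy IDs (`inst_A`, ...) are hypothetical and not the repository's code; the resulting token lists could then be handed to a word2vec/node2vec-style trainer.

```python
from typing import Iterable, List, Tuple

# Hypothetical token format: "<institution>|<discipline>".
SEP = "|"


def split_token(institution: str, discipline: str) -> str:
    """Map an institution to its discipline-specific ("persona") token."""
    return f"{institution}{SEP}{discipline}"


def split_trajectories(
    trajectories: Iterable[List[Tuple[str, str]]],
) -> List[List[str]]:
    """Turn (institution, discipline) trajectories into lists of persona tokens."""
    return [
        [split_token(inst, field) for inst, field in traj]
        for traj in trajectories
    ]


if __name__ == "__main__":
    # Toy trajectories: the same institution appears under two disciplines.
    trajs = [
        [("inst_A", "Physics"), ("inst_B", "Physics")],
        [("inst_A", "Biology"), ("inst_C", "Biology")],
    ]
    for sentence in split_trajectories(trajs):
        print(sentence)
```

With this scheme, `inst_A|Physics` and `inst_A|Biology` become separate nodes, which is what allows discipline structure to show up (or not) in the embedding.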

jisungyoon commented 4 years ago

[Figure: Colored_by_field] This figure is colored by field.

[Figure: Colored_by_country] This figure is colored by country.

jisungyoon commented 4 years ago

But on a national scale there is some pattern. Here is the case for the USA:

[Figure: usa]

jisungyoon commented 4 years ago

I think there are two possible reasons why the embedding does not cluster by discipline at the international level:

  1. Disciplines are too broad to capture the difference. In this case, we could divide personas (disciplines) based on co-occurrence analysis.

  2. There might be an imbalance in volume between domestic and international trajectories in the mobility data.

murrayds commented 4 years ago

Amazing! Thanks for working this up!

It's probably a mix of both. I can try to get more fine-grained disciplinary classifications, but they become less interpretable when we get more granular.

However, I think that point 2 is the major factor. International mobility is rare compared to the total number of scholars (I will calculate this as a percentage of our data). I think that we can still make use of persona2vec though, possibly by:

  1. Focusing on individual countries, as you did for the USA.
  2. Restricting the data to only cases of international mobility, rather than institutional mobility. After filtering to internationally mobile instances, we can embed using either the institution or the country. This might be cool, but could also be difficult to interpret.
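A rough sketch of option 2, assuming each trajectory is a time-ordered list of (institution, country) pairs. It keeps a whole trajectory if at least one consecutive move crosses a country border, then tokenizes it either by institution or by country. The function names, the record layout, and the choice to filter whole trajectories (rather than individual moves) are assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical record layout: (institution, country), in time order.
Step = Tuple[str, str]


def is_internationally_mobile(traj: List[Step]) -> bool:
    """True if any consecutive pair of affiliations lies in different countries."""
    return any(a[1] != b[1] for a, b in zip(traj, traj[1:]))


def international_only(
    trajs: List[List[Step]], level: str = "institution"
) -> List[List[str]]:
    """Keep internationally mobile trajectories, tokenized at the chosen level.

    level="institution" keeps institution tokens.
    level="country" keeps country tokens and collapses consecutive repeats,
    so purely domestic moves disappear from the sequence.
    """
    if level not in ("institution", "country"):
        raise ValueError("level must be 'institution' or 'country'")
    out = []
    for traj in trajs:
        if not is_internationally_mobile(traj):
            continue  # drop purely domestic (or single-affiliation) trajectories
        if level == "institution":
            out.append([inst for inst, _ in traj])
        else:
            countries = [c for _, c in traj]
            collapsed = [countries[0]] + [
                c for prev, c in zip(countries, countries[1:]) if c != prev
            ]
            out.append(collapsed)
    return out
```

Filtering whole trajectories keeps the domestic steps of internationally mobile researchers as context. Filtering individual moves instead would shrink the data further, which may matter given how rare international mobility is.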

Let's think on it and maybe bring it up during the team meeting?

jisungyoon commented 4 years ago

You mean tomorrow?

murrayds commented 4 years ago

Sure. Maybe we can meet after the SG meeting and you can walk me through what these results mean again? Then we can discuss next steps.

jisungyoon commented 4 years ago

I tried to use city_to_region for the USA data set, but there are NaN values in it. Do you want to fill these in manually? @murrayds

murrayds commented 4 years ago

I will fill in some of the missing city names and update you when I am done. I am also now working on getting regional-scale data for every organization, including those outside the United States.

jisungyoon commented 4 years ago

I tried to apply the gravity law to the discipline-split embedding. In 2008-2019_nonmobile, there is no information about the researchers' fields. Could you give me the fields of the non-mobile researchers? @murrayds

jisungyoon commented 4 years ago

[Figure: practice (1)]

Here are the results from the split embedding.

murrayds commented 4 years ago

This data can be found in the file SME-dropbox/Data/Raw/nonmobile_researcher_trajectories.txt

Meanwhile, the trajectories for mobile researchers are now stored in SME-dropbox/Data/Raw/mobile_researcher_trajectories.txt

Nice! It's good to know that the performance persists in the discipline-split data. I'm making a presentation for Wednesday and I'll include this.

jisungyoon commented 4 years ago

Thanks a lot. I was using the M_i terms as a fraction of the original institutions' sizes; I will change them to this value. Thank you!
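For context on the M_i terms: a minimal sketch of a gravity-law check, assuming the power-law form T_ij = C * M_i * M_j * d_ij^(-gamma), where T_ij is the observed flow between nodes i and j, d_ij is a distance between their embedding vectors, and M_i is a node's size. In the discipline-split setting, M_i would presumably be the number of researchers (mobile and non-mobile) at an institution-discipline node, which would explain why the fields of non-mobile researchers are needed. The power-law form, the function name `gravity_fit`, and the log-space least-squares fit are assumptions, not the project's actual analysis.

```python
import numpy as np


def gravity_fit(flows, sizes, dist):
    """Fit T_ij = C * M_i * M_j * d_ij**(-gamma) by least squares in log space.

    flows: (n, n) observed flow counts T_ij
    sizes: (n,) masses M_i (assumed: institution or institution-discipline sizes)
    dist:  (n, n) distances d_ij (e.g. an embedding distance)
    Returns (C, gamma).
    """
    flows = np.asarray(flows, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    dist = np.asarray(dist, dtype=float)
    mass = np.outer(sizes, sizes)  # M_i * M_j for every pair

    # Use only off-diagonal pairs where every logarithm below is defined.
    n = flows.shape[0]
    mask = ~np.eye(n, dtype=bool) & (flows > 0) & (dist > 0) & (mass > 0)

    # log(T_ij / (M_i M_j)) = log C - gamma * log d_ij is linear in log d_ij.
    y = np.log(flows[mask] / mass[mask])
    x = np.log(dist[mask])
    slope, intercept = np.polyfit(x, y, 1)  # returns [slope, intercept]
    return float(np.exp(intercept)), float(-slope)


if __name__ == "__main__":
    # Synthetic check: flows generated exactly from the model with C=0.5, gamma=1.5.
    rng = np.random.default_rng(0)
    n = 6
    sizes = rng.uniform(10, 100, n)
    pts = rng.uniform(0, 1, (n, 2))
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    off = ~np.eye(n, dtype=bool)
    flows = np.zeros((n, n))
    flows[off] = 0.5 * np.outer(sizes, sizes)[off] * dist[off] ** -1.5
    print(gravity_fit(flows, sizes, dist))
```

On the synthetic data the fit should recover C and gamma up to floating-point error. With real flows, whether M_i is a raw count or a fraction of the original institution's size only rescales C when every node is rescaled by the same factor; node-specific fractions would change the fit.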