murrayds / sci-mobility-emb

Embedding of scientific mobility across institutions, cities, regions, and countries

Visualization embeddings #34

Closed: jisungyoon closed this issue 4 years ago

jisungyoon commented 4 years ago

This issue is related to Fig. 3, the visualization of the embedding (maybe). I tried a figure colored by continent, but I think it does not give enough information.

[Figure: embedding colored by continent]

What are your thoughts on this figure? @yy @murrayds

yy commented 4 years ago

I’d annotate country names. Continent names can go as a legend with the color key. Can the map be rotated in any way to match the usual map we see?

jisungyoon commented 4 years ago

> I’d annotate country names. Continent names can go as a legend with the color key.

Okay, that's a good idea.

> Can the map be rotated in any way to match the usual map we see?

I'm not sure I follow. Do you mean making it look like a real geographical map? By rotating the axes?

murrayds commented 4 years ago

It looks so much better!

Perhaps we can use a more "striking" color for South America and Africa so that they are easier to locate? Similarly, since the U.S. is mostly contained within a single cluster, we can give it a more boring color (maybe the grey that is currently being used for South America?)

> I'm not sure I follow. Do you mean making it look like a real geographical map? By rotating the axes?

That's my understanding. There seems to be a somewhat natural "Europe" - "Asia" axis, so maybe orient the graph such that the Asia clusters are on one side, and the Europe clusters on the other?

jisungyoon commented 4 years ago

[Figure: embedding with rotated axes]

Like this? I rotated the axes.

murrayds commented 4 years ago

> Like this? I rotated the axes.

This seems better to me! And I like the colors more too; they make it easier to see the Spain/Portugal - South America connection.

yy commented 4 years ago

I think if you mirror-flip with respect to the x-axis, you get an arrangement similar to the actual globe (viewed from the North Pole).

[Figure]

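
The rotate-then-flip reorientation discussed above can be sketched as a plain 2D transform. This is a minimal sketch, assuming the projected embedding is a list of (x, y) tuples and that you pick two reference points yourself (here hypothetical "Europe" and "Asia" centroids); none of these helper names come from the repository. It reads "mirror-flip with respect to the x-axis" as (x, y) -> (x, -y); if the other flip is intended, negate x instead.

```python
import math


def angle_to_horizontal(a, b):
    """Angle in degrees that rotates the vector a -> b onto the +x direction."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return -math.degrees(math.atan2(dy, dx))


def rotate(points, angle_deg):
    """Rotate (x, y) points counter-clockwise by angle_deg around the origin."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def mirror_flip_x_axis(points):
    """Reflect (x, y) points across the x-axis: (x, y) -> (x, -y)."""
    return [(x, -y) for x, y in points]


# Hypothetical reference points: after rotation "Europe" is left of "Asia".
europe, asia = (0.0, 0.0), (0.0, 1.0)
points = [europe, asia, (1.0, 1.0)]
oriented = mirror_flip_x_axis(rotate(points, angle_to_horizontal(europe, asia)))
```

Rotation and reflection preserve all pairwise distances, so neither changes which institutions sit near each other; they only change how the picture reads.
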
jisungyoon commented 4 years ago

OK, awesome! I will try.

jisungyoon commented 4 years ago

[Figure: mirror-flipped embedding]

jisungyoon commented 4 years ago

I did a mirror flip, and the result is the figure above. I'm now adding annotations to countries that have more than 100 institutes. How many countries should I annotate?
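
Choosing which countries to annotate by institution count is a simple frequency filter. A minimal sketch, assuming one country name per institution in the embedding; the function name and the `min_institutes` parameter are hypothetical, with 100 taken from the comment above, and the toy counts are made up.

```python
from collections import Counter


def countries_to_annotate(institution_countries, min_institutes=100):
    """Countries with more than `min_institutes` institutions, most institutions first."""
    counts = Counter(institution_countries)
    return [country for country, n in counts.most_common() if n > min_institutes]


# Toy input: one entry per institution (made-up counts, not real data).
sample = ["United States"] * 150 + ["Portugal"] * 40 + ["Korea, Republic of"] * 101
print(countries_to_annotate(sample))  # ['United States', 'Korea, Republic of']
```

Lowering `min_institutes` adds labels; the right value depends on how crowded the figure gets.
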

yy commented 4 years ago

As many as possible, as long as it's not too messy.

jisungyoon commented 4 years ago

[Figure: embedding with country annotations]

Any comment?

yy commented 4 years ago

Portugal and the South American countries? Quebec? I think we can still add way more: whatever we can tell stories about.

yy commented 4 years ago

Also, again, set the dimensions of the figure first. That gives you the minimum font size that you can use. We probably want to use that minimum size. For instance, if we want to have at least one full-width figure (in supp or something), then we can go with a lot of detail. Feel free to use abbreviated names or even flags.
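
The reason to fix the figure size first is that fonts shrink with the figure: text drawn at one width and printed at a smaller width is scaled by the ratio of the two widths. A small sketch of that arithmetic; the function names are hypothetical and the 180 mm / 90 mm widths are illustrative assumptions, not any journal's specification.

```python
def printed_font_pt(drawn_pt, drawn_width_mm, printed_width_mm):
    """Font size (pt) after scaling a figure from its drawn width to its printed width."""
    return drawn_pt * printed_width_mm / drawn_width_mm


def min_drawn_font_pt(min_printed_pt, drawn_width_mm, printed_width_mm):
    """Smallest font to draw with so the printed text is at least `min_printed_pt`."""
    return min_printed_pt * drawn_width_mm / printed_width_mm


# Drawing at 180 mm wide but printing at 90 mm halves every font size.
print(printed_font_pt(10, 180, 90))   # 5.0
print(min_drawn_font_pt(7, 180, 90))  # 14.0
```

Drawing the figure at its final printed size makes the ratio 1, which is why fixing the dimensions first tells you directly what the smallest usable font is.
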

jisungyoon commented 4 years ago

> Portugal and the South American countries? Quebec? I think we can still add way more: whatever we can tell stories about.

In this embedding, Quebec is not close to France. The lower chunk of Canada is Quebec.

yy commented 4 years ago

What's that European country right below Canada?

jisungyoon commented 4 years ago

> Also, again, set the dimensions of the figure first. That gives you the minimum font size that you can use. We probably want to use that minimum size. For instance, if we want to have at least one full-width figure (in supp or something), then we can go with a lot of detail. Feel free to use abbreviated names or even flags.

We think of this figure as being similar in size to the one in the labor flow network paper.

[Screenshot, 2019-12-06]

jisungyoon commented 4 years ago

> What's that European country right below Canada?

Iran and Turkey.

murrayds commented 4 years ago

We can also relabel some of the countries to save space. For example, "United States" -> "U.S.A.", "United Kingdom" -> "U.K.", "Russian Federation" -> "Russia", and "Korea, Republic of" -> "S. Korea"
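
The relabeling can be kept as a lookup table applied only when drawing labels, so the underlying data keeps its original country names. A minimal sketch using the four examples above; the table and function names are hypothetical.

```python
# Display-only short names for crowded labels; the data keeps the original names.
SHORT_NAMES = {
    "United States": "U.S.A.",
    "United Kingdom": "U.K.",
    "Russian Federation": "Russia",
    "Korea, Republic of": "S. Korea",
}


def display_name(country):
    """Short label if one is defined, otherwise the original name."""
    return SHORT_NAMES.get(country, country)


print([display_name(c) for c in ["Russian Federation", "Portugal"]])  # ['Russia', 'Portugal']
```
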

jisungyoon commented 4 years ago

> We can also relabel some of the countries to save space. For example, "United States" -> "U.S.A.", "United Kingdom" -> "U.K.", "Russian Federation" -> "Russia", and "Korea, Republic of" -> "S. Korea"

Ok, I will try that.

jisungyoon commented 4 years ago

I am also planning to draw enlarged versions of the figure above (like the labor flow figure). Here are the candidates:

  1. Enlarge the USA only, colored by region, and also enlarge Massachusetts only, colored by organization type
  2. Enlarge the language-related part (Brazil, Spain, Chile, Colombia, Portugal)
  3. Enlarge the area around Switzerland (France, Germany, Italy, ...)

I think we can pick 2 of them.

yy commented 4 years ago

1 & 2

jisungyoon commented 4 years ago

[Figure]

Updated the figure.

jisungyoon commented 4 years ago

I tried two regional schemes for the U.S.A. figure. Which one is better? @yy @murrayds

[Figure: based on Census regions]

[Figure: based on economic regions]

jisungyoon commented 4 years ago

Please ignore the colors and the position of the legend; I will fix them after we settle on a coloring standard. For reference, this is a figure colored by state:

[Figure: colored by state]

murrayds commented 4 years ago

Looking at them now, I think I like the Census colors (#1) better, assuming that there will be state- or city-level labels for some key clusters. I like the more granular economic clusters (#2), but they might not be as meaningful to non-US readers.
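
Coloring by Census region needs a state-to-region lookup. The sketch below encodes the four U.S. Census Bureau regions with USPS state abbreviations; placing D.C. in the South follows the Census definition. The names `CENSUS_REGIONS` and `region_of` are hypothetical, and anything outside the table (e.g., territories) falls back to a placeholder label.

```python
# The four U.S. Census Bureau regions, keyed to USPS abbreviations.
CENSUS_REGIONS = {
    "Northeast": ["CT", "ME", "MA", "NH", "RI", "VT", "NJ", "NY", "PA"],
    "Midwest": ["IL", "IN", "MI", "OH", "WI", "IA", "KS", "MN", "MO", "NE", "ND", "SD"],
    "South": ["DE", "DC", "FL", "GA", "MD", "NC", "SC", "VA", "WV",
              "AL", "KY", "MS", "TN", "AR", "LA", "OK", "TX"],
    "West": ["AZ", "CO", "ID", "MT", "NV", "NM", "UT", "WY",
             "AK", "CA", "HI", "OR", "WA"],
}

# Invert to a state -> region lookup for coloring points.
STATE_TO_REGION = {
    state: region for region, states in CENSUS_REGIONS.items() for state in states
}


def region_of(state, default="Other"):
    """Census region for a USPS abbreviation, or `default` if it is not in the table."""
    return STATE_TO_REGION.get(state, default)
```
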

jisungyoon commented 4 years ago

[Figure: U.S.A. colored by Census region]

Any comment? @yy @murrayds

jisungyoon commented 4 years ago

An interesting point about the state-level embedding is that the institutions in D.C. are very broadly distributed. They also seem to connect relatively weak areas such as Hawaii or Alaska?

murrayds commented 4 years ago

Great! Is there a coherent cluster for Florida? It's a big population center so it can also be labeled.

> An interesting point about the state-level embedding is that the institutions in D.C. are very broadly distributed. They also seem to connect relatively weak areas such as Hawaii or Alaska?

Are they primarily government/military organizations? For example, it's likely that researchers for NASA will publish with the agency affiliation, which links to D.C., even if they work elsewhere in the country.

jisungyoon commented 4 years ago

> Are they primarily government/military organizations? For example, it's likely that researchers for NASA will publish with the agency affiliation, which links to D.C., even if they work elsewhere in the country.

If we zoom in on only the D.C. entities, there are strong but small communities composed of universities (George Washington, Howard University), while the other institutions look broadly distributed.

[Screenshot, 2019-12-08]

jisungyoon commented 4 years ago

> Is there a coherent cluster for Florida? It's a big population center so it can also be labeled.

Florida has two clusters, but the main cluster is located in the center. I will label Florida for the main cluster.

jisungyoon commented 4 years ago

I'm not familiar with U.S. states. Could you list the states that I didn't include in the figure above but that you are interested in? @murrayds

jisungyoon commented 4 years ago

[Figure: U.S.A. with state labels]

I added as many labels as possible. Any comment?

jisungyoon commented 4 years ago

These are roughly the top 22 states by population.

yy commented 4 years ago

Try to avoid using the lines; there is no one-to-one correspondence between the states and the vectors anyway. Try to put each label where its points are located. There's a typo in Florida, btw. What're the points at the bottom?
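
One way to put labels where the points are, without leader lines, is to place each label at the median position of its points; the median is used instead of the mean so a few stray points do not drag the label away. A minimal sketch, assuming each point carries a label key; for a state split into several clusters (Florida, Pennsylvania), the key can be a hypothetical (state, cluster) pair so each cluster gets its own label. Function and key names are illustrative only.

```python
from collections import defaultdict
from statistics import median


def label_positions(points):
    """Map each label key to the median (x, y) of its points.

    `points` is an iterable of (key, x, y); keys can be state names or
    (state, cluster) tuples when one state forms several clusters.
    """
    xs, ys = defaultdict(list), defaultdict(list)
    for key, x, y in points:
        xs[key].append(x)
        ys[key].append(y)
    return {key: (median(xs[key]), median(ys[key])) for key in xs}


pts = [
    (("Florida", "main"), 0.0, 0.0),
    (("Florida", "main"), 2.0, 2.0),
    (("Florida", "main"), 10.0, 10.0),  # a stray point barely moves the median
    (("Florida", "Miami"), -5.0, -8.0),
]
print(label_positions(pts))
```

Overlapping labels still need manual nudging; this only gives a starting position for each one.
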

jisungyoon commented 4 years ago

> Try to avoid using the lines; there is no one-to-one correspondence between the states and the vectors anyway. Try to put each label where its points are located. There's a typo in Florida, btw.

Is it okay to decrease the font size? I tried to put the labels where the points are located, but they are a little too big for the available space.

> What're the points at the bottom?

Another Florida cluster: the University of Miami and its hospital and health research institutes.

yy commented 4 years ago

It depends on the figure size. https://www.nature.com/documents/nature-final-artwork.pdf Nature's minimum font size is actually pretty small (5 pt), although we probably want to keep it larger than that.

I'd put Florida twice. (also the leftmost group?)

jisungyoon commented 4 years ago

You mean the leftmost part of the Northeast group? Those are universities in Pittsburgh, e.g., CMU and the University of Pittsburgh.

Institutions in Pennsylvania are divided into two clear clusters: Pittsburgh and non-Pittsburgh (most of which are located in Philadelphia).

jisungyoon commented 4 years ago

[Screenshot, 2019-12-09]

This is a test plot for Massachusetts. Most of the institutes in the lower-right part are in Cambridge and Boston, and the ones in the upper-left part are in Worcester or are public schools of Massachusetts.

murrayds commented 4 years ago

I think that, with Florida, most of the major states in terms of university systems are labeled.

The zoom-in on Boston looks pretty good, we can also compare it to New York state to see which is most interesting.

It's currently hard to tell what is an institute and what is a university. Maybe make the colors for institutes and universities more distinct? Or make the points bigger?

jisungyoon commented 4 years ago

> The zoom-in on Boston looks pretty good, we can also compare it to New York state to see which is most interesting.

Do you mean in the next papers?

> It's currently hard to tell what is an institute and what is a university. Maybe make the colors for institutes and universities more distinct? Or make the points bigger?

Yeah, I will rearrange the colors and also resize the points.

jisungyoon commented 4 years ago

[Figure: Massachusetts zoom-in]

Update.

murrayds commented 4 years ago

> The zoom-in on Boston looks pretty good, we can also compare it to New York state to see which is most interesting.
>
> Do you mean in the next papers?

Yeah, after thinking about it, Boston is a natural choice for this paper. We can potentially examine others in future papers.

I like the new figure! What does the size of the points mean? And maybe label one or two more hospitals?

jisungyoon commented 4 years ago

> What does the size of the points mean?

It is inst_size on a log scale.
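
One possible mapping from `inst_size` to a log-scaled point size: take log10 of the size and rescale linearly into a fixed range of marker sizes. The base-10 log, the function name, and the `min_size`/`max_size` bounds are assumptions for illustration, not necessarily what was used for this figure; all sizes must be positive.

```python
import math


def marker_sizes(inst_sizes, min_size=2.0, max_size=40.0):
    """Log-scale institution sizes, then rescale linearly to [min_size, max_size]."""
    logs = [math.log10(n) for n in inst_sizes]  # requires every n > 0
    lo, hi = min(logs), max(logs)
    if hi == lo:  # all institutions the same size
        return [min_size] * len(logs)
    span = max_size - min_size
    return [min_size + (v - lo) / (hi - lo) * span for v in logs]


print(marker_sizes([10, 100, 1000]))
```

If these values are passed to matplotlib's `scatter`, note that its `s` argument is the marker area in points squared, not a diameter.
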

> And maybe label one or two more hospitals?

Do you know any famous hospitals or research institutes in Massachusetts?

murrayds commented 4 years ago

Ah, that makes sense.

Perhaps "Massachusetts General Hospital" and "Boston Medical Center"?

It would also be good to label "Tufts University" and "Brandeis University": both are well-known schools, but not entirely in the same sphere as MIT and Harvard.

jisungyoon commented 4 years ago

[Figure: Massachusetts zoom-in]

There are too many Bostons, so I added UMass health care.

murrayds commented 4 years ago

> There are too many Bostons, so I added UMass health care.

Lol, fair enough, I think it's looking pretty good!

jisungyoon commented 4 years ago

[Figure: U.S.A. with state labels]

Any comment?

murrayds commented 4 years ago

Looks good! Just a couple of spelling fixes:

  - "Flordia" -> "Florida"
  - "Virgina" -> "Virginia"
  - "Coloardo" -> "Colorado"
  - "New Maxico" -> "New Mexico"

jisungyoon commented 4 years ago

> Looks good! Just a couple of spelling fixes:
>
> - "Flordia" -> "Florida"
> - "Virgina" -> "Virginia"
> - "Coloardo" -> "Colorado"
> - "New Maxico" -> "New Mexico"

Sorry, I will fix it.

jisungyoon commented 4 years ago

One more thing: I used window-size=2 and embedding_dim=200 for this visualization. Do we need to change the embedding? I saw you recently changed the dimension to 128 and 256. @murrayds
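
To make the `window-size=2` setting concrete: assuming the embedding is trained word2vec-style (skip-gram) on ordered sequences of institutions from researchers' affiliation histories, the window decides which pairs count as co-occurring. The sketch below enumerates those (center, context) pairs for a fixed window; the function name and the toy trajectory are hypothetical. `embedding_dim` only sets the length of each learned vector and does not change these pairs. Some word2vec implementations also shrink the window at random per position, which this sketch does not do.

```python
def skipgram_pairs(sequence, window=2):
    """(center, context) pairs for every token within `window` positions of the center."""
    pairs = []
    for i, center in enumerate(sequence):
        start, stop = max(0, i - window), min(len(sequence), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


# Toy affiliation trajectory (hypothetical institution codes).
trajectory = ["A", "B", "C", "D"]
print(len(skipgram_pairs(trajectory, window=1)))  # 6
print(len(skipgram_pairs(trajectory, window=2)))  # 10
```
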