murrayds / sci-mobility-emb

Embedding of scientific mobility across institutions, cities, regions, and countries

Validate geographic coordinates #18

Closed murrayds closed 4 years ago

murrayds commented 4 years ago

After looking through the institution lookup file, it is clear that not all geographic coordinates can be trusted. Some are flipped (i.e., the longitude appears in the latitude slot), and some just don't make sense at all.

We should validate these coordinates and fix what we can. One way to do this efficiently and systematically would be to group institutions by country and, using geographic distance, identify anomalous coordinates and follow up on them. After a quick scan of the data, this should affect only on the order of 100-300 institutions.
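A minimal sketch of the per-country check described above, not the code actually used in this repo. The row layout `(inst_id, country, lat, lon)`, the function names, the 3000 km threshold, and the sample rows (apart from the two Lyngby entries quoted later in this issue) are all assumptions made for illustration. It uses the per-country median coordinate as a reference point, which will misbehave for countries that straddle the antimeridian or are very large.

```python
import math
from collections import defaultdict
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def in_range(lat, lon):
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def flag_anomalies(rows, max_km=3000.0):
    """Return (inst_id, reason) for coordinates far from their country's median.

    rows: iterable of (inst_id, country, lat, lon) tuples (hypothetical layout).
    max_km: hypothetical threshold; tune per country size in practice.
    """
    by_country = defaultdict(list)
    for row in rows:
        by_country[row[1]].append(row)

    flagged = []
    for members in by_country.values():
        valid = [m for m in members if in_range(m[2], m[3])]
        if valid:
            c_lat = median(m[2] for m in valid)
            c_lon = median(m[3] for m in valid)
        for inst_id, _, lat, lon in members:
            if not in_range(lat, lon):
                flagged.append((inst_id, "out of range"))
                continue
            if haversine_km(lat, lon, c_lat, c_lon) <= max_km:
                continue
            # Far from the country median: see whether swapping lat/lon fixes it.
            swapped_ok = in_range(lon, lat) and haversine_km(lon, lat, c_lat, c_lon) <= max_km
            reason = "possibly flipped" if swapped_ok else "far from country median"
            flagged.append((inst_id, reason))
    return flagged


# Illustrative rows; ids 1 and 2 are made up, and id 1 has lat/lon swapped.
rows = [
    (111, "DK", 55.7856, 12.5214),
    (30807, "DK", 55.785574, 12.521381),
    (1, "DK", 12.5683, 55.6761),
    (2, "DK", 56.1629, 10.2039),
]
print(flag_anomalies(rows))
```

Anything flagged this way would still need a manual follow-up; the check only narrows down which institutions to look at.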

murrayds commented 4 years ago

In reference to the discussion in #13. Possible solutions listed at the bottom of this issue.

You may want to check out the points at an extremely small distance. Under -3, you're talking about literally the scale of meters.

Some geographic distances are also incredibly small, as in 2.22E-04 km.

Sometimes, these seem to be centers or hospitals that are part of the university and that have roughly similar coordinates:

| ID | Institution | City | Latitude | Longitude |
| --- | --- | --- | --- | --- |
| 1917 | Oxford University Hospitals NHS | | 51.763873 | -1.219806 |
| 1921 | John Radcliffe Hospital | | 51.763871 | -1.219806 |
| 111 | Technical University of Denmark | LYNGBY | 55.7856 | 12.5214 |
| 30807 | National Institute of Aquatic Resources | LYNGBY | 55.785574 | 12.521381 |
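To make the scale concrete, here is a small sketch that computes the great-circle distance for the two pairs listed above. The haversine formula and the 6371 km mean Earth radius are assumptions; the distance function actually used in the project may differ. With these choices the Oxford pair comes out at roughly 2.2E-04 km, in line with the value quoted above, and the Lyngby pair at a few meters.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


# Coordinates copied from the rows quoted in this issue.
oxford_km = haversine_km(51.763873, -1.219806, 51.763871, -1.219806)
lyngby_km = haversine_km(55.7856, 12.5214, 55.785574, 12.521381)

print(f"Oxford pair: {oxford_km:.2e} km")
print(f"Lyngby pair: {lyngby_km:.2e} km")
```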

Options:

@yy thoughts?

yy commented 4 years ago

Let's impute with 1k.
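One possible reading of this suggestion, sketched below, is to replace any pairwise distance below 1 km with 1 km. Interpreting "1k" as a 1 km floor is an assumption, and the function name `impute_min_distance` is hypothetical.

```python
def impute_min_distance(distances_km, floor_km=1.0):
    """Raise every distance below floor_km up to floor_km (assumed: 1 km)."""
    return [max(d, floor_km) for d in distances_km]


# The first value is the tiny distance quoted earlier in this issue.
print(impute_min_distance([2.22e-4, 0.5, 12.0]))
```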

murrayds commented 4 years ago

The coordinates have been cleaned, with missing coordinates imputed and incorrect coordinates fixed. Information on how this was accomplished can be found in the wiki at https://github.com/murrayds/sci-mobility-emb/wiki/Geographic-coordaintes-and-distances

yy commented 4 years ago

Awesome! You may want to put this into the method section of the paper draft. Also, it may be worth exploring whether this work can be published.