neelkandlikar / water-sentiment


add geo data #9

Closed by richpauloo 4 years ago

richpauloo commented 4 years ago

As discussed in the meeting with AA, each abstract should have a location (or NA) attached to it.

From the milestone (note that this is one approach and it need not be done this way as long as the end result is the same):

A script that appends locations to each abstract. Steps include: (1) obtain a list of countries from a stable, citable source like the World Bank, and (2) use the term-document matrix (tdm) from the previous milestone to join these countries to the abstracts by matching words in the tdm. Step (2) is messy: it will take many iterations and the manual creation of alias lists for countries. You may need to filter for abstracts with an NA country and keep adding country aliases until we're confident that we're catching nearly all countries (i.e., as you iterate and build the alias list, the number of abstracts with an NA country will decrease). The end result is that each abstract is assigned a country or countries (e.g., this paper is China), or NA (it's a theoretical paper, like this one). This hard work opens the door to novel worldwide geographical analyses, and most definitely some really cool global maps. Note that the location will not be provided as a field in the data, and that the location of the authors (their institution) is not the location of the study. The location must be mined from the title/abstract.
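To make the alias-matching loop concrete, here is a minimal Python sketch. It is one possible approach, not the project's actual script: the alias dictionary, the function names (`assign_countries`, `unmatched`), and the choice to match on raw title/abstract text rather than on the tdm are all assumptions made for illustration.

```python
import re

# Hypothetical alias list: canonical country name -> aliases seen in text.
# A real list would start from a citable source (e.g., the World Bank)
# and grow as unmatched abstracts are reviewed.
COUNTRY_ALIASES = {
    "China": ["china", "chinese"],
    "United States": ["united states", "usa", "u.s."],
    "India": ["india", "indian"],
}


def _compile(aliases):
    # Lookarounds act as word boundaries, so "india" does not match "Indiana".
    parts = sorted((re.escape(a) for a in aliases), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(parts) + r")(?!\w)", re.IGNORECASE)


PATTERNS = {country: _compile(a) for country, a in COUNTRY_ALIASES.items()}


def assign_countries(title, abstract):
    """Return the list of matched countries, or None (NA) if nothing matches."""
    text = f"{title} {abstract}"
    found = [country for country, pat in PATTERNS.items() if pat.search(text)]
    return found or None


def unmatched(records):
    """Return the (title, abstract) pairs still assigned NA, for manual review."""
    return [r for r in records if assign_countries(*r) is None]
```

Each pass, review what `unmatched(...)` returns, add any missed aliases to `COUNTRY_ALIASES`, and rerun. The NA set should shrink until mostly theoretical papers remain.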

More refs: