openeventdata / mordecai

Full text geoparsing as a Python library
MIT License

Use it with a custom-made spaCy model and without the country model #74

miguelwon closed this issue 4 years ago

miguelwon commented 4 years ago

How can I use Mordecai for a very specific problem? I have my own spaCy model, I don't need the country model because I know all mentions are from one country, and I also have my own Geonames index.

ahalterman commented 4 years ago

If you have a set of extracted place names and you know which country they're from, you can use the Mordecai code that interfaces with Geonames to look them up directly. You should be able to point it at your custom Geonames index instead of the pre-built one, as long as everything is in the same format. Here's some code I've used when I wanted the coordinates of place names that I knew were cities in Syria:

from mordecai import Geoparser

# Geoparser() connects to the Elasticsearch Geonames index that Mordecai uses.
# A custom index should also work, as long as it has the same format.
geo = Geoparser()


def lookup_city(city, iso3c="SYR"):
    """
    Return the "best" Geonames entry for a city name.

    Queries the ES-Geonames gazetteer for the given city within the given
    country (Syria by default) and uses a set of rules to determine the best
    result to return. More accurate/precise feature codes are preferred.

    This code was taken from Halterman's (2019) Syria casualties working paper
    and designed for geolocating Shuhada casualty data.

    Parameters
    ----------
    city: str
      The name of the city to look up
    iso3c: str
      The three-character ISO country code (e.g. "SYR")

    Returns
    -------
    match: dict or None
      The single entry from Geonames that best matches the query, or None if
      no entry has one of the accepted feature codes.
    """
    # Candidate Geonames entries matching `city`, restricted to the country
    res = geo.query_geonames_country(city, iso3c)
    res = res['hits']['hits']
    # First choice: populated places (towns, admin seats, the capital)
    match = [i for i in res if i['feature_code'] in ['PPL', 'PPLA', 'PPLC', 'PPLA2', 'PPLA3']]
    if match:
        if len(match) == 1:
            return match[0]
        # check_exact() is a helper from the same paper (not shown here)
        # that chooses among several candidates for `city`
        return check_exact(city, match)
    # Fallback: sections of populated places, localities, and areas
    match = [i for i in res if i['feature_code'] in ['PPLX', 'LCTY', 'PPLL', 'AREA']]
    if match:
        return check_exact(city, match)
    return None
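`check_exact()` isn't included in the snippet above, and `lookup_city()` needs a running Elasticsearch index. The sketch below reproduces the same two-tier feature-code preference as a pure function over hits that have already been fetched, so the selection rule can be tested on its own. The names `pick_best` and `check_exact` here are hypothetical, the tie-breaking rule (prefer a candidate whose `name` or `asciiname` equals the query case-insensitively, otherwise take the first, i.e. highest-scoring, hit) is an assumption and not necessarily what the working paper's helper does, and the example records are made up.

```python
from typing import Optional

# Feature-code tiers mirroring lookup_city(): populated places first,
# then sections of populated places, localities, and areas as a fallback.
PREFERRED_CODES = {"PPL", "PPLA", "PPLC", "PPLA2", "PPLA3"}
FALLBACK_CODES = {"PPLX", "LCTY", "PPLL", "AREA"}


def check_exact(city: str, candidates: list) -> Optional[dict]:
    """Hypothetical tie-breaker (an assumption, not the paper's helper).

    Returns the first candidate whose name or asciiname equals `city`
    (case-insensitive); otherwise the first candidate, which is the
    highest-ranked Elasticsearch hit. Returns None for an empty list.
    """
    target = city.strip().lower()
    for cand in candidates:
        names = {cand.get("name", ""), cand.get("asciiname", "")}
        if target in {n.strip().lower() for n in names}:
            return cand
    return candidates[0] if candidates else None


def pick_best(city: str, hits: list) -> Optional[dict]:
    """Apply the two-tier feature-code preference to already-fetched hits."""
    for codes in (PREFERRED_CODES, FALLBACK_CODES):
        match = [h for h in hits if h.get("feature_code") in codes]
        if len(match) == 1:
            return match[0]
        if match:
            return check_exact(city, match)
    return None  # no hit with an accepted feature code


# Made-up records shaped like the hit dicts lookup_city() filters on
EXAMPLE_HITS = [
    {"name": "Douma Farms", "asciiname": "Douma Farms", "feature_code": "PPLX"},
    {"name": "Douma", "asciiname": "Douma", "feature_code": "PPL"},
    {"name": "Doumah", "asciiname": "Doumah", "feature_code": "PPL"},
]

if __name__ == "__main__":
    best = pick_best("douma", EXAMPLE_HITS)
    print(best["name"], best["feature_code"])  # prints: Douma PPL
```

Separating the ranking from the Elasticsearch query like this makes it easy to swap in a different tie-breaker (for example, one that uses population or admin codes) without touching the lookup code.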