openeventdata / mordecai

Full text geoparsing as a Python library
MIT License

Issues with the geoparse prediction for China and U.S #91

Open akankshanb opened 3 years ago

akankshanb commented 3 years ago
  1. The geoparser resolves "China" to a building (China University of Geosciences) rather than to the country when the word is used in a sentence:
     geo.geoparse('We traveled to China')
[{'country_conf': 0.68758196,
  'country_predicted': 'CHN',
  'geo': {'admin1': 'Hubei',
          'country_code3': 'CHN',
          'feature_class': 'S',
          'feature_code': 'SCHC',
          'geonameid': '6620465',
          'lat': '30.52047',
          'lon': '114.39637',
          'place_name': 'China University of Geosciences'},
  'spans': [{'end': 20, 'start': 15}],
  'word': 'China'}]
  2. It predicts Canada as the country when "U.S" is used in the sentence:
     geo.geoparse('We traveled to the U.S')
[{'country_conf': 0.28868943,
  'country_predicted': 'CAN',
  'spans': [{'end': 22, 'start': 19}],
  'word': 'U.S'}]

Kindly look into this. Thanks!

ahalterman commented 3 years ago

Thanks for the report. Mordecai was mostly built to geolocate subnational locations, but several people have requested this feature so I'm planning to add it in the next re-write.
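
For anyone hitting the same behavior, a caller could flag these cases after the fact. The following is only a minimal sketch, assuming nothing beyond the result structure shown in the report above. The helper `flag_suspect_results` and its `min_country_conf` threshold of 0.5 are hypothetical and not part of Mordecai. The check on `feature_class` relies on the GeoNames convention that class "S" covers spots such as buildings and schools.

```python
# Hypothetical helper, not part of Mordecai. It only inspects the list of
# dicts that geo.geoparse() returned in the report above.

def flag_suspect_results(results, min_country_conf=0.5):
    """Return (word, reason) pairs for entries that look unreliable.

    min_country_conf is an arbitrary illustrative threshold (assumption).
    """
    flagged = []
    for entry in results:
        geo = entry.get("geo")
        if entry.get("country_conf", 0.0) < min_country_conf:
            flagged.append((entry["word"], "low country confidence"))
        elif geo is None:
            flagged.append((entry["word"], "no gazetteer match"))
        elif geo.get("feature_class") == "S":
            # GeoNames class "S" covers spots such as buildings and schools.
            flagged.append((entry["word"], "resolved to a building or spot"))
    return flagged


# The two results from the report, trimmed to the fields the helper reads.
china = [{
    "country_conf": 0.68758196,
    "country_predicted": "CHN",
    "geo": {"feature_class": "S", "feature_code": "SCHC",
            "place_name": "China University of Geosciences"},
    "word": "China",
}]
us = [{
    "country_conf": 0.28868943,
    "country_predicted": "CAN",
    "word": "U.S",
}]

for word, reason in flag_suspect_results(china + us):
    print(f"{word}: {reason}")
```

This does not fix the predictions. It only marks entries that a caller may want to re-check or resolve through a separate country lookup.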