parrotcar00 closed this issue 4 years ago
If Mordecai isn't detecting the place name at all, the problem lies in the named entity recognition (NER) model it relies on, which is spaCy's. spaCy's NER is trained on text that doesn't have great geographic coverage (it often misses place names in the Middle East as well). It would be possible to label text with more place names and train a better model, but I'm afraid I won't have the time to do that in the foreseeable future.
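To confirm that a miss happens at the NER step rather than in the gazetteer lookup, one could run the text through spaCy directly and list the entities it tags as places. The following is only a minimal sketch, not Mordecai's actual code: the helper names `place_entities` and `check_with_spacy`, the model name `en_core_web_sm`, and the choice of the `GPE`/`LOC`/`FAC` labels as "place-like" are all assumptions made for illustration.

```python
# Labels that spaCy's English models use for place-like entities.
# Treating exactly these three as "places" is an assumption of this sketch.
PLACE_LABELS = {"GPE", "LOC", "FAC"}


def place_entities(nlp, text):
    """Return the text of every entity that `nlp` tags with a place-like label.

    `nlp` is any callable returning an object with an `.ents` sequence whose
    items expose `.text` and `.label_`, e.g. a loaded spaCy pipeline.
    """
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ in PLACE_LABELS]


def check_with_spacy(text, model="en_core_web_sm"):
    """Hypothetical convenience wrapper; requires spaCy and the named model."""
    import spacy  # imported lazily so place_entities has no hard dependency

    return place_entities(spacy.load(model), text)
```

If `check_with_spacy("Beautiful Daedunsan, South Korea")` does not include "Daedunsan" in its result, that would point to the NER model rather than the gazetteer.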
Hi, I just tried to do this:
geo.geoparse("Beautiful Daedunsan, South Korea")
and I got:
[{'word': 'South Korea', 'spans': [{'start': 21, 'end': 32}], 'country_predicted': 'KOR', 'country_conf': 0.9998105, 'geo': {'admin1': 'NA', 'lat': '36.5', 'lon': '127.75', 'country_code3': 'KOR', 'geonameid': '1835841', 'place_name': 'Republic of Korea', 'feature_class': 'A', 'feature_code': 'PCLI'}}]
So it looks like Mordecai is not recognizing "Daedunsan", which is a mountain in South Korea. I then looked up Daedunsan on http://www.geonames.org/, which is the default gazetteer Mordecai uses (according to the README), and geonames.org returns the right search result for it.
Is this a bug or do I need to download and use a newer version of the gazetteer from somewhere?
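For anyone wanting to check this kind of miss programmatically, here is a small sketch that compares the place names one expects against the `word` fields in the geoparse output. The helper name `undetected_places` is hypothetical, and the `results` list is a trimmed copy of the output shown above rather than a fresh call to Mordecai.

```python
def undetected_places(expected, results):
    """Return the expected place names that never appear as a 'word' in results."""
    found = {entry["word"] for entry in results}
    return [name for name in expected if name not in found]


# Trimmed copy of the output shown above for "Beautiful Daedunsan, South Korea".
results = [
    {
        "word": "South Korea",
        "spans": [{"start": 21, "end": 32}],
        "country_predicted": "KOR",
        "geo": {"geonameid": "1835841", "place_name": "Republic of Korea"},
    }
]

print(undetected_places(["Daedunsan", "South Korea"], results))
# prints ['Daedunsan']
```

This only compares exact strings, so a place returned under a different spelling would still be reported as undetected.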