somnathrakshit / geograpy3

Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.
https://geograpy3.readthedocs.io
Apache License 2.0

Brazil is not recognized as a country #34

Closed drdarina closed 3 years ago

drdarina commented 3 years ago

Describe the bug

I'm not sure whether I'm using geograpy incorrectly or whether this is a bug. Brazil is not extracted at all when get_place_context is given a longer expression, and it is treated as a US region when passed on its own.

Additionally, Aorus, which is definitely not a place name, is recognized as a geoEntity by the extractor, instead of Brazil.

>>> import geograpy
>>> from geograpy.extraction import Extractor
>>> geograpy.get_place_context(text='Aorus league 2021 Brazil').countries
[]

>>> geograpy.get_place_context(text='Aorus league 2021 Brazil').other
['Aorus']

>>> e = Extractor(text='Aorus League 2021 #1 Brazil')
>>> e.find_geoEntities()
['Aorus']

>>> e = Extractor(text='Brazil')
>>> e.find_geoEntities()
['Brazil']

>>> geograpy.get_place_context(text='Brazil').countries
['Brazil', 'United States']

>>> geograpy.get_place_context(text='Brazil').regions
['Brazil']

Environment (please complete the following information):

WolfgangFahl commented 3 years ago

@drdarina Thank you for reporting this. The Extractor does not use the latest information provided by geograpy3. You might want to use the locator functionality instead, and be aware that place names of course need disambiguation. E.g. "London/Ontario" is different from "London/UK". That's why we provide information that helps with disambiguation, e.g. by adding the population info. We have the same problem in our own use cases in scientific event disambiguation, so you might want to stay tuned. The Aorus match is probably an issue with the underlying library we are using.
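
To illustrate the idea of population-based disambiguation mentioned above, here is a minimal sketch in plain Python. It does not use the geograpy3 API: the `Candidate` type, the `disambiguate` function, and the population figures are all hypothetical and only roughly approximate, chosen to show how an ambiguous name such as "London" might be resolved.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One possible reading of an ambiguous place name (hypothetical type)."""
    name: str
    country: str
    population: int  # rough, illustrative figure only


def disambiguate(candidates: Sequence[Candidate],
                 country_hint: Optional[str] = None) -> Optional[Candidate]:
    """Pick the most plausible candidate.

    Candidates matching the country hint are preferred; if none match
    (or no hint is given), all candidates are considered. Within the
    chosen pool, the most populous candidate wins.
    """
    if not candidates:
        return None
    pool = [c for c in candidates if c.country == country_hint] or list(candidates)
    return max(pool, key=lambda c: c.population)


# Assumed example data, not taken from geograpy3's database.
londons = [
    Candidate("London", "United Kingdom", 8_900_000),
    Candidate("London", "Canada", 420_000),
]

best = disambiguate(londons)                             # no context available
best_ca = disambiguate(londons, country_hint="Canada")   # context mentions Canada
```

Without any context the larger city is chosen, while a country hint taken from the surrounding text overrides the population default. A real implementation would presumably weigh more signals than these two.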