somnathrakshit / geograpy3

Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.
https://geograpy3.readthedocs.io
Apache License 2.0

Score for each location? #21

Closed: 0AlphaZero0 closed this issue 3 years ago

0AlphaZero0 commented 4 years ago

Would it be possible to get a score for each location found? Sometimes it would help to know why certain locations appear in the results. For example:

```python
>>> import geograpy
>>> places = geograpy.get_geoPlace_context(text="This sentence mention UK as country and London as city.")
>>> places.countries
['United Kingdom', 'United States', 'Canada']
>>> places.cities
['London']
>>> places = geograpy.get_geoPlace_context(text="Jin Yin-tan Hospital, Wuhan, China.")
>>> places.countries
['China', 'Mexico', 'United States']
>>> places.cities
['China']
```

Something like the following scores would be useful: `[('United Kingdom', 0.99), ('United States', 0.56), ('Canada', 0.45)]`

A confidence score would help filter out such spurious results.
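To make the request concrete, here is a minimal sketch of how a caller might use such scores if they were available. The `filter_by_confidence` helper, the threshold value, and the scored-tuple format are assumptions for illustration only; geograpy does not return results in this form.

```python
from typing import List, Tuple

# Hypothetical result format: (place name, confidence score in [0, 1]).
ScoredPlace = Tuple[str, float]


def filter_by_confidence(scored: List[ScoredPlace], threshold: float = 0.6) -> List[str]:
    """Return the names scoring at or above the threshold, best first."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]


# Illustrative scores taken from the example in this comment.
countries = [("United Kingdom", 0.99), ("United States", 0.56), ("Canada", 0.45)]
print(filter_by_confidence(countries))
```

With the default threshold only the high-confidence match survives; lowering the threshold lets weaker candidates back in.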

WolfgangFahl commented 4 years ago

How would you like to calculate the score? Currently there are a few possible strategies:

Please note that the disambiguation is currently only possible with the Locator API.

0AlphaZero0 commented 4 years ago

I think a combination of fields would be the best approach.

WolfgangFahl commented 4 years ago

See also http://wiki.bitplan.com/index.php/Geograpy#Difference_in_Name.2FLabel

WolfgangFahl commented 3 years ago

#52 now addresses this: the default is to order by population. For our own use case we'll use a more sophisticated version and estimate the likelihood of a location in our context from how often it already appears in our corpus.
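The two orderings mentioned here could be sketched roughly as follows. This is not geograpy's implementation: the `Candidate` class, both function names, the additive smoothing, and the population and corpus figures are all hypothetical and chosen only to show the idea.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """Hypothetical candidate for an ambiguous place name."""
    name: str
    country: str
    population: int  # illustrative figures, not authoritative data


def order_by_population(candidates: List[Candidate]) -> List[Candidate]:
    """Default-style ordering: the most populous candidate first."""
    return sorted(candidates, key=lambda c: c.population, reverse=True)


def rank_by_corpus_frequency(
    candidates: List[Candidate],
    corpus_counts: Counter,
    smoothing: float = 1.0,
) -> List[Tuple[Candidate, float]]:
    """Weight each candidate by how often its country already occurs in the corpus.

    Additive smoothing keeps unseen countries from getting a zero likelihood.
    """
    weights = [corpus_counts[c.country] + smoothing for c in candidates]
    total = sum(weights)
    scored = [(c, w / total) for c, w in zip(candidates, weights)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


candidates = [
    Candidate("London", "United Kingdom", 8_900_000),
    Candidate("London", "Canada", 400_000),
    Candidate("London", "United States", 8_000),
]
# Hypothetical corpus that mostly mentions Canadian places.
corpus_counts = Counter({"Canada": 8, "United Kingdom": 1})

print(order_by_population(candidates)[0].country)
for candidate, likelihood in rank_by_corpus_frequency(candidates, corpus_counts):
    print(candidate.country, round(likelihood, 3))
```

Population ordering prefers the largest city regardless of context, while the corpus-based ranking can favor a smaller city whose country dominates the documents already processed.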