somnathrakshit / geograpy3

Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.
https://geograpy3.readthedocs.io
Apache License 2.0

Singapore Michigan in CityLookup but Singapore, Singapore is not ... #55

Closed WolfgangFahl closed 1 year ago

WolfgangFahl commented 3 years ago

Singapore, Michigan is a ghost town and former settlement. Entries like this should probably be ignored.

WolfgangFahl commented 3 years ago

select cl.wikidataid, label, name, pop, regionIso, regionName, countryIso, countryName, gndId, geoNameId, lat, lon
from city_labels l
join CityLookup cl on l.wikidataid = cl.wikidataid
where l.label in ('St. Petersburg', 'Singapore')

| wikidataid | label | name | pop | regionIso | regionName | countryIso | countryName | gndId | geoNameId | lat | lon |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Q656 | St. Petersburg | Saint Petersburg | 5384342 | RU-SPE | Saint Petersburg | RU | Russia | 4267026-3 | 498817 | 60 | 30 |
| Q49236 | St. Petersburg | St. Petersburg | 253693 | US-FL | Florida | US | United States of America | 4251544-0 | 4171563 | 28 | -83 |
| Q7522845 | Singapore | Singapore | | US-MI | Michigan | US | United States of America | | | 43 | -86 |
| Q1187352 | St. Petersburg | St. Petersburg | | US-PA | Pennsylvania | US | United States of America | | 5210484 | 41 | -80 |
| Q1187352 | St. Petersburg | St. Petersburg | | US-PA | Pennsylvania | US | United States of America | | 5210484 | 41 | -80 |
| Q334 | Singapore | Singapore | 5888926 | SG | Singapore | SG | Singapore | 4055089-8 | 1880251 | 104 | 1 |
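
To make the ambiguity easier to reproduce, here is a minimal sketch that loads a few of the rows above into an in-memory SQLite database and runs the same join. The reduced column set, the `lookup` helper and the `require_population` flag are assumptions made for illustration; they are not part of geograpy3. Dropping rows without a population is just one possible heuristic for ignoring ghost towns, and it would also drop St. Petersburg, Pennsylvania.

```python
import sqlite3

# Hypothetical miniature of the two tables; only a subset of columns is kept.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table city_labels (wikidataid text, label text);
    create table CityLookup (
        wikidataid text, name text, pop integer, regionIso text, countryIso text
    );
    """
)

# Rows copied from the query result in this thread (None = empty cell).
cities = [
    ("Q656", "Saint Petersburg", 5384342, "RU-SPE", "RU"),
    ("Q49236", "St. Petersburg", 253693, "US-FL", "US"),
    ("Q7522845", "Singapore", None, "US-MI", "US"),
    ("Q1187352", "St. Petersburg", None, "US-PA", "US"),
    ("Q334", "Singapore", 5888926, "SG", "SG"),
]
labels = [
    ("Q656", "St. Petersburg"),
    ("Q49236", "St. Petersburg"),
    ("Q7522845", "Singapore"),
    ("Q1187352", "St. Petersburg"),
    ("Q334", "Singapore"),
]
conn.executemany("insert into CityLookup values (?, ?, ?, ?, ?)", cities)
conn.executemany("insert into city_labels values (?, ?)", labels)


def lookup(label, require_population=False):
    """Return candidate cities for a label, largest population first."""
    sql = (
        "select cl.wikidataid, cl.name, cl.pop, cl.regionIso "
        "from city_labels l "
        "join CityLookup cl on l.wikidataid = cl.wikidataid "
        "where l.label = ?"
    )
    if require_population:
        # Assumed heuristic: treat entries without a population as ignorable.
        sql += " and cl.pop is not null"
    # SQLite sorts NULL as the smallest value, so NULL populations come last.
    sql += " order by cl.pop desc"
    return conn.execute(sql, (label,)).fetchall()


if __name__ == "__main__":
    print(lookup("Singapore"))
    print(lookup("Singapore", require_population=True))
```

Sorting by population already puts the city-state ahead of the Michigan entry; the stricter flag removes the ghost town entirely.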

WolfgangFahl commented 3 years ago

Singapore (https://www.wikidata.org/wiki/Q334) is a city-state (https://www.wikidata.org/wiki/Q133442), so we might need to add a query for city-states as a separate "region" JSON file.

tholzheim commented 3 years ago

The city-states are now included in the RegionManager. When the cities are queried by region, as in https://github.com/somnathrakshit/geograpy3/blob/62856f8a2946b9bde1c91409f54cf3bb76684c40/tests/testCachingCitiesByRegion.py#L73, the city-states are then also added as cities (in addition to the municipalities of each city-state).

SPARQL Query for all city-states

# get a list of city states
# for geograpy3 library
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT DISTINCT ?countryId (?cityStateQ as ?wikidataid) ?name ?iso ?pop ?coord
WHERE
{
  # all city states
  ?cityStateQ wdt:P31 wd:Q133442 .
  ?cityStateQ rdfs:label ?name filter (lang(?name) = "en").
  { 
    SELECT ?cityStateQ (max(?isoCode) as ?iso) (max(?populationValue) as ?pop) (max(?locationValue) as ?coord)
    WHERE {
      ?cityStateQ wdt:P300|wdt:P297 ?isoCode.
      # get the population
      # https://www.wikidata.org/wiki/Property:P1082
      OPTIONAL {
        ?cityStateQ wdt:P1082 ?populationValue
      } 
      # get the location
      # https://www.wikidata.org/wiki/Property:P625
      OPTIONAL {
        ?cityStateQ wdt:P625 ?locationValue.
      }
    } GROUP BY ?cityStateQ
  }
  OPTIONAL { 
    ?cityStateQ wdt:P17 ?countryId.
  }
} ORDER BY ?iso

Used in https://github.com/somnathrakshit/geograpy3/blob/ba60d6e4b37e665205b2bc312085f0372053c620/geograpy/wikidata.py#L299
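
For anyone who wants to run the query outside of geograpy3, the following is a rough sketch using only the Python standard library. It is not how `wikidata.py` does it: `run_sparql`, `parse_point` and `flatten_bindings` are hypothetical helper names, and the sample response at the bottom is hand-written in the standard SPARQL JSON results layout (the coordinate value is illustrative). Wikidata returns P625 coordinates as WKT literals of the form `Point(longitude latitude)`, so the sketch swaps them into (lat, lon).

```python
import json
import re
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def run_sparql(query, endpoint=WIKIDATA_ENDPOINT):
    """Send a SPARQL query and return the decoded JSON result (needs network)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder agent string; replace with something identifying you.
            "User-Agent": "city-state-example/0.1 (illustrative script)",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


POINT_PATTERN = re.compile(r"Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")


def parse_point(wkt):
    """Turn a WKT 'Point(lon lat)' literal into (lat, lon), or None if malformed."""
    match = POINT_PATTERN.fullmatch(wkt.strip())
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def flatten_bindings(result):
    """Convert SPARQL JSON bindings into plain dicts, one per city-state."""
    records = []
    for binding in result["results"]["bindings"]:
        record = {name: cell["value"] for name, cell in binding.items()}
        # Reduce entity URIs such as .../entity/Q334 to the bare Q-id.
        for key in ("wikidataid", "countryId"):
            if key in record:
                record[key] = record[key].rsplit("/", 1)[-1]
        if "pop" in record:
            record["pop"] = int(float(record["pop"]))
        if "coord" in record:
            point = parse_point(record.pop("coord"))
            if point is not None:
                record["lat"], record["lon"] = point
        records.append(record)
    return records


# Hand-written sample in the SPARQL JSON results format (not a live response).
SAMPLE_RESULT = {
    "results": {
        "bindings": [
            {
                "wikidataid": {"type": "uri", "value": "http://www.wikidata.org/entity/Q334"},
                "name": {"type": "literal", "xml:lang": "en", "value": "Singapore"},
                "iso": {"type": "literal", "value": "SG"},
                "pop": {"type": "literal", "value": "5888926"},
                "coord": {"type": "literal", "value": "Point(103.8 1.3)"},
            }
        ]
    }
}

if __name__ == "__main__":
    print(flatten_bindings(SAMPLE_RESULT))
```

With network access, `flatten_bindings(run_sparql(query))` would apply the same conversion to the live result, where `query` holds the SPARQL text above. Because WKT lists longitude first, it may be worth checking the Q334 row in the lookup result earlier in this thread, which shows lat 104 and lon 1.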

WolfgangFahl commented 1 year ago

see #70