komoot / photon

an open source geocoder for openstreetmap data
Apache License 2.0

Version 0.2.0 with data problems? #134

Closed: karussell closed this 9 years ago

karussell commented 9 years ago

I'm testing whether I can migrate to the new photon version. Older versions of photon handle 'berlin erlanger straße' fine, but the current one seems to miss it. (I also tried this, unsuccessfully, with the very early dev version you had online for a few seconds today ;))

christophlingg commented 9 years ago

This is the street in question: http://nominatim.openstreetmap.org/details.php?place_id=2639695215

I could not find a document with osm_id=317350507 in photon. @daryltucker could you check if your nominatim database contains this osm entity?

In general, the number of documents increased by roughly 20% since the last import, so it does not feel like we are missing a lot of docs...

christophlingg commented 9 years ago

I encountered another problem, Helsinki does not pop up at the top of the list:

http://photon.komoot.de/api?q=helsinki&debug

The Helsinki relation has an importance of 0.19, which is very low for a capital. @lonvia's Nominatim instance shows 0.74.

Are we missing some data, like the Wikipedia matching?

karussell commented 9 years ago

Thanks for the instant feedback! I'll also try some other requests.

daryltucker commented 9 years ago

Hey,

I can check later, but if I'm not mistaken, OSM IDs are unique to their instance, i.e. even my OSM IDs and yours should be different.

I will verify my assumptions and take a look later!

(Email reply to Christoph Lingg's comment of January 16, 2015, 3:21:52 AM PST.)

karussell commented 9 years ago

The new version is really impressive!

Some minor data seems to be missing. I found two more queries that worked before: 'bayreuth gravenreutherstrasse' and 'motor nützel bayreuth' (see GraphHopper Maps, where they still work in the autocomplete).

karussell commented 9 years ago

> OSM ids are unique to their instance

Normally OSM IDs are 'relatively static': they should refer to the IDs on osm.org. They only change if you, e.g., delete the object or, for way IDs, split the way.

christophlingg commented 9 years ago

osm ids are static, nominatim's place ids aren't
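Because OSM IDs are stable while Nominatim place_ids are specific to each database, lookups that need to be compared across instances are best keyed by OSM type and ID. A minimal sketch that builds a Nominatim details URL in the `osmtype`/`osmid` form used later in this thread; the helper name `details_url` is hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the public Nominatim details page, as linked in this thread.
NOMINATIM_DETAILS = "http://nominatim.openstreetmap.org/details.php"

def details_url(osm_type: str, osm_id: int) -> str:
    """Hypothetical helper: build a details URL from the stable (type, id) pair.

    osm_type is the single-letter OSM type: "N" (node), "W" (way) or
    "R" (relation). Unlike a place_id, this pair means the same object on
    every Nominatim instance that imported the same OSM data.
    """
    if osm_type not in ("N", "W", "R"):
        raise ValueError(f"unknown OSM type: {osm_type!r}")
    # urlencode keeps the dict's insertion order: osmtype first, then osmid.
    return NOMINATIM_DETAILS + "?" + urlencode({"osmtype": osm_type, "osmid": osm_id})
```

For example, `details_url("W", 317350507)` yields the same kind of link lonvia uses further down, so two people can compare the same way on different databases without exchanging place_ids.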

christophlingg commented 9 years ago

@daryltucker, could you find the OSM object in your database? If not, it could also be an import error; I can check this by searching for the OSM ID in the JSON dump.

I am still wondering how the importance could deviate that much. @lonvia, do you have a possible explanation?

lonvia commented 9 years ago

The 'Erlanger Strasse' is missing city=Berlin because the OSM relation for Berlin is tagged in a very confusing way. I believe it should have place=city, not place=state. There is a de:place=city tag, but I never quite understood why there should be a distinction there. (It might be a good idea to come up with a script that corrects these kinds of things, e.g. the German Stadtstaaten and Kreisfreie Städte, in the Nominatim database to enforce the correct data, because the tagging is still in flux and bound to change again and again.)

For 'Helsinki', there might have been a temporary matching problem with the Wikipedia data in Daryl's database, but it's hard to say without seeing the source DB. You should be able to check that easily by importing the Helsinki relation into your test database.

christophlingg commented 9 years ago

The city assignment seems to work perfectly in Berlin: this is a park in Berlin, and the corresponding photon document has the correct assignment. Nominatim does a good job here.

It is surprising that the street (Erlanger Straße) is missing from the index. I queried the Elasticsearch index for the street by OSM ID and could not find it. I also checked whether it was dumped into the JSON file we used to import the data; it was not part of the dump either. So only two reasons for the bug are left:

I am trying to check the latter one and will come back to you in a few minutes.
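A lookup by OSM ID, as described above, can be done with an Elasticsearch term query. This is a minimal sketch assuming Elasticsearch listens on `localhost:9200` and photon's index is named `photon`; both are assumptions, so adjust them for your setup.

```python
import json
import urllib.request

# Assumptions: local Elasticsearch on port 9200, photon index named "photon".
ES_SEARCH_URL = "http://localhost:9200/photon/_search"

def osm_id_query(osm_id: int) -> dict:
    # A term query matches the exact stored osm_id value; no text analysis.
    return {"query": {"term": {"osm_id": osm_id}}}

def build_request(osm_id: int, url: str = ES_SEARCH_URL) -> urllib.request.Request:
    body = json.dumps(osm_id_query(osm_id)).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def find_by_osm_id(osm_id: int, url: str = ES_SEARCH_URL) -> list:
    with urllib.request.urlopen(build_request(osm_id, url)) as resp:
        result = json.load(resp)
    # Each hit's _source is the stored photon document.
    return [hit["_source"] for hit in result["hits"]["hits"]]
```

An empty list from `find_by_osm_id(317350507)` means the street is not in the index. Since an osm_id alone is not unique across nodes, ways and relations, any hit should also be checked against its `osm_type`.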

As for the importance value: if I understand it correctly, there are two optional additions to Nominatim: the Wikipedia dump and the UK postcodes. The Wikipedia matching seems to have a big impact on the importance value, but it is optional. @daryltucker, did you use it when you set up your Nominatim server?

christophlingg commented 9 years ago

I imported the street in question into my test database. I created a JSON export and it included the street:

grep 317350507 dump.json 
{"osm_id":317350507,"osm_type":"W","osm_key":"highway","osm_value":"residential","importance":0.09999999999999998,"coordinate":{"lat":52.4818801,"lon":13.430818},"postcode":"12053","name":{"default":"Erlanger Straße"},"city":{"de":"Berlin","it":"Berlino","default":"Berlin","fr":"Berlin","en":"Berlin"},"country":{"it":"Germania","default":"Deutschland","fr":"Allemagne","en":"Germany"},"extent":{"type":"envelope","coordinates":[[13.4308016,52.4818854],[13.4314153,52.4816884]]}}

I presume @daryltucker's Nominatim database is missing some data and possibly the Wikipedia matching for the importance... Once we have your feedback, we can decide on the next steps.

lonvia commented 9 years ago

I've found an entry for the Erlanger Str under OSM way 31906497 in your DB, and it is missing the city. For Berlin you will get a lot of correct entries because their address includes the place=city node. You can check on osm.org that http://nominatim.openstreetmap.org/details.php?osmtype=W&osmid=31906497 does not get Berlin assigned as its city, while http://nominatim.openstreetmap.org/details.php?osmtype=W&osmid=317350507 still has the place node.

I don't quite remember where duplicate elimination was done. There might be something wrong happening there.

A further thought: it seems that you are not including the state in the search. Otherwise, I'd expect 'Erlanger Str, Berlin' to still find 31906497, because Berlin is still set as its state.
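The argument above is that a query token like "berlin" can only match way 31906497 if the state field is searchable. A hypothetical sketch using plain token matching (not photon's actual Elasticsearch analysis); field names follow the dump example ("name", "city", "country"), and "state" is assumed to have the same language-to-string shape.

```python
# Hypothetical: the address fields a query is allowed to match against.
ADDRESS_FIELDS = ("name", "city", "state", "country")

def tokenize(text: str) -> list:
    # Lowercase and treat commas as separators: "Erlanger Str, Berlin" -> 3 tokens.
    return text.lower().replace(",", " ").split()

def searchable_terms(doc: dict, fields=ADDRESS_FIELDS) -> set:
    terms = set()
    for field in fields:
        # Each field maps language codes to names, e.g. {"default": "Berlin"}.
        for value in doc.get(field, {}).values():
            terms.update(tokenize(value))
    return terms

def matches(doc: dict, query: str, fields=ADDRESS_FIELDS) -> bool:
    # Every query token must appear somewhere in the selected fields.
    terms = searchable_terms(doc, fields)
    return all(token in terms for token in tokenize(query))
```

With a document that has Berlin only as its state, `matches(doc, "Erlanger Str, Berlin")` succeeds, while dropping "state" from `fields` makes the same query fail, which mirrors the missing result.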

For importance, you can simply check a few other entries to see whether any of them has an importance of more than 0.5 (the Statue of Liberty, for example). But from the way most results are ordered, I would guess that importance is mostly included correctly.
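The check suggested above can be run over the whole JSON dump: count documents whose importance exceeds 0.5 and report the highest one. A minimal sketch, assuming one JSON document per line with an optional `importance` field as in the dump example; the function name is hypothetical. If no entry in a large dump clears 0.5, that would hint that the Wikipedia-based importance is missing, though it is only a heuristic.

```python
import json

def importance_summary(lines, threshold=0.5):
    """Return (count above threshold, (max importance, default name) or None)."""
    count = 0
    best = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        doc = json.loads(line)
        imp = doc.get("importance")
        if imp is None:
            continue  # documents without an importance value are ignored
        if imp > threshold:
            count += 1
        if best is None or imp > best[0]:
            best = (imp, doc.get("name", {}).get("default"))
    return count, best
```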

christophlingg commented 9 years ago

Good catch, I just checked and it's true that you cannot query the state attribute. I am working on a fix right now.

We have street deduplication at query time, so I guess the street (osm_id=317350507) must have been imported. It would really help to know whether Daryl has it in his DB.
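The reasoning above is that if deduplication only happens at query time, every street segment should still be in the index and in the dump, so a segment missing from the dump points to the source database. A hypothetical illustration of query-time deduplication (not photon's actual implementation); the key (name, postcode, city) is an assumption.

```python
def dedupe_streets(results: list) -> list:
    """Hypothetical: collapse street hits that share name, postcode and city.

    OSM often splits one street into several ways; only the first
    (highest-ranked) hit per key is kept. Non-street results pass through.
    """
    seen = set()
    out = []
    for doc in results:
        if doc.get("osm_key") == "highway":
            key = (
                doc.get("name", {}).get("default"),
                doc.get("postcode"),
                doc.get("city", {}).get("default"),
            )
            if key in seen:
                continue  # a segment of a street we already returned
            seen.add(key)
        out.append(doc)
    return out
```

Under this scheme a segment without a city, like way 31906497, produces a different key and would not be merged with segments that have Berlin as their city.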

Well, there is a WHERE clause in the SQL statement that I do not really understand: WHERE linked_place_id IS NULL. Maybe this prevents one part of the street from being imported into photon?

lonvia commented 9 years ago

linked_place_id is only set for place nodes that have an admin boundary and, in the latest Nominatim version, for waterway ways that are part of a relation. Streets are normally not affected.
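The effect of the filter, as explained above, can be illustrated on a few in-memory rows. This is a hypothetical sketch (not Nominatim's code); the example rows are invented, and a linked place node is assumed to carry the place_id of the boundary it was merged into.

```python
def exported(rows: list) -> list:
    """Mimic the export filter WHERE linked_place_id IS NULL on dict rows.

    A place node linked to an admin boundary is represented by that boundary,
    so only the boundary is exported; ordinary streets have no link and pass.
    """
    return [row for row in rows if row.get("linked_place_id") is None]

# Hypothetical rows: a street, a place node linked to boundary 3, the boundary.
rows = [
    {"place_id": 1, "class": "highway", "linked_place_id": None},
    {"place_id": 2, "class": "place", "linked_place_id": 3},
    {"place_id": 3, "class": "boundary", "linked_place_id": None},
]
```

`exported(rows)` keeps the street and the boundary and drops only the linked place node, which is why a street segment should not be lost to this clause.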

On closer inspection, I noticed that http://www.openstreetmap.org/way/317350507 was created only a month ago. So I suspect that @daryltucker's DB is simply a little bit behind.

christophlingg commented 9 years ago

We released 2.0.1, which fixes the bugs mentioned here.