pelias / openstreetmap

Import pipeline for OSM in to Pelias
MIT License
112 stars 72 forks source link

remove public_transport from features list #556

Open missinglink opened 3 years ago

missinglink commented 3 years ago

When searching a Berlin street, Pelias returned a bus stop with the same name:

https://pelias.github.io/compare/#/v1/autocomplete?text=gleimstrasse https://www.openstreetmap.org/relation/6378676

Screenshot 2021-07-22 at 16 42 54

When I dug a little deeper I realised that this is actually quite a common issue, many of these public_transport tags are named literally the same as a nearby street or the containing neighbourhood.

I checked the nominatim and osm2pgsql source to see how they handle it, and... they don't, from what I can tell they don't target the tag public_transport for the gazetteer style export from osm2pgsql at all 🤔

This PR removes public_transport:* from our builds, the diff of which returned a bunch of results like I mentioned above, where the name was easily confused with a nearby street, neighbourhood.

It seems counterintuitive to remove public_transport however many of the things we were intending to target with that are actually being covered by other more descriptive tags such as railway~station+name.

I believe this will be very beneficial for users as there will be less duplication and confusion with similarly named entities in the results.

It's possible that we end up removing some entries which some users would like, although I suspect this will be rare, if at all, and we can always add more specific tags again in the future as required.

note: as part of this work I rewrote some of the end-to-end test code since the diff output was basically impossible to read.

missinglink commented 3 years ago

I'm actually considering dropping highway:bus_stop+name too, it produces weird labels, like this one from nominatim:

Screenshot 2021-07-22 at 17 15 56
missinglink commented 3 years ago

removed highway:bus_stop and railway:tram_stop, these smaller local stops generally don't have a unique name.

orangejulius commented 3 years ago

Hm, I am torn on this one. On the one hand, I can absolutely see how this can in many cases improve results, especially for "plain" queries without any filtering on layers.

But I also think there's a good chance this will impact some queries. Lots of Pelias users are particularly interested in transit stops, so ensuring we don't break anything is worth some thought.

Here are some scenarios to think about

All that said, I 100% support the goal of cleaner query results in cases like this, so investigating solutions to the problem are worth it.

missinglink commented 3 years ago

Based on the changes in fixture diff in this PR it appears to be positive, although I can understand some hesitation.

I've generated a much larger diff for the whole of Berlin (1MB) which shows all the records which would be removed as part of this feature.

This was generated by running the end-to-end tests twice, once with the original config and once with the config proposed in this PR, I then diff'd it with the following command to produce a summary of only the lines which were deleted:

comm -2 -3 <(jq -cM -s 'sort_by(._id) | .[]' berlin-latest.osm.pbf.jsonl) <(jq -cM -s 'sort_by(._id) | .[]' berlin-latest.osm.pbf.no_public_transport.jsonl) > berlin-latest.osm.pbf.diff.jsonl

cat berlin-latest.osm.pbf.diff.jsonl | jq -r '(._id + " " + .data.name.default)'

I chose a popular transit stop Potsdamer Platz as a test case since it has a variety of public transport options, a couple streets with the same name and it's also the name of the surrounding neighbourhood:

grep -i 'potsdamer' berlin.osm.public_transport.diff.jsonl | grep -i 'platz'

openstreetmap:venue:node/1874617251 S Potsdamer Platz/Voßstraße
openstreetmap:venue:node/1877576992 Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:node/1877576997 Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:node/1877848269 S+U Potsdamer Platz
openstreetmap:venue:node/1877848274 S+U Potsdamer Platz
openstreetmap:venue:node/1877848279 S+U Potsdamer Platz
openstreetmap:venue:node/1877848283 S+U Potsdamer Platz
openstreetmap:venue:node/2093326467 S Potsdamer Platz
openstreetmap:venue:node/2098466340 U Potsdamer Platz
openstreetmap:venue:node/2451831547 S Potsdamer Platz
openstreetmap:venue:node/2456043650 Berlin Potsdamer Platz
openstreetmap:venue:node/2456043651 Berlin Potsdamer Platz
openstreetmap:venue:node/2456043652 Berlin Potsdamer Platz
openstreetmap:venue:node/27424921 Berlin Potsdamer Platz
openstreetmap:venue:node/29495049 U Potsdamer Platz
openstreetmap:venue:node/298516304 S Potsdamer Platz/Voßstraße
openstreetmap:venue:node/30353079 S Potsdamer Platz
openstreetmap:venue:node/322796203 S Potsdamer Platz
openstreetmap:venue:relation/2364886 Berlin, S Potsdamer Platz/Voßstraße
openstreetmap:venue:relation/2368542 Berlin, Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:relation/3198309 S Potsdamer Platz
openstreetmap:venue:relation/3200537 S Potsdamer Platz
openstreetmap:venue:relation/5686702 Berlin Potsdamer Platz, Bahnhof
openstreetmap:venue:relation/5686709 Berlin, S+U Potsdamer Platz [Bus Stresemannstraße]
openstreetmap:venue:relation/6865726 U Potsdamer Platz
openstreetmap:venue:relation/8435158 Berlin, S+U Potsdamer Platz [Bus Leipziger Straße]
openstreetmap:venue:way/237532199 Berlin Potsdamer Platz
openstreetmap:venue:way/237532200 Berlin Potsdamer Platz
openstreetmap:venue:way/382265010 S Potsdamer Platz/Voßstraße
openstreetmap:venue:way/382265011 S Potsdamer Platz/Voßstraße
openstreetmap:venue:way/382265012 S+U Potsdamer Platz
openstreetmap:venue:way/382265013 S+U Potsdamer Platz
openstreetmap:venue:way/382265014 S+U Potsdamer Platz
openstreetmap:venue:way/382265015 S+U Potsdamer Platz
openstreetmap:venue:way/382265023 Varian-Fry-Straße/Potsdamer Platz
openstreetmap:venue:way/382265024 Varian-Fry-Straße/Potsdamer Platz

The most popular public transit option is "U Potsdamer Platz" (the underground) and we can see that 3x records with U Potsdamer Platz are removed and 10x in some form of "S+U Potsdamer Platz".

So a couple things worth mentioning here:

And looking at what didn't get removed:

cat berlin-latest.osm.pbf.no_public_transport.jsonl | jq -r '(._id + " " + .data.name.default)' | grep -i 'potsdamer' | grep -i 'platz' | grep 'venue'

openstreetmap:venue:node/84644779 Suite Novotel Berlin City Potsdamer Platz
openstreetmap:venue:node/340388625 Dalí - Die Ausstellung am Potsdamer Platz
openstreetmap:venue:node/365726272 Mövenpick Hotel Berlin am Potsdamer Platz
openstreetmap:venue:node/394948618 U Potsdamer Platz
openstreetmap:venue:node/394948620 U Potsdamer Platz
openstreetmap:venue:node/396004773 U Potsdamer Platz
openstreetmap:venue:node/1251201121 U-Bhf. Potsdamer Platz/Linkstraße
openstreetmap:venue:node/1259545063 ibis Berlin City Potsdamer Platz
openstreetmap:venue:node/1318190070 Potsdamer Platz
openstreetmap:venue:node/1336883756 Stage Theater am Potsdamer Platz
openstreetmap:venue:node/1573193294 S+U Potsdamer Platz
openstreetmap:venue:node/1573193317 S+U Potsdamer Platz
openstreetmap:venue:node/1576240058 S+U Potsdamer Platz
openstreetmap:venue:node/1788588105 Potsdamer-Platz-Apotheke
openstreetmap:venue:node/2451831603 S Potsdamer Platz
openstreetmap:venue:node/2451831708 S Potsdamer Platz
openstreetmap:venue:node/2454155640 S+U Potsdamer Platz
openstreetmap:venue:node/2456291607 S Potsdamer Platz
openstreetmap:venue:node/2491792013 S+U Potsdamer Platz
openstreetmap:venue:node/2491824683 S Potsdamer Platz
openstreetmap:venue:node/2491846867 S Potsdamer Platz
openstreetmap:venue:node/2491848735 S Potsdamer Platz
openstreetmap:venue:node/2491871589 S+U Potsdamer Platz
openstreetmap:venue:node/2491871593 S+U Potsdamer Platz
openstreetmap:venue:node/2491872658 S Potsdamer Platz
openstreetmap:venue:node/2581112669 Hotel Potsdamer Hof Berlin Am Potsdamer Platz
openstreetmap:venue:node/2749149916 Potsdamer Platz/Ebertstraße
openstreetmap:venue:node/3493177833 Berlin Potsdamer Platz
openstreetmap:venue:node/3519366793 Motel One Potsdamer Platz
openstreetmap:venue:node/3780937605 NH Berlin Potsdamer Platz
openstreetmap:venue:node/3837620167 Alper Hotel am Potsdamer Platz
openstreetmap:venue:node/3837620172 B&B Hotel Berlin-Potsdamer Platz
openstreetmap:venue:node/3854590996 S Potsdamer Platz
openstreetmap:venue:node/3854591000 U Potsdamer Platz
openstreetmap:venue:node/5333707421 WeWork Potsdamer Platz
openstreetmap:venue:node/6143046477 Potsdamer-Platz-Apotheke
openstreetmap:venue:way/22989670 Potsdamer Platz Arkaden
openstreetmap:venue:way/81456696 Scandic Berlin Potsdamer Platz
openstreetmap:venue:way/92335625 Crowne Plaza Berlin Potsdamer Platz
openstreetmap:venue:relation/3200536 Potsdamer Platz

There's still 4x "U Potsdamer Platz" and 7x with some form of "S+U Potsdamer Platz" 😆

Some entries like "Varian-Fry-Straße/Potsdamer Platz" have been completely removed, which is to be expected since this PR removes all bus/tram stops too.

Nominatim does actually include one version of that bus stop and I guess it's a product decision about whether to include those or not and how much value they provide, it's most likely being included from being tagged highway:bus_stop.

missinglink commented 3 years ago

Public transit stops are actually quite tricky, this is what Nominatim returns for "S+U Potsdamer Platz"

Screenshot 2021-07-23 at 16 43 25

And what Pelias autocomplete returns (we do a good job of deduping down from 17 hits):

Screenshot 2021-07-23 at 16 45 31

The /v1/search endpoint returns garbage because libpostal doesn't understand the input 🤷