pelias / openstreetmap

Import pipeline for OSM in to Pelias
MIT License
112 stars 72 forks source link

Added missing street mapping #565

Closed Bimbol closed 2 years ago

Bimbol commented 2 years ago

Because of missing tag addr:place in schema, importer skip a lot of addresses. For example in Mannheim, DE most of the streets named with letter is not imported. image

So to fix this, I added missing tag.

missinglink commented 2 years ago

Hi @Bimbol, we've had a few reports about these unusual addressing schemas over the years, for instance in Polish towns you'll find the use of a housenumber but combined with the locality name instead of the street, in this case the grid reference is used in place of the street name.

In both cases it seems the preferred way of mapping it in OSM is using the addr:place tag, which is nice, but this tag is also commonly used for addresses mapped with the more conventional syntax too.

The proposal in your PR would set an addr:street when available, but then replace it with addr:place if also available, this is not correct as addr:street is preferred when available.

Other than that, it should, to some degree 'just work' with the place name being used in the street field, of course the parsers are going to get confused but hopefully it'll be handled fairly gracefully.

tl;dr, these are 'special cases', so we need to take care to 'special case' them correctly.

missinglink commented 2 years ago

Also worth noting that the Karlsruhe Schema has defined meaning, I don't believe these addressing schemas are compatible, so.. while convenient, it might make more sense to have this in a new code file with some comments about what it is and maybe a link or two for future reference.

missinglink commented 2 years ago

for reference here is an address from Mannheim with this addressing schema:

here's an example from the same area where using the addr:place when an addr:street is available is incorrect:

discussion in German:

Bimbol commented 2 years ago

Thank you for explanation. I was wondering why this file has city name in it :)

So changing the order of schemas in merge could resolve issue with priority I guess?

missinglink commented 2 years ago

oh also worth mentioning that the config here: https://github.com/pelias/openstreetmap/blob/master/config/features.js#L9 will need to change, these settings control which tags are extracted by pbf2json, you'll need to add another line below the one I linked with says 'addr:housenumber+addr:place'.

orangejulius commented 2 years ago

Hey @Bimbol thanks for bringing this up, we should definitely support addr:place where possible.

I can confirm that: addr:street and addr:place are not supposed to both be set, according to the OSM wiki page for addr:place. There's also some discussion about how many uses of addr:place, especially in the UK, don't follow that rule

But like @missinglink mentioned in a previous discussion, we may not want to edit this particular file to do it: https://github.com/pelias/pelias/issues/787#issuecomment-477137803. Is that still good advice @missinglink? I don't want to create too much work but I can see the need for an addr:place specific schema.

missinglink commented 2 years ago

So changing the order of schemas in merge could resolve issue with priority I guess?

Yeah, there's a bunch of ways you could code it up, but maybe just splitting it out to its own file and then finding a way to get it to only apply when addr:street isn't present but addr:housenumber & addr:place are set would work.

Unit tests please!

Bimbol commented 2 years ago

Can I modify tags inside pipeline, or should the original ones be kept? Because one of solution is remove "invalid" tag addr:place when addr:street is present.

missinglink commented 2 years ago

I guess you can edit the tags, although it might make debugging more difficult.

So for instance you could create a new stream and add it in before address_extractor.js, call it addresses_without_street.js.

In that stream simply check the tags for the existence of addr:housenumber AND addr:place AND NOT addr:street, in this situation just copy the addr:place to addr:street.

It would be easy to write tests for since it would be totally contained and the rest of the code would 'just work' without modification or risk of introducing failing cases which would make it hard to approve for merging.

This comment is still required: https://github.com/pelias/openstreetmap/pull/565#issuecomment-1062872146

reference: https://help.openstreetmap.org/questions/44160/addresses-with-no-street-names

Bimbol commented 2 years ago

I guess you can edit the tags, although it might make debugging more difficult.

So for instance you could create a new stream and add it in before address_extractor.js, call it addresses_without_street.js.

In that stream simply check the tags for the existence of addr:housenumber AND addr:place AND NOT addr:street, in this situation just copy the addr:place to addr:street.

It would be easy to write tests for since it would be totally contained and the rest of the code would 'just work' without modification or risk of introducing failing cases which would make it hard to approve for merging.

This comment is still required: #565 (comment)

reference: https://help.openstreetmap.org/questions/44160/addresses-with-no-street-names

I applied some changes according to your suggestions. I am waiting for some feedback :) I was also thinking if this could be configured somehow, however simple configuration may not cover all potencial use cases.

Thank you for your time!

missinglink commented 2 years ago

Feedback resolved via new commit, the code in this PR looks good-to-merge.

I'd like to just confirm that it solves the reported issue, from what I can see it's unlikely to introduce error to existing results, which is great, but I wanted to make sure it actually fixes the issue in Mannheim (or at least gets us closer to a fix).

Bimbol commented 2 years ago

Sorry, that I didn't reply after committing changes. I was still running some tests. Everything seems to work fine. I did checked quite a few streets, that were missing :) Zrzut ekranu z 2022-03-14 10-14-00

Bimbol commented 2 years ago

Feedback resolved via new commit, the code in this PR looks good-to-merge.

I'd like to just confirm that it solves the reported issue, from what I can see it's unlikely to introduce error to existing results, which is great, but I wanted to make sure it actually fixes the issue in Mannheim (or at least gets us closer to a fix).

Yes, it resolves the issue with missing addresses in Mannheim.

Bimbol commented 2 years ago

@missinglink Is still any problem with this PR? Because is open for a while now.

missinglink commented 2 years ago

There are a lot of organisations running Pelias in production so we always test global changes such as this on a wider scale before merging.

This is mostly to ensure no regressions are introduced, but also to ensure that the proposed solution adequately resolves the reported problem, there are often unforeseen side effects exposed which aren't always apparent during dev work, such is the nature of a search engine, you can't rely only on unit tests.

I've made a copy of your master branch in this org so you're free to use it for something else, if you'd like to potentially add to this feature then I'd recommend you cut a dev branch on your fork and open the PR from there.

Since it's now in our org you'll find Docker images are automatically generated, one for the branch and one for the commit:

I'll kick off a global build using the commit above and run the acceptance-tests to test for any unforseen issues.

Bimbol commented 2 years ago

There are a lot of organisations running Pelias in production so we always test global changes such as this on a wider scale before merging.

This is mostly to ensure no regressions are introduced, but also to ensure that the proposed solution adequately resolves the reported problem, there are often unforeseen side effects exposed which aren't always apparent during dev work, such is the nature of a search engine, you can't rely only on unit tests.

I've made a copy of your master branch in this org so you're free to use it for something else, if you'd like to potentially add to this feature then I'd recommend you cut a dev branch on your fork and open the PR from there.

Since it's now in our org you'll find Docker images are automatically generated, one for the branch and one for the commit:

I'll kick off a global build using the commit above and run the acceptance-tests to test for any unforseen issues.

Those images are exactly what I need right now. So thank you for this great project :)

missinglink commented 2 years ago

This feature is up on the Geocode Earth dev server right now (although be aware that server changes branches we're testing often).

Seems to work great for addresses in Mannheim:

https://pelias.github.io/compare/#/v1/autocomplete?text=8+Q1+mannheim

Screenshot 2022-03-17 at 13 28 51
Doesn't work for streets in Mannheim:

https://pelias.github.io/compare/#/v1/autocomplete?text=C4%2C+mannheim

This would require a corresponding patch to the commands we use to generate polylines extracts, so both my missinglink/pbf streets command and the valhalla routing engine we advocate using for production builds.

Unfortunately doesn't solve https://github.com/pelias/pelias/issues/642

https://pelias.github.io/compare/#/v1/autocomplete?text=Doln%C3%AD+%C4%8Cermn%C3%A1+314%2C+CZ

This is due to the parser identifying that the locality name is in fact a locality and so not querying the street field, we might be able to do something about that but it would need some consideration.

Bimbol commented 2 years ago
Doesn't work for streets in Mannheim:

https://pelias.github.io/compare/#/v1/autocomplete?text=C4%2C+mannheim

This would require a corresponding patch to the commands we use to generate polylines extracts, so both my missinglink/pbf streets command and the valhalla routing engine we advocate using for production builds.

I did imported whole Germany today and in my case search works great, but as in example autocomplete not. Here are some sample data from my database, for input C4, Mannheim. I don't know why but with the same input for your dev server I get only cities in result.

"places": [
    {
      "gid": "pelias:address:way/197559108",
      "label": "C4 8, Mannheim, BW, Niemcy",
      ...
    },
    {
      "gid": "pelias:address:way/130369581",
      "label": "C4 9b, Mannheim, BW, Niemcy",
      ...
    },
    {
      "gid": "pelias:address:way/181346505",
      "label": "C4 1, Mannheim, BW, Niemcy",
      ...
    },
    ...
]

About the issue you mentioned, I'm not quite sure if I understand whats the problem here. Could you please give me some more details?

missinglink commented 2 years ago

Okay this all looks good to me. Globally, it pulls in an additional ~9MM addresses, which is more than I expected.

There was an issue with the deployment yesterday where the dev machine ran out of disk, so it wasn't quite fully loaded, we resolved that issue and it seems the behaviour of /v1/search now matches what you were seeing on your local installation:

https://pelias.github.io/compare/#/v1/search?sources=osm&layers=address&text=C4%2C+mannheim

Screenshot 2022-03-18 at 15 20 53

There is still additional work required to adapt the queries for this new format, but for now this is good to go 🎉

missinglink commented 2 years ago

thanks @Bimbol, @pbieniek the robots will cut new docker images and tag them latest and master now, they'll be available in a couple minutes.