ltog / osmi-addresses

Calculates the Address view of the OSM Inspector
Boost Software License 1.0

misformatted_housenumber_lenient – better regex #112

Closed Nakaner closed 6 years ago

Nakaner commented 6 years ago

This implements the regular expressions suggested by @grischard and @ltog in #107 but with modifications to support house numbers used in Württemberg (and partially in Baden): 5/1 instead of 5a.

Accidentally, the regular expression also supports the house numbers used in Augsburg (for example 5 1/2), even when they are concatenated with commas.

This pull request closes #107.

ltog commented 6 years ago

@Nakaner: Thank you for your work and sorry for the late response.

Some points:

  1. Do you plan on having one or two layers showing misformatted housenumbers?
  2. The regex marks multiple housenumbers written as 5, 6 as an error. I guess software evaluating housenumbers will need to accept spaces anyway, so maybe it would be better to highlight only housenumber entries with severe errors?
  3. The regex will mark housenumbers >999 as error. I believe for example in the USA there are a lot of housenumbers with 4 or even 5 digits. Example: https://tools.geofabrik.de/osmi/?view=addresses&lon=-95.23924&lat=38.94582&zoom=18&overlays=buildings,buildings_with_addresses,postal_code,entrances_deprecated,entrances,misformatted_housenumber_lenient,nodes_with_addresses_defined,nodes_with_addresses_interpolated,interpolation,interpolation_errors,connection_lines,nearest_points,nearest_roads,nearest_areas,addrx_on_nonclosed_way Maybe we should increase this limit?
  4. I guess 5,,,,,,6 shouldn't be accepted as a valid housenumber. So maybe the * in the regex should be removed or replaced with ?. (Side note: I was the one who proposed such a wrong(?) version in https://github.com/ltog/osmi-addresses/issues/107#issuecomment-286237958 )
  5. Maybe we should setup a test suite with housenumbers from different countries to check a new regex against it? See also https://github.com/ltog/osmi-addresses/issues/93
  6. Which version of the code is currently live on geofabrik.de? I used to indicate that with the branch pointer currently_running_on_geofabrik_server. Maybe you would like to use it too?
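
The test suite suggested in point 5 could look roughly like the following minimal sketch in Python. The pattern `CANDIDATE`, the helper names and the expected values in `CASES` are assumptions made for illustration; this is not the regex from this pull request, and the list of housenumbers would need real examples from more countries.

```python
import re

# Hypothetical candidate pattern, NOT the regex from this pull request.
# One housenumber: 1 to 5 digits without a leading zero, optionally followed
# by a single letter (5a), a slash suffix (5/1) or the fraction " 1/2" (5 1/2).
NUMBER = r"[1-9][0-9]{0,4}(?:[a-z]|/[1-9][0-9]*| 1/2)?"
# Several housenumbers joined by exactly one comma and an optional space.
CANDIDATE = re.compile(rf"{NUMBER}(?:, ?{NUMBER})*")

# Housenumber -> whether we want it to be accepted (assumed expectations).
CASES = {
    "5": True,
    "5a": True,
    "5/1": True,       # Württemberg style
    "5 1/2": True,     # Augsburg style
    "5, 6": True,
    "5,6": True,
    "1234": True,      # 4 digits, common in the USA
    "12345": True,
    "5,,,,,,6": False,
    "": False,
    "0": False,
    "123456": False,
}


def is_well_formed(housenumber: str) -> bool:
    """Return True if the whole string matches the candidate pattern."""
    return CANDIDATE.fullmatch(housenumber) is not None


def failing_cases() -> list:
    """Return all housenumbers whose result differs from the expectation."""
    return [hn for hn, expected in CASES.items()
            if is_well_formed(hn) != expected]


if __name__ == "__main__":
    print(failing_cases())
```

A new regex would be dropped into `CANDIDATE` and is acceptable for this list when `failing_cases()` returns an empty list.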

Nakaner commented 6 years ago

@ltog wrote:

  1. Do you plan on having one or two layers showing misformatted housenumbers?

I don't plan to do it. The layer will be available (see the GetCapabilities of the WMS) but it will not be advertised in the layer tree on the left sidebar of OSMI. If I enable both layers and request an image at zoom level 11 of the area around Frankfurt am Main, it takes almost as long as requesting only the layer with the strict validation, but it renders a lot of false positives. If the old layer uses a regex similar to the new one (but without accepting concatenated house numbers like "3, 4, 5, 6"), it is as slow as the new layer.

  2. The regex marks multiple housenumbers written as 5, 6 as an error. I guess software evaluating housenumbers will need to accept spaces anyway, so maybe it would be better to highlight only housenumber entries with severe errors?

I improved my regular expressions.

  3. The regex will mark housenumbers >999 as error. I believe for example in the USA there are a lot of housenumbers with 4 or even 5 digits.

I increased the limit to accept everything between 1 and 99999.

  4. I guess 5,,,,,,6 shouldn't be accepted as a valid housenumber. So maybe the * in the regex should be removed or replaced with ?.

fixed
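
For illustration, the effect of the `*` quantifier on the separator can be sketched in Python as follows. Both patterns are hypothetical simplifications (digits and commas only) and the helper `accepts` is an assumed name; neither pattern is the expression used in this pull request.

```python
import re

# Hypothetical, simplified patterns for illustration only.
# With `*` on the comma, any number of separators (even none) is allowed.
REPEATED_SEPARATOR = re.compile(r"[0-9]+(?:,*[0-9]+)*")
# Without the quantifier, exactly one comma is required between numbers.
SINGLE_SEPARATOR = re.compile(r"[0-9]+(?:,[0-9]+)*")


def accepts(pattern: re.Pattern, housenumber: str) -> bool:
    """Return True if the pattern matches the whole housenumber string."""
    return pattern.fullmatch(housenumber) is not None


if __name__ == "__main__":
    for value in ("5,6", "5,,,,,,6"):
        print(value,
              accepts(REPEATED_SEPARATOR, value),
              accepts(SINGLE_SEPARATOR, value))
```

Only the variant with a single separator rejects 5,,,,,,6 while still accepting 5,6.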

  6. Which version of the code is currently live on geofabrik.de? I used to indicate that with the branch pointer currently_running_on_geofabrik_server. Maybe you would like to use it too?

I updated this branch. This pull request is currently live on our server.