pelias / openstreetmap

Import pipeline for OSM into Pelias
MIT License

Rethink features whitelist #19

Open missinglink opened 10 years ago

missinglink commented 10 years ago

We currently have a features whitelist which can sometimes be a little too inclusive.

e.g. Taxiway 'U' at JFK; I doubt anyone will ever want to search for this: http://www.openstreetmap.org/way/5784731#map=17/40.65536/-73.78819 [edit] It's probably best to disable some of the aeroway tags, but not all of them: https://gist.github.com/missinglink/c74dc7a2ba34bfc4bdcb

This ticket is to revisit the whitelist from the 'old Pelias' and give it a second look and a refresh.
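To make the aeroway idea concrete, here is a minimal Python sketch of a whitelist that accepts every value for some keys but only selected values for others. The `WHITELIST` contents and the `is_whitelisted` helper are hypothetical illustrations, not the actual Pelias feature list or its matching code.

```python
# Hypothetical whitelist: OSM key -> allowed values ("*" means any value).
WHITELIST = {
    "amenity": {"*"},
    "shop": {"*"},
    # Keep only aeroway values people are likely to search for
    # (an illustrative choice, not the real Pelias list).
    "aeroway": {"aerodrome", "terminal", "helipad"},
}

def is_whitelisted(tags: dict[str, str]) -> bool:
    """Return True if the record has a name and at least one whitelisted tag."""
    if not tags.get("name"):
        return False
    for key, value in tags.items():
        allowed = WHITELIST.get(key)
        if allowed and ("*" in allowed or value in allowed):
            return True
    return False
```

With this shape, a record tagged `aeroway=taxiway` named "U" is rejected, while `aeroway=terminal` with a name is kept, so only the noisy aeroway values are disabled.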

missinglink commented 9 years ago

http://www.openstreetmap.org/way/10525375 [edit] Won't fix: we won't add highway tags, for the same reason as https://github.com/pelias/openstreetmap/issues/28, plus this is not a POI.

missinglink commented 9 years ago

Consider blacklisting public_transport:stop_area: http://www.openstreetmap.org/node/2478086894 [edit] It seems best to remove public_transport completely because of duplicates; see https://gist.github.com/missinglink/b7757d98d034f1441b74

missinglink commented 9 years ago

http://www.openstreetmap.org/node/2907810083 [edit] Same as above: there are a lot of duplicates for railway:* https://gist.github.com/missinglink/e7f1de91670e7173792c

missinglink commented 9 years ago

http://www.openstreetmap.org/node/2478086894 [edit] same as public_transport:stop_area above

razafinr commented 9 years ago

Hi.

Many streets in suburban areas in OSM are generally tagged with highway+name. I understand why you are blacklisting them, but I think this is really restrictive, and a lot of existing streets in OSM are not imported because of it. Generally speaking, it seems the blacklisting is there only to avoid duplicates. Isn't there another way to avoid duplicates while still importing the data with the blacklisted tags?

missinglink commented 9 years ago

Hi @razafinr, crazy timing! I was looking into this issue earlier today. I've been cutting some data from a New York City extract to see what we might be missing and what we might remove.

That repo is https://github.com/pelias/osm-featurelist-evaluation and it links to some cuts I made of the data today.

This extract is just the highway+name records for NYC: https://raw.githubusercontent.com/pelias/osm-featurelist-evaluation/master/cuts/highway.text

That file shows which new names would become searchable in Pelias if we added highway+name to the feature list. I can see a bunch of undesirable content in there, such as:

node    2708075964  Lamp Post
node    2708075965  Lamp Post
node    2708075966  Lamp Post
node    2708075967  Lamp Post
node    2708075970  Lamp Post
node    2708075974  Lamp Post
node    2708075976  Lamp Post
node    2708075977  Lamp Post
node    2708075979  Lamp Post
node    2708075981  Lamp Post
node    2708075982  Lamp Post
node    2708075983  Lamp Post
node    2708075984  Lamp Post
node    2708075988  Lamp Post

...but there is also a load of good stuff in there. What are your thoughts after seeing this?
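One rough way to spot this kind of junk in a cut is to count how often each name repeats. The sketch below assumes the whitespace-separated `type id name` layout shown above; `repeated_names` and its `min_count` cutoff are hypothetical, and a frequent name is only a hint to review, not proof that it is junk.

```python
from collections import Counter

def repeated_names(lines, min_count=5):
    """Return (name, count) pairs that occur at least min_count times.

    Assumes each line looks like "node    2708075964  Lamp Post":
    record type, OSM id, then the name (which may contain spaces).
    """
    counts = Counter()
    for line in lines:
        parts = line.split(None, 2)  # split on whitespace runs, at most twice
        if len(parts) == 3:
            counts[parts[2].strip()] += 1
    return [(name, n) for name, n in counts.most_common() if n >= min_count]
```

Running this over the whole cut would surface names like "Lamp Post" at the top, leaving the long tail of genuine street and place names for manual review.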

razafinr commented 9 years ago

@missinglink Thanks for the repository. I ran it locally, and there are definitely many more places that could become searchable if we imported them (as I explained above, a lot of streets in areas with poor coverage are tagged specifically with highway+name). I think these data should be imported without duplicates, although the duplicates could be used to get the best approximation of the latitude and longitude of the imported data. I know it could be really hard to handle ways in a PBF file, but being able to import this data would be a great improvement to the application.

missinglink commented 9 years ago

@razafinr I've made some changes to pbf2json which allow us to be more specific when targeting tags in the OSM extracts; e.g. you can now say amenity~toilets,amenity~kindergarten (or whatever) instead of extracting all records with an amenity:* tag.
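As an illustration of how selectors like `amenity~toilets,amenity~kindergarten` can be evaluated, here is a small Python sketch. It is not pbf2json's implementation: it assumes commas separate alternatives, `~` pins a key to one value, and (following the highway+name shorthand used in this thread) `+` joins conditions that must all hold. `parse_selector` and `matches` are hypothetical names.

```python
from typing import Optional

# (key, value) where value None means "any value for this key"
Condition = tuple[str, Optional[str]]

def parse_selector(selector: str) -> list[list[Condition]]:
    """Parse e.g. "amenity~toilets,highway+name" into OR-groups of AND-conditions."""
    groups = []
    for group in selector.split(","):
        conditions = []
        for term in group.split("+"):
            key, _, value = term.partition("~")
            conditions.append((key.strip(), value.strip() or None))
        groups.append(conditions)
    return groups

def matches(tags: dict[str, str], groups: list[list[Condition]]) -> bool:
    """True if at least one group has all of its conditions satisfied by tags."""
    return any(
        all(key in tags and (value is None or tags[key] == value) for key, value in group)
        for group in groups
    )
```

For example, `matches({"amenity": "toilets"}, parse_selector("amenity~toilets,amenity~kindergarten"))` is `True`, while a record tagged `amenity=bench` is rejected instead of being swept in by a blanket `amenity:*` rule.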

I think this will significantly reduce some of the rubbish that is getting ingested (e.g. 'Lamp Post'), but it doesn't deal with the issue of duplication.

You mentioned ways: these are currently being imported. The issue I'm having with duplication is that each road segment is modelled as a separate way, and the segments may be linked by a relation (which we currently don't import). An example is:

In this case both road segments share the same name and represent the same "thing", so in an ideal world we could mark them as duplicates. The issue is that in some cases the tags are better on one record than on the other, so they need to be merged; and in order to find a centroid we would need to do some serious math, especially for polygonal geometries like triangles and circles, where the centroid does not lie on the border itself.
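For the centroid part, the standard area-weighted (shoelace) formula for a simple polygon looks like the sketch below. It treats coordinates as planar, which is only an approximation for longitude/latitude, and it is not necessarily what Pelias uses; `polygon_centroid` is a hypothetical helper.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...].

    Uses the shoelace formula and treats coordinates as planar, which is
    only an approximation for lon/lat over small areas.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon (zero area)")
    return cx / (3 * area2), cy / (3 * area2)
```

For the triangle (0, 0), (3, 0), (0, 3) this gives (1, 1), a point strictly inside the shape rather than on its border, which is exactly why averaging the boundary vertices is not enough.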

I would definitely like to find a de-duplication strategy for highway segments that works. Generally, a 'good' yet naive de-duplicator checks the similarity of a name against other entities which are geographically 'near' and then scores their 'closeness' with a scoring algorithm. This strategy is always going to be flawed, so I think using the relation definitions where available will produce far more accurate results. However, I suspect that only a small number of way records with duplicate names are linked by a relation record, so, yeah, it's tricky. I will discuss this with our routing team and see if they have some input on the matter.
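A minimal sketch of the naive approach described above: name similarity (here Python's `difflib.SequenceMatcher`) multiplied by a linear proximity factor derived from the haversine distance. The scoring formula, the 500 m radius and the 0.8 threshold are arbitrary assumptions for illustration, and the pairwise loop is O(n^2), so a real importer would need a spatial index.

```python
import math
from difflib import SequenceMatcher

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (spherical Earth, R = 6371 km)."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def duplicate_score(a, b, max_distance_m=500.0):
    """Score in [0, 1]: name similarity scaled down linearly with distance."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    dist = haversine_m(a["lat"], a["lon"], b["lat"], b["lon"])
    proximity = max(0.0, 1.0 - dist / max_distance_m)
    return name_sim * proximity

def likely_duplicates(records, threshold=0.8, max_distance_m=500.0):
    """Return index pairs (i, j) whose score meets the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if duplicate_score(records[i], records[j], max_distance_m) >= threshold:
                pairs.append((i, j))
    return pairs
```

The flaw mentioned above is easy to see here: two genuinely different streets with the same name a few hundred metres apart score highly, while one long road split into distant segments scores zero.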

Here is an example of a relation which contains all the ways for the M1 in the UK: https://www.openstreetmap.org/relation/2332838
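Where a relation does exist, grouping its member ways is fairly straightforward. The sketch below assumes a simplified in-memory shape for ways and relations (not the importer's actual data model), and `group_ways_by_relation` is a hypothetical helper. Merged tags take the first way's value for each key and then let the relation's own tags override.

```python
def group_ways_by_relation(ways, relations):
    """Group way ids that belong to the same relation.

    ways: {way_id: tags}
    relations: {relation_id: {"tags": {...}, "members": [("way", way_id), ...]}}
    Returns a list of (member_way_ids, merged_tags). Ways not referenced by
    any relation are returned as single-member groups.
    """
    grouped = []
    seen = set()
    for rel in relations.values():
        way_ids = [ref for kind, ref in rel["members"] if kind == "way" and ref in ways]
        if not way_ids:
            continue
        merged = {}
        for way_id in way_ids:
            for key, value in ways[way_id].items():
                merged.setdefault(key, value)  # first way to define a key wins
        merged.update(rel["tags"])  # relation-level tags take precedence
        grouped.append((way_ids, merged))
        seen.update(way_ids)
    for way_id, tags in ways.items():
        if way_id not in seen:
            grouped.append(([way_id], dict(tags)))
    return grouped
```

Note that a single way can belong to several relations (for example a road route and a bus route), in which case this sketch would place it in more than one group.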

missinglink commented 9 years ago

@razafinr It's also worth noting that street addresses are currently being extracted from the addr:street and addr:housenumber tags, so if the query logic is correct we should be able to return all of "1 foo street", "200 foo street", etc. for the input text "foo street".

e.g. the POI "Happy Valley Cycles" [1] has valid address info, so a second document [2] is created with the name "8 Church Street"; a query such as [3] should return all POI records on that street.

[1] http://pelias.mapzen.com/doc?id=osmnode:1030336099
[2] http://pelias.mapzen.com/doc?id=osmaddress:poi-address-osmnode-1030336099
[3] http://pelias.mapzen.com/search?input=church%20street&lat=-40.95&lon=175.66
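The address document in [2] can be sketched as follows. `address_document` is a hypothetical helper, the id pattern is copied from the example link in [2] rather than from the importer's code, and splitting "8 Church Street" into `addr:housenumber=8` and `addr:street=Church Street` is an assumption about that POI's tags.

```python
def address_document(osm_type, osm_id, tags):
    """Build a hypothetical address record from a POI's addr:* tags, or None."""
    street = tags.get("addr:street")
    number = tags.get("addr:housenumber")
    if not street or not number:
        return None  # no complete address, so no second document
    return {
        # id pattern mirrors the example link [2]; illustrative only
        "id": f"osmaddress:poi-address-osm{osm_type}-{osm_id}",
        "name": f"{number} {street}",
        "street": street,
        "housenumber": number,
    }
```

Because every POI on the street yields a document named "<number> Church Street", a street-level query can match all of them even though the street itself was never imported as a feature.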

orangejulius commented 9 years ago

One particular tag we might want to import is highway:residential. There appear to be about 31 million such roads, so we can't take adding it lightly, but as @razafinr said, there is a ton of good data.

Here's one example of a road that we can't find using Pelias, but that does work in Nominatim:
Pelias: https://mapzen.com/pelias#loc=10,48.4861,16.7569&q=am%20wirtshausberg&t=fine&gb=bbox
Nominatim: http://nominatim.openstreetmap.org/search.php?q=am+wirtshausberg&viewbox=16.61%2C48.52%2C16.63%2C48.5

CatInCosmicSpace commented 5 years ago

Hello.

I think it would be great if the tags could be set in the importer configuration, falling back to the default tags when none are set.

For example,

{
  "imports": {
    "openstreetmap": {
      "datapath": "/data/pelias/openstreetmap",
      "leveldbpath": "/tmp",
      "import": [{
        "filename": "somefile.osm.pbf"
      }],
      "tags": [
        "place:city",
        "place:town"
      ]
    }
  }
}
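A minimal sketch of the proposed fallback behaviour, assuming the `tags` key from the example above (it is a proposal in this thread, not an existing pelias/config option) and a hypothetical `DEFAULT_TAGS` list standing in for the built-in feature list.

```python
import json

# Hypothetical stand-in for the importer's built-in feature list.
DEFAULT_TAGS = ["amenity", "shop", "place:city", "place:town"]

def configured_tags(config: dict) -> list[str]:
    """Return imports.openstreetmap.tags if set and non-empty, else the defaults."""
    osm = config.get("imports", {}).get("openstreetmap", {})
    tags = osm.get("tags")
    return list(tags) if tags else list(DEFAULT_TAGS)

def load_tags(path: str) -> list[str]:
    """Read a JSON config file and resolve the tag list."""
    with open(path, encoding="utf-8") as f:
        return configured_tags(json.load(f))
```

Treating an empty list the same as a missing key is a design choice: it avoids silently importing nothing when someone leaves `"tags": []` in their config.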

Also, not all relations, such as cities, are imported by default. For example, Moscow (http://openstreetmap.org/relation/2555133) has the `place=city` tag, which is not included in the current version of the importer.

Of course, users can set them on their own by downloading this repo, but I believe this would be a good feature.

missinglink commented 5 years ago

Hi @CatInCosmicSpace, that sounds like a nice addition. The core team is working on other features at the moment, but I'd be happy to review and merge a PR that adds this functionality to pelias/config and pelias/openstreetmap.

For the most part, it could be implemented here https://github.com/pelias/openstreetmap/blob/master/stream/pbf.js

Regarding Moscow: I added support for relations recently, so please check that you're using current versions of all our tools/Docker images. If you're using a current version and other relations are being imported, then the missing relation is likely due to the clipping model used to create the PBF extracts.

PBF extracts can be generated using a variety of different tools, so they can differ slightly even when cropped to the same bounding box. If the Moscow polygon is missing any of its vertices or rings in the extract, the polygon will be deemed invalid. No attempt is made to recover the broken geometry, so it will not appear in the output; you should see a relevant warning in the log.

An easy solution to this problem is to use a larger extract which you are sure will include all the nodes in the relation members.
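To illustrate why a clipped relation gets dropped, here is a hypothetical sketch that stitches member ways into closed rings and gives up (returns `None`) when a referenced node is missing from the extract or a ring cannot be closed. It mirrors the no-recovery behaviour described above, but `assemble_rings` and its inputs are assumptions, not the importer's actual code.

```python
def assemble_rings(member_ways, nodes):
    """Join way node-lists into closed rings; return None if geometry is broken.

    member_ways: list of node-id lists (one per member way).
    nodes: {node_id: (lon, lat)} for the nodes present in the extract.
    """
    # Any referenced node missing from the extract means the polygon was clipped.
    if any(n not in nodes for way in member_ways for n in way):
        return None
    remaining = [list(w) for w in member_ways if len(w) >= 2]
    rings = []
    while remaining:
        ring = remaining.pop(0)
        while ring[0] != ring[-1]:
            for i, way in enumerate(remaining):
                if way[0] == ring[-1]:
                    ring.extend(way[1:])  # way continues forwards
                elif way[-1] == ring[-1]:
                    ring.extend(reversed(way[:-1]))  # way is stored backwards
                else:
                    continue
                remaining.pop(i)
                break
            else:
                return None  # open chain: no way continues this ring
        rings.append([nodes[n] for n in ring])
    return rings
```

With a larger extract every referenced node is present, so the same input that returned `None` for a clipped extract assembles into closed rings.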

missinglink commented 5 years ago

Another solution is to simply modify https://github.com/pelias/openstreetmap/blob/master/config/features.js to suit your needs.

If you are using Docker, you can achieve this without modifying the Docker image by using a feature called 'bind mounts', which is available in both Docker and docker-compose.

missinglink commented 5 years ago

Here's an example of how to use bind-mounts in docker-compose to override the file stored inside the container with one from your local machine: https://github.com/pelias/docker/blob/master/projects/portland-metro/docker-compose.yml#L17

CatInCosmicSpace commented 5 years ago

@missinglink

Thank you! Maybe I will make a pull request later. :)