pelias / pbf2json

An OpenStreetMap pbf parser which exports json, allows you to cherry-pick tags and handles denormalizing ways and relations. Available as a standalone binary and comes with a convenient npm wrapper.
https://pelias.io
MIT License

support relations, reduce disk usage #70

Closed missinglink closed 5 years ago

missinglink commented 6 years ago

A long-anticipated feature, adding support for OSM relations 🎉

In order to support relations I've had to make some changes to how the parser works:

I'm using a bitmask to record which elements we consider 'interesting', which greatly reduces the amount of data stored in leveldb. Previously, storing all nodes for the planet used ~100GB; I suspect this is now much lower, even though we now also store some ways in order to denormalize the relations.

I suspect that the running time will also be greatly reduced, as far fewer bytes are being written to leveldb. The second pass on a full planet file will add ~20 minutes, but it'll likely still be much faster overall.
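
To illustrate the bitmask idea described above, here is a minimal sketch in Go. It is not the code from this PR: the `Bitmask` type and its `Insert` and `Has` methods are hypothetical names chosen for the example. The idea is that a first pass marks the IDs of interesting elements with one bit each, and a later pass only stores an element if its bit is set.

```go
package main

import "fmt"

// Bitmask is a hypothetical sparse set of element IDs, using one bit per ID.
// IDs are grouped into 64-bit words keyed by id/64, so memory is only
// allocated for ID ranges that actually contain a marked element.
type Bitmask struct {
	words map[uint64]uint64
}

// NewBitmask returns an empty mask.
func NewBitmask() *Bitmask {
	return &Bitmask{words: make(map[uint64]uint64)}
}

// Insert marks an element ID as interesting.
func (b *Bitmask) Insert(id int64) {
	u := uint64(id)
	b.words[u/64] |= uint64(1) << (u % 64)
}

// Has reports whether an element ID was previously marked.
func (b *Bitmask) Has(id int64) bool {
	u := uint64(id)
	return b.words[u/64]&(uint64(1)<<(u%64)) != 0
}

func main() {
	mask := NewBitmask()

	// First pass (assumed): mark the elements we care about.
	mask.Insert(104541)

	// Second pass (assumed): only store elements whose bit is set.
	for _, id := range []int64{104541, 104542} {
		if mask.Has(id) {
			fmt.Println("store", id)
		} else {
			fmt.Println("skip", id)
		}
	}
}
```

A map of words is used here only to keep the sketch short; a real implementation could just as well use a flat slice or a compressed bitmap.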

I'm not 100% happy with the bbox and centroid calcs, so I might push one more commit to fix this. For now it's just using the bbox and centroid of the first way listed on a relation; it should really consider all ways.

Centroid and Bounds values are based off the largest member way by area (wayBounds.GeoWidth() * wayBounds.GeoHeight()).
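
The selection rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `Bounds`, `Way` and `LargestWay` are hypothetical names, and `Area` uses plain degree differences as a crude stand-in for the `GeoWidth() * GeoHeight()` product mentioned above.

```go
package main

import "fmt"

// Bounds is a hypothetical axis-aligned bounding box in degrees.
type Bounds struct{ N, S, E, W float64 }

// Area approximates the box size in square degrees. The PR multiplies
// GeoWidth() by GeoHeight(); degree differences are used here only to
// keep the example self-contained.
func (b Bounds) Area() float64 {
	return (b.E - b.W) * (b.N - b.S)
}

// Way is a hypothetical denormalized member way of a relation.
type Way struct {
	ID     int64
	Bounds Bounds
}

// LargestWay returns the member way with the largest bounding-box area.
// ok is false when the relation has no member ways.
func LargestWay(ways []Way) (best Way, ok bool) {
	for i, w := range ways {
		if i == 0 || w.Bounds.Area() > best.Bounds.Area() {
			best, ok = w, true
		}
	}
	return best, ok
}

func main() {
	ways := []Way{
		{ID: 1, Bounds: Bounds{N: 45.52, S: 45.51, E: -122.67, W: -122.68}},
		{ID: 2, Bounds: Bounds{N: 45.60, S: 45.50, E: -122.60, W: -122.70}},
	}
	if w, ok := LargestWay(ways); ok {
		// The relation would take its centroid and bounds from this way.
		fmt.Println("largest member way:", w.ID)
	}
}
```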

closes https://github.com/pelias/pbf2json/issues/47 closes https://github.com/pelias/pbf2json/pull/56 closes https://github.com/pelias/openstreetmap/issues/81

missinglink commented 6 years ago

I did some performance testing today against a Portland, OR extract:

time reported a similar build time:

# before PR
real    1m54.447s
user    3m37.553s
sys     0m26.158s

# after PR
real    1m47.406s
user    3m31.309s
sys     0m26.413s

disk usage is significantly reduced:

106M    /tmp/testing/A
57M     /tmp/testing/B

An additional 35 places were imported, all of which are relations:

{"id":104541,"type":"relation","tags":{"boat":"yes","name":"Columbia River","natural":"water","type":"multipolygon","waterway":"riverbank"},"centroid":{"lat":"45.5681191","lon":"-122.4104718"},"bounds":{"e":"-122.4096895","n":"45.5683790","s":"45.5677538","w":"-122.4110949"}}
{"id":905598,"type":"relation","tags":{"addr:city":"Lake Oswego","addr:housenumber":"4000","addr:street":"Kruse Way Place","type":"multipolygon"},"centroid":{"lat":"45.4181555","lon":"-122.7173130"},"bounds":{"e":"-122.7166134","n":"45.4184720","s":"45.4178526","w":"-122.7179679"}}
{"id":1536312,"type":"relation","tags":{"name":"Moshofsky Swamp","natural":"wetland","type":"multipolygon","wetland":"swamp"},"centroid":{"lat":"45.5269351","lon":"-122.8461349"},"bounds":{"e":"-122.8455496","n":"45.5269411","s":"45.5269290","w":"-122.8467202"}}
{"id":1666862,"type":"relation","tags":{"access":"yes","alt_name":"Portland's Living Room","area":"yes","bicycle":"no","ele":"22","foot":"yes","highway":"pedestrian","leisure":"park","motor_vehicle":"no","name":"Pioneer Courthouse Square","operator":"City of Portland","type":"multipolygon","wikidata":"Q7196678","wikipedia":"en:Pioneer Courthouse Square"},"centroid":{"lat":"45.5185971","lon":"-122.6794560"},"bounds":{"e":"-122.6790448","n":"45.5187092","s":"45.5184851","w":"-122.6798672"}}
{"id":1759187,"type":"relation","tags":{"leisure":"park","name":"Marquam Nature Park","opening_hours":"00:00-00:01,05:00-24:00","operator":"City of Portland","type":"multipolygon"},"centroid":{"lat":"45.5034202","lon":"-122.6884385"},"bounds":{"e":"-122.6882916","n":"45.5035646","s":"45.5032532","w":"-122.6885856"}}
{"id":1759347,"type":"relation","tags":{"leisure":"park","name":"Willamette Moorage Park","operator":"City of Portland","type":"multipolygon"},"centroid":{"lat":"45.4680566","lon":"-122.6695931"},"bounds":{"e":"-122.6681131","n":"45.4699736","s":"45.4658811","w":"-122.6712478"}}
{"id":1760140,"type":"relation","tags":{"addr:state":"OR","area":"yes","boundary":"protected_area","ele":"219","gnis:county_id":"051","gnis:created":"11/28/1980","gnis:feature_id":"1120882","gnis:state_id":"41","leisure":"park","name":"Forest Park","name:ja":"フォーレスト・パーク","natural":"wood","opening_hours":"Mo-Su 05:00-22:00","operator":"City of Portland","protect_class":"7","type":"multipolygon","website":"http://forestparkconservancy.org/forest-park/","wikidata":"Q3077165","wikipedia":"en:Forest Park (Portland, Oregon)"},"centroid":{"lat":"45.5994487","lon":"-122.7891501"},"bounds":{"e":"-122.7875107","n":"45.6017202","s":"45.5977431","w":"-122.7933591"}}
{"id":1761253,"type":"relation","tags":{"leisure":"park","name":"McLoughlin Promenade","operator":"City of Oregon City","type":"multipolygon"},"centroid":{"lat":"45.3584090","lon":"-122.6050877"},"bounds":{"e":"-122.6042859","n":"45.3590543","s":"45.3577358","w":"-122.6057558"}}
{"id":1809077,"type":"relation","tags":{"addr:state":"OR","alt_name":"Brookside Park","ele":"66","gnis:county_id":"051","gnis:created":"05/26/2004","gnis:feature_id":"2040254","leisure":"park","name":"Brookside Natural Area","operator":"City of Portland","type":"multipolygon"},"centroid":{"lat":"45.4750950","lon":"-122.5465011"},"bounds":{"e":"-122.5422096","n":"45.4763110","s":"45.4738800","w":"-122.5493091"}}
{"id":1825500,"type":"relation","tags":{"addr:state":"OR","alt_name":"Reed College Parkway","ele":"39","gnis:county_id":"051","gnis:created":"05/26/2004","gnis:feature_id":"2040354","leisure":"park","name":"Southeast Reed College Parkway","operator":"City of Portland","type":"multipolygon"},"centroid":{"lat":"45.4757673","lon":"-122.6293135"},"bounds":{"e":"-122.6292498","n":"45.4768751","s":"45.4749738","w":"-122.6293766"}}
{"id":1926298,"type":"relation","tags":{"addr:state":"OR","ele":"26","gnis:county_id":"051","gnis:created":"05/22/1986","gnis:feature_id":"1119655","leisure":"park","name":"Dabney State Recreation Area","name_1":"Dabney State Park","operator":"Oregon Parks and Recreation Department","type":"multipolygon","wikidata":"Q1156574"},"centroid":{"lat":"45.5160233","lon":"-122.3532827"},"bounds":{"e":"-122.3419417","n":"45.5184082","s":"45.5146118","w":"-122.3615733"}}
{"id":1932000,"type":"relation","tags":{"addr:state":"OR","ele":"91","gnis:county_id":"051","gnis:created":"05/01/1994","gnis:feature_id":"1166995","leisure":"park","name":"North Powellhurst Park","operator":"City of Portland","type":"multipolygon"},"centroid":{"lat":"45.5141306","lon":"-122.5236147"},"bounds":{"e":"-122.5222553","n":"45.5145006","s":"45.5138974","w":"-122.5247136"}}
{"id":1943370,"type":"relation","tags":{"leisure":"park","name":"LL \"Stub\" Stewart Memorial State Park","operator":"Oregon Parks and Recreation Department","type":"multipolygon"},"centroid":{"lat":"45.7348140","lon":"-123.1850056"},"bounds":{"e":"-123.1733001","n":"45.7564893","s":"45.7149250","w":"-123.1954365"}}
{"id":1957288,"type":"relation","tags":{"alt_name":"Ankeny Plaza","area":"yes","highway":"pedestrian","leisure":"park","name":"Ankeny Arcade","operator":"City of Portland","type":"multipolygon"},"centroid":{"lat":"45.5223232","lon":"-122.6709821"},"bounds":{"e":"-122.6704278","n":"45.5224857","s":"45.5221520","w":"-122.6711964"}}
{"id":2183775,"type":"relation","tags":{"destination":"Pacific Ocean","name":"Columbia River","type":"waterway","waterway":"river","wikidata":"Q2251","wikipedia":"en:Columbia River"},"centroid":{"lat":"45.5662109","lon":"-122.4126847"},"bounds":{"e":"-122.1494537","n":"45.6213714","s":"45.5444463","w":"-122.6819716"}}
{"id":2184039,"type":"relation","tags":{"destination":"Columbia River","name":"Willamette River","type":"waterway","waterway":"river","wikidata":"Q131071","wikipedia":"en:Willamette River"},"centroid":{"lat":"45.2822765","lon":"-122.8145917"},"bounds":{"e":"-122.6275775","n":"45.3476015","s":"45.2304584","w":"-123.0015990"}}
{"id":3945015,"type":"relation","tags":{"access":"private","addr:city":"Brush Prairie","addr:housenumber":"15001","addr:postcode":"98606","addr:street":"Northeast 181st Street","leisure":"golf_course","name":"The Cedars on Salmon Creek","operator":"Private Ownership","phone":"360-687-4233","type":"multipolygon","website":"http://www.golfcedars.com/"},"centroid":{"lat":"45.7559322","lon":"-122.5179217"},"bounds":{"e":"-122.5059683","n":"45.7617032","s":"45.7481642","w":"-122.5267525"}}
{"id":3983723,"type":"relation","tags":{"building":"yes","name":"Public Storage","opening_hours":"Mo-Fr 09:30-18:00; Sa-Su 09:30-17:00","phone":"+1-503-446-5048","type":"multipolygon","website":"http://www.publicstorage.com/"},"centroid":{"lat":"45.4646330","lon":"-122.7011964"},"bounds":{"e":"-122.7007425","n":"45.4649035","s":"45.4643493","w":"-122.7015394"}}
{"id":4767560,"type":"relation","tags":{"addr:city":"Portland","addr:housenumber":"9045","addr:postcode":"97219","addr:state":"OR","addr:street":"Southwest Barbur Boulevard","building":"yes","type":"multipolygon"},"centroid":{"lat":"45.4602163","lon":"-122.7089796"},"bounds":{"e":"-122.7087091","n":"45.4603762","s":"45.4600797","w":"-122.7091847"}}
{"id":5576940,"type":"relation","tags":{"leisure":"pitch","name":"Tennis Courts","sport":"tennis","type":"multipolygon"},"centroid":{"lat":"45.5241041","lon":"-122.9283480"},"bounds":{"e":"-122.9277548","n":"45.5241073","s":"45.5237622","w":"-122.9286007"}}
{"id":5580288,"type":"relation","tags":{"name":"Orenco Gardens Four Scub","natural":"scrub","type":"multipolygon"},"centroid":{"lat":"45.5268541","lon":"-122.9216699"},"bounds":{"e":"-122.9212367","n":"45.5271180","s":"45.5265902","w":"-122.9221030"}}
{"id":6264641,"type":"relation","tags":{"building":"yes","name":"Motel 6","phone":"+1-503-238-0600","tourism":"motel","type":"multipolygon","website":"http://www.motel6.com/","wikidata":"Q2188884","wikipedia":"en:Motel 6"},"centroid":{"lat":"45.4960863","lon":"-122.6335658"},"bounds":{"e":"-122.6333588","n":"45.4962502","s":"45.4959224","w":"-122.6337728"}}
{"id":6354550,"type":"relation","tags":{"building":"yes","name":"Masons Supply Company","shop":"trade","type":"multipolygon","website":"http://www.masco.net/"},"centroid":{"lat":"45.5038444","lon":"-122.6540995"},"bounds":{"e":"-122.6538611","n":"45.5042332","s":"45.5035933","w":"-122.6542232"}}
{"id":6356395,"type":"relation","tags":{"amenity":"school","building":"yes","max_age":"18","min_age":"11","name":"The Northwest Academy","opening_hours":"Mo-Fr 08:30-17:45","phone":"+1-503-223-3367","type":"multipolygon","website":"http://www.nwacademy.org/","wikidata":"Q7754390"},"centroid":{"lat":"45.5176059","lon":"-122.6853836"},"bounds":{"e":"-122.6851364","n":"45.5177799","s":"45.5174318","w":"-122.6856308"}}
{"id":7588918,"type":"relation","tags":{"addr:city":"Portland","addr:housenumber":"1421-1441","addr:postcode":"97232","addr:state":"OR","addr:street":"Northeast Broadway","building":"commercial","description":"National Register of Historic Places # 93000024","heritage":"2","heritage:operator":"nrhp","historic":"building","nrhp:criteria":"C","nrhp:inscription_date":"1993-02-11","old_name":"Olsen and Weygandt Building","ref:nrhp":"93000024","source":"National Register nomination file http://heritagedata.prd.state.or.us/historic/index.cfm?do=main.loadFile&load=NR_Noms/93000024.pdf","type":"multipolygon"},"centroid":{"lat":"45.5352255","lon":"-122.6506842"},"bounds":{"e":"-122.6506108","n":"45.5353244","s":"45.5351515","w":"-122.6507745"}}
{"id":7588952,"type":"relation","tags":{"addr:city":"Portland","addr:housenumber":"1323-1337","addr:postcode":"97232","addr:state":"OR","addr:street":"Northeast Broadway","alt_name":"Irvington Theater","building":"commercial","description":"Contributing in Irvington Historic District, National Register of Historic Places # 10000850","historic":"building","old_name":"Irvington Movie Theater","source":"National Register nomination file for Irvington Historic District http://heritagedata.prd.state.or.us/historic/index.cfm?do=main.loadFile&load=NR_Noms/10000850.pdf","type":"multipolygon"},"centroid":{"lat":"45.5353049","lon":"-122.6519550"},"bounds":{"e":"-122.6518828","n":"45.5354039","s":"45.5351638","w":"-122.6520181"}}
{"id":7588953,"type":"relation","tags":{"addr:city":"Portland","addr:housenumber":"1301","addr:postcode":"97232","addr:state":"OR","addr:street":"Northeast Broadway","building":"commercial","description":"Contributing in Irvington Historic District, National Register of Historic Places # 10000850","historic":"building","source":"National Register nomination file for Irvington Historic District http://heritagedata.prd.state.or.us/historic/index.cfm?do=main.loadFile&load=NR_Noms/10000850.pdf","type":"multipolygon"},"centroid":{"lat":"45.5353366","lon":"-122.6521681"},"bounds":{"e":"-122.6520166","n":"45.5354046","s":"45.5352394","w":"-122.6522532"}}
{"id":7625739,"type":"relation","tags":{"addr:city":"Portland","addr:housenumber":"2074","addr:postcode":"97209","addr:state":"OR","addr:street":"Northwest Lovejoy Street","building":"commercial","description":"Contributing in Alphabet Historic District, National Register of Historic Places # 00001293","historic":"building","old_name":"Joseph Peters Building","source":"National Register nomination file http://heritagedata.prd.state.or.us/historic/index.cfm?do=main.loadFile&load=NR_Noms/00001293.pdf","type":"multipolygon"},"centroid":{"lat":"45.5296497","lon":"-122.6941706"},"bounds":{"e":"-122.6940431","n":"45.5297972","s":"45.5295192","w":"-122.6942423"}}
{"id":7631225,"type":"relation","tags":{"addr:city":"Portland","addr:housenumber":"3207,3215","addr:postcode":"97232","addr:state":"OR","addr:street":"Northeast Weidler Street","building":"residential","name":"Grant Park Village","type":"multipolygon"},"centroid":{"lat":"45.5348125","lon":"-122.6325893"},"bounds":{"e":"-122.6323902","n":"45.5350609","s":"45.5345212","w":"-122.6331299"}}
{"id":7634061,"type":"relation","tags":{"name":"Ozone","sport":"climbing","type":"site"},"centroid":{"lat":"45.5676584","lon":"-122.2097083"},"bounds":{"e":"-122.2072000","n":"45.5685380","s":"45.5664579","w":"-122.2120886"}}
{"id":7675467,"type":"relation","tags":{"addr:city":"Portland","addr:housenumber":"1422-1444","addr:postcode":"97232","addr:state":"OR","addr:street":"Northeast Broadway","building":"commercial","type":"multipolygon"},"centroid":{"lat":"45.5348337","lon":"-122.6506623"},"bounds":{"e":"-122.6506141","n":"45.5349370","s":"45.5346895","w":"-122.6507191"}}
{"id":7706872,"type":"relation","tags":{"addr:city":"Portland","addr:housenumber":"808","addr:postcode":"97214","addr:state":"OR","addr:street":"Southeast Morrison Street","building":"commercial","description":"National Register of Historic Places # 06001034","heritage":"2","heritage:operator":"nrhp","historic":"building","nrhp:criteria":"A","nrhp:inscription_date":"2006-11-15","old_name":"Grand Central Public Market","ref:nrhp":"06001034","source":"National Register nomination file http://heritagedata.prd.state.or.us/historic/index.cfm?do=main.loadFile&load=NR_Noms/06001034.pdf","type":"multipolygon","wikidata":"Q5594411","wikipedia":"en:Grand Central Public Market"},"centroid":{"lat":"45.5171071","lon":"-122.6574102","type":"entrance"},"bounds":{"e":"-122.6569580","n":"45.5171092","s":"45.5166859","w":"-122.6574686"}}
{"id":7809012,"type":"relation","tags":{"building":"yes","building:levels":"2","ele":"223","name":"Vista House","type":"multipolygon"},"centroid":{"lat":"45.5395671","lon":"-122.2443784"},"bounds":{"e":"-122.2442857","n":"45.5396320","s":"45.5395022","w":"-122.2444710"}}
{"id":7809387,"type":"relation","tags":{"addr:city":"Forest Grove","addr:housenumber":"3869","addr:postcode":"97116","addr:state":"OR","addr:street":"Northwest Martin Road","description":"National Register of Historic Places # 84003100","heritage":"2","heritage:operator":"nrhp","historic":"building","name":"Beeks House","nrhp:criteria":"B,C,D","nrhp:inscription_date":"1984-06-14","old_name":"Silas Jacob N. Beeks House","ref:nrhp":"84003100","source":"National Register nomination file http://heritagedata.prd.state.or.us/historic/index.cfm?do=main.loadFile&load=NR_Noms/84003100.pdf","type":"multipolygon","wikidata":"Q7514170","wikipedia":"en:Silas Jacob N. Beeks House"},"centroid":{"lat":"45.5481599","lon":"-123.0744352"},"bounds":{"e":"-123.0743187","n":"45.5482286","s":"45.5481053","w":"-123.0746216"}}
{"id":7828440,"type":"relation","tags":{"addr:city":"Beaverton","addr:state":"OR","description":"National Register of Historic Places # 89000123","heritage":"2","heritage:operator":"nrhp","historic":"building","name":"Blanton House","nrhp:criteria":"C","nrhp:inscription_date":"1989-03-02","old_name":"M. E. Blanton House","ref:nrhp":"89000123","source":"National Register nomination file http://heritagedata.prd.state.or.us/historic/index.cfm?do=main.loadFile&load=NR_Noms/89000123.pdf","type":"multipolygon","wikidata":"Q16894413","wikipedia":"en:M. E. Blanton House"},"centroid":{"lat":"45.4909921","lon":"-122.8514741"},"bounds":{"e":"-122.8513262","n":"45.4910347","s":"45.4909223","w":"-122.8516047"}}

missinglink commented 6 years ago

The new record names seem to mostly be of the natural type:

Columbia River
null
Moshofsky Swamp
Pioneer Courthouse Square
Marquam Nature Park
Willamette Moorage Park
Forest Park
McLoughlin Promenade
Brookside Natural Area
Southeast Reed College Parkway
Dabney State Recreation Area
North Powellhurst Park
LL "Stub" Stewart Memorial State Park
Ankeny Arcade
Columbia River
Willamette River
The Cedars on Salmon Creek
Public Storage
null
Tennis Courts
Orenco Gardens Four Scub
Motel 6
Masons Supply Company
The Northwest Academy
null
null
null
null
Grant Park Village
Ozone
null
null
Vista House
Beeks House
Blanton House

missinglink commented 6 years ago

I suspect that a lot of the relations are being discarded due to one or more of their member ways being truncated when generating the pbf file.

On a larger extract I'd expect to see more relations for the same area.

missinglink commented 6 years ago

Tested against a New Zealand extract:

# before
2m48.271s

# after
2m24.305s

420M    /tmp/testing/A
20M     /tmp/testing/B

Also only 30 additional records; maybe relations are far less popular than we thought?

MT PERCY NO 2
76 TAUPIRI
Waipara Range
Southern Alps
Ulva Island/Te Wharawhara Marine Reserve
Motueka River
Residential Red Zone
Thomas Bush
Hogs Back Track
Taieri River
Waikato River
Puhoi River
Tauranga-Taupo River
Waitangi River
Wellington
Petone
Wellington Institute of Technology
Te Koutu Park
Napier Golf Club
Lake Okareka
Ōtāhuhu Station
Worsleys Reserve
Maitai River
Avon River
Hikurangi Marine Reserve
Tauranga Primary School
Smyth Stream
Anawhata Stream
Wainamu Stream

missinglink commented 5 years ago

@Joxit what are your thoughts on this PR, should we merge it or does it need more testing?

Joxit commented 5 years ago

I think this PR is OK; I ran some tests on relations and they look fine :+1:.

The next feature linked to relations might be the inheritance of tags, which would be useful for associatedStreet, for example. This is just an idea, because it's not really used worldwide. Here are some examples: https://www.openstreetmap.org/relation/5908976 https://www.openstreetmap.org/relation/4741492

missinglink commented 5 years ago

Wow, to be honest, I didn't even know Relation:associatedStreet existed!

orangejulius commented 5 years ago

I tested this out with an Australia build, and it appeared to work well enough. I'm testing a full planet build now, and will report back on the resulting disk size and build time metrics!

missinglink commented 5 years ago

@orangejulius, I'm going to merge this unless there are any objections?

Pelias can be updated independently via npm semver.

orangejulius commented 5 years ago

The latest full planet build with this branch that we've run worked great. So this is ready to go!

From our testing it looks like this does speed up the OSM import by quite a bit as well.

CatInCosmicSpace commented 5 years ago

Hello! :)

As far as I understand, this branch adds support for OSM relations, and you tested it with great results. You also updated the pelias/openstreetmap importer to support the new version of pbf2json.

I would like to know whether an update to pelias/docker to support OSM relations is planned. If so, when can we expect it?

orangejulius commented 5 years ago

Hi @CatInCosmicSpace, all updates to Pelias importers and services that are merged to master are automatically available in the pelias/docker setup. You will have to run pelias compose pull to download the latest Docker images, but it should all work after that.

Let us know how it goes!

missinglink commented 5 years ago

You can view all the docker image tags here: https://hub.docker.com/r/pelias/openstreetmap/tags

This feature should be available under the tag latest since master-2019-02-27-b6a7530d5da151e43c453cd13efb41b64a58e230 was published.

CatInCosmicSpace commented 5 years ago

Hello again! Thank you for your reply.

Also, I want to inform you that everything works well!

Relation support is a great feature :)