osmlab / osm-planning

General OSM tools planning and wishlist

PBF change for big data #4

Open jonathanl-telenav opened 6 years ago

jonathanl-telenav commented 6 years ago

Currently, to determine the bounding rectangle of each way in a PBF that we're splitting into pieces, we keep an in-memory map from node ID to location. Unfortunately, this map takes up more and more memory as the OSM data set keeps growing (success brings certain problems!). A really useful change from my perspective would be to include the bounding rectangle of each way in the data, so it's easy to determine where a way is without the giant node->location map.

With some thought from the project's technical leaders, this could probably be done efficiently and in a backwards-compatible way. For efficiency, perhaps the bounding rectangle could be stored in two long values, and node storage could be reduced from two doubles to a single long, resulting in minimal file size growth. For backwards compatibility, perhaps there's some way to detect the format version in the PBF header.

Thanks for listening! -- Best, Jon
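The "two longs for a bounding rectangle, one long per node" idea can be sketched in Python. This is a minimal sketch under an assumption: coordinates are held as fixed-point integers in units of 1e-7 degrees, so each valid latitude or longitude fits in a signed 32-bit integer and two of them fit in one 64-bit value. All helper names (`pack_location`, `pack_bbox`, etc.) are hypothetical, not part of any PBF library. (The PBF format itself already encodes coordinates as integers scaled by a granularity, 100 nanodegrees by default, rather than as doubles, so this packing is mostly relevant to in-memory structures such as the node map.)

```python
# Hypothetical helpers illustrating fixed-point packing; not part of any PBF library.
SCALE = 10_000_000  # assumed precision: 1 unit = 1e-7 degrees


def to_fixed(deg: float) -> int:
    """Convert degrees to a fixed-point integer (fits in signed 32 bits for valid lat/lon)."""
    return round(deg * SCALE)


def pack_pair(a: int, b: int) -> int:
    """Pack two signed 32-bit integers into one signed 64-bit value (like a Java long)."""
    u = ((a & 0xFFFFFFFF) << 32) | (b & 0xFFFFFFFF)
    return u - (1 << 64) if u >= (1 << 63) else u


def unpack_pair(v: int) -> tuple[int, int]:
    """Inverse of pack_pair: split a signed 64-bit value into two signed 32-bit ints."""
    u = v & 0xFFFFFFFFFFFFFFFF
    a, b = u >> 32, u & 0xFFFFFFFF
    # Sign-extend each 32-bit half.
    if a >= (1 << 31):
        a -= 1 << 32
    if b >= (1 << 31):
        b -= 1 << 32
    return a, b


def pack_location(lat: float, lon: float) -> int:
    """One 64-bit value per node location instead of two doubles."""
    return pack_pair(to_fixed(lat), to_fixed(lon))


def unpack_location(v: int) -> tuple[float, float]:
    lat, lon = unpack_pair(v)
    return lat / SCALE, lon / SCALE


def pack_bbox(min_lat: float, min_lon: float,
              max_lat: float, max_lon: float) -> tuple[int, int]:
    """A way's bounding rectangle as two 64-bit values: (south-west, north-east) corners."""
    return pack_location(min_lat, min_lon), pack_location(max_lat, max_lon)
```

For example, `pack_bbox(52.51, 13.38, 52.53, 13.42)` yields two integers, and `unpack_location` recovers each corner to within 1e-7 degrees. Using fixed point rather than floating point is what makes the 32-bit halves possible without losing the precision OSM coordinates carry.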

ImreSamu commented 6 years ago

imho: similar idea with osmium support

mvexel commented 6 years ago

Looping in @joto

joto commented 6 years ago

Including the bounding rectangle would help with only one specific use-case, namely creating some kind of geographical extract of the data. Adding the node locations to the ways as implemented by me and mentioned by @ImreSamu above allows a lot more uses. Yes, it has somewhat more overhead, but if you leave out all the nodes that don't have tags (osmium add-locations-to-ways can do this), the resulting file has a similar size to the original planet file. So I think this is the better solution.
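The difference between the two approaches can be sketched in Python. The `Node`/`Way` classes and helper functions below are hypothetical stand-ins, not the osmium or PBF API, and the sketch does not reproduce the exact rules the real `osmium add-locations-to-ways` command applies. It only shows why, once each way carries its nodes' locations, a consumer can compute a way's bounding rectangle (or any other geometry) without holding a node-ID-to-location map, and how untagged nodes can then be dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    id: int
    refs: list[int]
    # Locations copied from the referenced nodes, in the same order as refs.
    # In a plain PBF this is empty and locations must come from a node map.
    locations: list[tuple[float, float]] = field(default_factory=list)
    tags: dict[str, str] = field(default_factory=dict)


def bbox(points):
    """(min_lat, min_lon, max_lat, max_lon) of a non-empty list of (lat, lon)."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return min(lats), min(lons), max(lats), max(lons)


def bbox_via_node_map(way, node_map):
    """Classic approach: every node location must be held in memory."""
    return bbox([node_map[ref] for ref in way.refs])


def bbox_via_embedded_locations(way):
    """Locations-on-ways approach: no lookup table needed."""
    if len(way.locations) != len(way.refs):
        raise ValueError(f"way {way.id} has no embedded locations")
    return bbox(way.locations)


def add_locations_to_ways(nodes, ways, keep_untagged=False):
    """Toy version of an add-locations step: copy each referenced node's
    location into the way, optionally dropping nodes that have no tags.
    The node map is built once here, so downstream readers never need it."""
    node_map = {n.id: (n.lat, n.lon) for n in nodes}
    out_ways = [Way(w.id, w.refs, [node_map[r] for r in w.refs], w.tags)
                for w in ways]
    out_nodes = [n for n in nodes if keep_untagged or n.tags]
    return out_nodes, out_ways
```

The memory cost of the node map does not disappear; it moves into a one-time preprocessing step, and every later reader of the file benefits.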

@jonathanl-telenav: Have a look at this and tell us if this helps you at all.

tyrasd commented 6 years ago

@joto: that approach doesn't generalize to full history files, does it?

joto commented 6 years ago

@tyrasd No, this can't work on full history files, because a node might change while the way using it didn't change. In effect we would have to make two way versions out of the one way version, one with the old node location, one with the new one. But this would give us two ways with the same version number.

In theory we could change the timestamp on the way and leave the version number. This way we could still tell the two way versions apart, but this would break if there are several changes in the same second. And it would break the assumption that (object-type, id, version) is unique in history files.
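The problem with history files described above can be illustrated with a small Python sketch. Everything here is hypothetical: `split_way_version` models a naive attempt to emit one way record per geometry state, where each record is `(type, id, version, timestamp)` and timestamps are whole seconds. It shows that `(type, id, version)` stops being unique as soon as a referenced node moves, and that adding the timestamp to the key still fails when two node changes fall in the same second.

```python
from collections import Counter


def split_way_version(way_id, version, valid_from, valid_to, node_changes):
    """Hypothetical: one record per geometry state of a single way version.

    node_changes is a list of (timestamp, node_id, node_version) for nodes the
    way references; only changes strictly inside [valid_from, valid_to) count.
    """
    records = [("w", way_id, version, valid_from)]
    for ts, _node_id, _node_version in sorted(node_changes):
        if valid_from < ts < valid_to:
            # Same way version, new geometry: only the timestamp differs.
            records.append(("w", way_id, version, ts))
    return records


def has_duplicate_keys(records, key):
    """True if two records map to the same key."""
    counts = Counter(key(r) for r in records)
    return any(c > 1 for c in counts.values())


# One way version, two of its nodes moved later in different seconds.
recs = split_way_version(42, 1, 100, 1000, [(200, 1, 2), (500, 2, 2)])
# (type, id, version) is no longer unique; adding the timestamp still works here.
# Two node changes within the same second break even that key.
recs_same_second = split_way_version(42, 1, 100, 1000, [(200, 1, 2), (200, 2, 2)])
```

With `recs`, the key `r[:3]` repeats while the full tuple stays unique; with `recs_same_second`, even the full tuple repeats.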

jonathanl-telenav commented 6 years ago

I have modified Osmosis (and posted a patch to the osmosis-dev list) to work with the output of osmium add-locations-to-ways, and it works really well. We're able to read planet files with much less memory and in less time as well (mainly because we no longer pay for the inefficiency of managing giant maps).

We wouldn't really want PBFs with the untagged nodes left out entirely, because we sometimes (rarely, but often enough to care) need other metadata on those nodes, such as the version number or modification time. Even with what looks like a fairly significant increase in file size due to the added locations, the change would still be very much worth it. I could imagine OSM publishing a thin and a fat version of the planet PBFs (without and with untagged nodes, respectively), but with network speeds and disk costs what they are, I personally don't care for the thinner PBF with untagged nodes omitted.

Best,

Jon

From: Jochen Topf
Sent: Thursday, March 8, 2018 6:52:06 AM
Subject: Re: [osmlab/osm-planning] PBF change for big data (#4)
