mapbox / osm-wayback

Scalable RocksDB index from OSM planet to look up historic OSM objects.
BSD 3-Clause "New" or "Revised" License

No Deleted Objects #15

Open jenningsanderson opened 7 years ago

jenningsanderson commented 7 years ago

The current implementation doesn't handle deleted objects. Because deleted objects aren't in the input file for add_tags, they never get queried from the DB.

Further, having add_tags take streaming GeoJSON input works really well for augmenting existing files, but new data has to go through every step of the pipeline (.osh -> .osm -> .geojson -> history.geojson), which is a bit cumbersome (and we lose deleted objects).

Solutions

  1. Add a special key for deleted tags. (Can you query RocksDB and get back an iterator matching a key regex?) Or maybe keep a running list of deleted IDs in its own key, to be fetched and written out IF a deleted flag is set. Deleted objects would also have to contain their last-known geometry, so we'd have to start storing (all) node geometries as well so that these could be recreated. Unfortunately, we don't know which version is the most recent, so we'd have to simply start from the first version and increment until the lookup failed.
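To make the "increment until it fails" idea concrete, here is a minimal sketch. It is not the osm-wayback code: `Store`, `make_key`, and `last_known_version` are hypothetical names, an in-memory map stands in for the RocksDB index, and the `<id>!<version>` key layout is an assumption. It also assumes versions start at 1 and are stored without gaps.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the index: "<id>!<version>" -> serialized object.
using Store = std::unordered_map<std::string, std::string>;

// Assumed key layout; the real index may encode keys differently.
std::string make_key(std::int64_t id, std::uint32_t version) {
    return std::to_string(id) + "!" + std::to_string(version);
}

// Probe versions 1, 2, 3, ... until a lookup misses.
// Returns the last version found, or std::nullopt if none is stored.
// Assumes versions are contiguous; a gap would end the probe early.
std::optional<std::uint32_t> last_known_version(const Store& store,
                                                std::int64_t id) {
    std::optional<std::uint32_t> last;
    for (std::uint32_t v = 1; store.count(make_key(id, v)) != 0; ++v) {
        last = v;
    }
    return last;
}
```

Each probe is one point lookup, so an object with N stored versions costs N + 1 lookups to resolve.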

What if we stored the node table in memory?

| attr | type | size |
|---|---|---|
| id | int64 | 8 bytes |
| version | int | 1 byte |
| changeset | long | 4 bytes |
| timestamp | long | 4 bytes |
| lon | double | 8 bytes |
| lat | double | 8 bytes |
| total | | 33 bytes |

33 bytes * 10 billion = 330 billion bytes = 330 gigabytes for the planet, currently.

This is not that crazy for parsing the planet (we wouldn't have to do this very often). A machine with 512 GB of RAM could handle this for the next 2-3 years if crunching the entire history on a regular basis.
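As a sanity check on the 33-byte figure, here is a rough sketch of that row as a packed struct. `NodeRecord` and `table_bytes` are hypothetical names, and the fixed-width types are my reading of the sizes in the table. Without packing, a compiler would typically pad this layout to 40 bytes, so the estimate only holds if rows are stored packed.

```cpp
#include <cstdint>

#pragma pack(push, 1)  // no padding between fields (GCC, Clang, MSVC)
struct NodeRecord {
    std::int64_t  id;         // 8 bytes
    std::uint8_t  version;    // 1 byte, so at most 255 versions per node
    std::uint32_t changeset;  // 4 bytes
    std::uint32_t timestamp;  // 4 bytes, e.g. seconds since the epoch
    double        lon;        // 8 bytes
    double        lat;        // 8 bytes
};
#pragma pack(pop)

// Fails to compile if the packed layout is not the 33 bytes from the table.
static_assert(sizeof(NodeRecord) == 33, "packed row should be 33 bytes");

// Total bytes for `rows` packed records.
constexpr std::uint64_t table_bytes(std::uint64_t rows) {
    return rows * sizeof(NodeRecord);
}

// 10 billion rows, the figure used in the estimate above.
static_assert(table_bytes(10'000'000'000ULL) == 330'000'000'000ULL,
              "10 billion rows should take 330 billion bytes");
```

This counts only the raw rows; any index or hash-table overhead for looking rows up by id would come on top.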

  2. Can we simply iterate over all keys in RocksDB? This takes steps out of the process, BUT then we'd have to store geometries, which brings us back to square one. I do think this is a worthy path to travel down, though: a separate node!version -> changeset,time,lat,lon store for non-deleted nodes.

  3. This could all be dramatically simplified if the VISIBLE flag were FALSE for any historical object versions that are no longer the visible version of the object :)
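On the iterator question: as far as I know, RocksDB iterators walk keys in sorted (bytewise, by default) order and can seek to a key or prefix, but they don't match regexes. A node!version store can lean on that ordering. Below is a minimal sketch of the idea, with a `std::map` standing in for the sorted keyspace; `NodeVersion`, `encode_key`, and `history_of` are hypothetical names, and the big-endian key layout is an assumption chosen so that byte order matches numeric order.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Value side of the assumed node!version -> changeset,time,lat,lon store.
struct NodeVersion {
    std::uint32_t changeset;
    std::uint32_t timestamp;
    double lat;
    double lon;
};

// Big-endian id (8 bytes) followed by big-endian version (4 bytes), so that
// bytewise key order equals (id, version) numeric order.
std::string encode_key(std::uint64_t id, std::uint32_t version) {
    std::string key;
    for (int shift = 56; shift >= 0; shift -= 8)
        key.push_back(static_cast<char>((id >> shift) & 0xFF));
    for (int shift = 24; shift >= 0; shift -= 8)
        key.push_back(static_cast<char>((version >> shift) & 0xFF));
    return key;
}

// Sorted map as a stand-in for the key-value store.
using NodeStore = std::map<std::string, NodeVersion>;

// Prefix scan: seek to the first key for `id`, then read forward while the
// 8-byte id prefix still matches. Versions come back in ascending order.
std::vector<NodeVersion> history_of(const NodeStore& store, std::uint64_t id) {
    const std::string prefix = encode_key(id, 0).substr(0, 8);
    std::vector<NodeVersion> out;
    for (auto it = store.lower_bound(prefix);
         it != store.end() && it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        out.push_back(it->second);
    }
    return out;
}
```

With this layout the most recent stored version of a node is simply the last entry of its prefix range, so no version-by-version probing is needed.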

/cc @lukasmartinelli @batpad ...just my evening :bulb: