The current implementation doesn't allow for any deleted objects. Since deleted objects aren't in the input file for `add_tags`, they never get queried from the DB.
Further, `add_tags` taking streaming GeoJSON input works really well for augmenting existing files, but requiring all of the steps for new data (`.osh` -> `.osm` -> `.geojson` -> `history.geojson`) is a bit cumbersome (and we lose deleted objects along the way).
## Solutions
Add a special key for deleted objects. (Can you query RocksDB and get back an iterator matching a key prefix?) Or maybe keep a running list of deleted IDs in their own key, to be fetched and written out IF a `deleted` flag is set? Deleted objects would also have to contain their last-known geometry, so we'd have to start storing (all) node geometries as well so that these could be recreated. Unfortunately, we don't know which version is the most recent, so we'd have to simply start from v0 and increment until the lookup failed.
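The "increment until it fails" idea can be sketched roughly like this; a plain dict stands in for RocksDB, and the `"<id>!<version>"` key layout is an assumption, not the actual schema:

```python
# Probe versions upward until a lookup misses; a dict stands in for RocksDB,
# and the "<osm_id>!<version>" key scheme is hypothetical.

def latest_version(db, osm_id):
    """Return (version, value) of the highest stored version, or None."""
    version, latest = 1, None
    while True:
        value = db.get(f"{osm_id}!{version}")
        if value is None:
            return latest  # previous hit was the most recent stored version
        latest = (version, value)
        version += 1

# Toy store: node 42 has three stored versions, node 99 has none.
db = {"42!1": "v1-geom", "42!2": "v2-geom", "42!3": "v3-geom"}
print(latest_version(db, 42))  # -> (3, 'v3-geom')
print(latest_version(db, 99))  # -> None
```

The obvious cost is one extra failed lookup per object; storing the latest version number under its own key would avoid the probing entirely.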
What if we stored the node table in memory?
| attr | type | size |
| --- | --- | --- |
| id | `<int64>` | 8 bytes |
| version | `<int>` | 1 byte |
| changeset | `<long>` | 4 bytes |
| timestamp | `<long>` | 4 bytes |
| lon | `<double>` | 8 bytes |
| lat | `<double>` | 8 bytes |
| **total** | | **33 bytes** |
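As a sanity check on the 33-byte figure, the table above can be packed with Python's `struct` module (the format string is mine, not from any existing code):

```python
import struct

# One node record with the sizes from the table: id int64 (q, 8 bytes),
# version (B, 1), changeset (i, 4), timestamp (i, 4), lon (d, 8), lat (d, 8).
# '<' selects little-endian and disables alignment padding.
NODE_FMT = "<qBiidd"

record = struct.pack(NODE_FMT, 123456789, 3, 98765, 1500000000, 13.3777, 52.5163)
print(struct.calcsize(NODE_FMT))  # -> 33
print(len(record))                # -> 33
```

Note that without `'<'` (native alignment), the struct would be padded out past 33 bytes, so the unaligned layout is what makes the in-memory estimate below hold.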
33 bytes * 10 billion nodes = 330 billion bytes = 330 gigabytes for the planet, currently.
That is not that crazy for parsing the planet (we wouldn't have to do this very often). A 512 GB machine could handle this for the foreseeable 2-3 years if crunching the entire history on a regular basis.
Can we simply iterate over all keys in RocksDB? This takes steps out of the process, BUT then we'd have to store geometries, which brings us back to square one. Still, I think this is a worthy path to travel down: a separate `node!version` -> `changeset,time,lat,lon` store for non-deleted nodes.
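Since RocksDB keeps keys in sorted order, iterating a `node!<id>!<version>` keyspace reduces to "seek to the prefix, read while keys still match". A minimal sketch of that access pattern, with a sorted dict standing in for RocksDB and a made-up key layout:

```python
# Simulate a prefix scan over a "node!<id>!<version>" keyspace. The key layout
# and values here are assumptions; sorted() mimics RocksDB's ordered iterator.

store = {
    "node!100!1": "cs=1,t=10,lon=2.0,lat=1.0",
    "node!100!2": "cs=5,t=20,lon=2.1,lat=1.1",
    "node!200!1": "cs=7,t=30,lon=9.0,lat=9.0",
}

def prefix_scan(store, prefix):
    """All (key, value) pairs whose key starts with prefix, in key order."""
    return [(k, v) for k, v in sorted(store.items()) if k.startswith(prefix)]

# All stored versions of node 100:
for key, value in prefix_scan(store, "node!100!"):
    print(key, "->", value)
```

With real RocksDB this would be `Seek(prefix)` on an iterator plus a `starts_with` check per key, so a full-table pass and a per-node pass are the same code path.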
This could all be dramatically simplified if the `VISIBLE` flag were `FALSE` for any historical object versions that are no longer the visible version of the object :)
/cc @lukasmartinelli @batpad ...just my evening :bulb: