The second option for the first phase (partitioning the keyspace) would have the benefit of being able to farm out the work across a cluster of volunteer systems, which is a long-term goal.
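A minimal sketch of what partitioning the keyspace by `id%n` could look like, so each shard maps to its own leveldb database or a work unit handed to a volunteer node. The `ShardedBatch` type and its methods are hypothetical, not part of the actual peermaps code:

```rust
// Hypothetical sketch: fan records out to n shards keyed by id % n, so each
// shard can be written to a separate leveldb database (or farmed out to a
// volunteer system) independently.
struct ShardedBatch {
    n: u64,
    shards: Vec<Vec<(u64, Vec<u8>)>>,
}

impl ShardedBatch {
    fn new(n: u64) -> Self {
        Self {
            n,
            shards: (0..n).map(|_| Vec::new()).collect(),
        }
    }

    // Route a record to its shard by id modulo the shard count.
    fn insert(&mut self, id: u64, value: Vec<u8>) {
        let shard = (id % self.n) as usize;
        self.shards[shard].push((id, value));
    }
}
```

Because shards never share keys, each one can be flushed, transferred, or rebuilt without coordinating with the others.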
The new scanning approach makes much better use of multiple cores where it can, and the osm decoding happens in parallel at pre-calculated offsets.
The memory footprint is stable and low, but the ingest only uses about 2 of the 10 cores on the vps during the first phase. It could be that this is as fast as a single leveldb will go. I don't think the machine is reaching IO saturation, since after 23 hours the leveldb dir is only 111GB.
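The parallel-decoding-at-offsets idea can be sketched with scoped threads: given pre-calculated byte offsets into the file, each worker decodes the span between two consecutive offsets independently. `decode_block` here is a hypothetical stand-in for the real osm block decoder:

```rust
use std::thread;

// Hypothetical stand-in for decoding one osm block between two byte offsets.
fn decode_block(data: &[u8], start: usize, end: usize) -> usize {
    data[start..end].iter().map(|b| *b as usize).sum()
}

// Decode the spans between consecutive pre-calculated offsets in parallel,
// one scoped thread per span.
fn parallel_decode(data: &[u8], offsets: &[usize]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = offsets
            .windows(2)
            .map(|w| s.spawn(move || decode_block(data, w[0], w[1])))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

The key property is that the offsets are known up front, so no worker has to scan past another worker's region to find its starting point.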
Some options to explore for the first phase:
- `flush()` could spawn a new background thread and continue gathering more records for the next batch without waiting for the flush to finish, only obtaining a mutex lock when it needs to `flush()` again.
- partition the keyspace (by `id%n` for example) to write out to multiple leveldb databases

The second phase could use some of the same tricks, and there are many places in eyros where async operations happen serially instead of in parallel.
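The overlapping-flush idea above can be sketched as follows. The writer keeps batching while a background thread performs the flush; a mutex ensures only one flush is in flight at a time, so the writer only blocks when it tries to flush again before the previous flush has finished. The `Writer` type and the flush body are hypothetical stand-ins for the real leveldb batch write:

```rust
use std::mem;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical sketch: batch writes in memory, flush in a background thread.
struct Writer {
    flush_lock: Arc<Mutex<()>>,          // serializes flushes
    batch: Vec<(u64, Vec<u8>)>,          // records accumulated since last flush
    flushed: Arc<Mutex<usize>>,          // stand-in for the real leveldb write
}

impl Writer {
    fn new() -> Self {
        Self {
            flush_lock: Arc::new(Mutex::new(())),
            batch: Vec::new(),
            flushed: Arc::new(Mutex::new(0)),
        }
    }

    fn write(&mut self, id: u64, value: Vec<u8>) {
        self.batch.push((id, value));
        if self.batch.len() >= 1000 {
            self.flush(); // handle dropped: the flush proceeds in the background
        }
    }

    // Hand the current batch to a background thread and keep going; the
    // caller only blocks inside the spawned thread if a previous flush is
    // still holding the lock.
    fn flush(&mut self) -> thread::JoinHandle<()> {
        let batch = mem::take(&mut self.batch);
        let lock = Arc::clone(&self.flush_lock);
        let flushed = Arc::clone(&self.flushed);
        thread::spawn(move || {
            let _guard = lock.lock().unwrap(); // only one flush at a time
            *flushed.lock().unwrap() += batch.len();
        })
    }
}
```

This keeps the ingest thread producing records while IO happens in the background, at the cost of holding one extra batch in memory during each flush.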
The osm2pgsql page states that it can process planet-osm in about half a day, so we have some room for improvement, although the peermaps ingest already uses far less memory (osm2pgsql requires a minimum of 64GB of ram).