Open michaz opened 4 years ago
To my knowledge there is currently no overhead involved due to a suboptimal XML abstraction. The second read is necessary even for "pure PBF", as we first have to find out which nodes are part of "road" ways. In OSM everything is a way, including buildings etc., but we do not need those OSM nodes. Sure, we could store everything and avoid the first read, but in my experiments this did not work well, as reads would have to be fast and memory efficient. But I did these kinds of tests several years ago, and many things might have changed since.
Please note that the osmosis code we currently use is not highly actively maintained and it might be possible to improve things there: https://github.com/openstreetmap/osmosis/tree/master/osmosis-osm-binary
We should also keep an eye on Baremaps, which seems to have developed a replacement (though it might be in an alpha stage): https://github.com/baremaps/baremaps/blob/master/baremaps-osm/src/main/java/com/baremaps/osm/parser/PBFFileBlockParser.java I have not tried it yet, but was in contact with one of the authors @bchapuis.
@michaz @karussell I had the opportunity to play with the OSM reader over the weekend and tried to integrate the XML and PBF readers available in Baremaps. The integration is an early prototype, but the preliminary results look promising: after some modifications in Baremaps, all the GraphHopper unit tests pass.
As for now, I simply modified the OSMReader class and converted the Baremaps objects (e.g. Node) into graphhopper objects (e.g. ReaderNode): https://github.com/baremaps/graphhopper/tree/baremaps
The branch of Baremaps I'm currently using in GraphHopper is the following: https://github.com/baremaps/baremaps/tree/graphhopper
As my primary use case for Baremaps is the generation of vector tiles, the API of the OSM reader is rough around the edges and could be improved a lot. It would be nice to have your feedback on how I could improve it.
Thanks for looking into it. The change looks really small :+1:
-Xmx1024m
Is it necessary to increase the memory? If yes, is it necessary to increase it so heavily? This would be a blocking issue, as the import process is already very memory intensive and heavily optimized in this respect.
- if (++tmpWayCounter % 10_000_000 == 0) {
Was the progress info removed for simplicity reasons only?
Sorry, the -Xmx1024m is a leftover from a memory issue I encountered while adapting the reader; all the tests now pass without this parameter. Baremaps does not require a buffer of objects. It relies on the Stream and Spliterator APIs available since Java 8, so the memory consumption should remain very low.
As the data is processed in parallel, I should probably use an AtomicLong instead of a long for the tmpWayCounter and display progress indications. I picked the simplest solution (removal) as other aspects may need to be discussed before diving into a proper implementation. For instance: is the com.graphhopper.reader.osm.pbf package still needed? Should we map Baremaps' object model (Node) to GraphHopper's object model (ReaderNode), or rely on a common object model, etc.?
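The AtomicLong idea mentioned above can be sketched as follows. This is a hypothetical illustration (the class and method names are made up, not GraphHopper's actual code): an `AtomicLong` lets parallel workers bump the counter without races, while the progress message is still printed every 10 million ways.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.LongStream;

public class ProgressCounter {
    // A sketch of replacing the plain long tmpWayCounter with an AtomicLong,
    // so workers on several threads can update progress safely.
    static final AtomicLong wayCounter = new AtomicLong();

    static void onWay() {
        long count = wayCounter.incrementAndGet();
        if (count % 10_000_000 == 0)
            System.out.println(count + " ways processed");
    }

    public static void main(String[] args) {
        // Simulate processing ways on multiple threads.
        LongStream.range(0, 1_000).parallel().forEach(i -> ProgressCounter.onWay());
        System.out.println("ways counted: " + wayCounter.get());
    }
}
```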
As the GraphHopper use case is really interesting, I would start by stabilizing and hardening the Baremaps API according to your needs and then adapt the osm-reader accordingly. How do you think we should proceed?
How do you think we should proceed?
Good question. From our side this is not a critical issue and so there is no big pressure :)
And we would have to test the baremaps library in production against a planet PBF (speed, RAM usage, correctness, ...) before we could decide anything. And to be honest, I would not want you to invest time only for us and then we decide against a switch :)
For instance, is the com.graphhopper.reader.osm.pbf package still needed
I don't think so. And I also would not want to couple your library on our classes nor should you couple our classes into your library - but maybe I misunderstood what you said :)
And we would have to test the baremaps library in production against a planet PBF (speed, RAM usage, correctness, ...) before we could decide anything. And to be honest, I would not want you to invest time only for us and then we decide against a switch :)
As my bandwidth is currently limited I'm relieved that there is no big pressure on this issue. I think it is worth spending some time on it, and I will work on improving and streamlining the library.
I updated the two branches previously referenced in this issue: https://github.com/baremaps/baremaps/tree/graphhopper https://github.com/baremaps/graphhopper/tree/baremaps
Basically, I worked on streamlining the OSM Reader in Baremaps from its dependencies. I also started removing dead code in graphhopper.
@michaz @karussell Do not hesitate to give me feedback on these changes. I will now work on polishing the XML reader and on improving test coverage.
Works like a charm! And the code is indeed much more modern than ours, and looks like it could be made faster, too!
It's currently slower, probably because it's running single-threaded. But it looks like you already made provisions to read, and particularly to decompress, the FileBlocks in parallel.
So, what do you think -- should we try and let the FileBlockSpliterator be able to trySplit()? (Or whatever else may be necessary to achieve parallelism here -- I'm not sure, of course.) While keeping in mind that we have to cleanly merge the parallel stream again, since whatever is consuming the Entity in the end will probably not be thread-safe -- at least we are not.
Exciting!
Great, thank you for the feedback. Regarding parallelism, there is a BatchSpliterator in Baremaps that is normally used to wrap the FileBlockSpliterator and achieve parallelism. I must have removed it while performing tests, so I will enable it and let you know once it is ready.
Also, I merged the graphhopper branch back into master. I still need to improve test coverage, but the next release will come with a streamlined version of the OSM reader.
@michaz my GraphHopper branch [1] now uses the latest version of the master branch in baremaps [2].
[1] https://github.com/baremaps/graphhopper/tree/baremaps [2] https://github.com/baremaps/baremaps/tree/master
You should observe significant improvements in terms of performance. However, one of the tests in GraphHopper is now failing (RoutingAlgorithmWithOSMTest.testAndorraPbf). Do you know if the blocks in the PBF files are supposed to be processed in an ordered fashion?
Hi @michaz, I made some changes on the Baremaps side and updated the branch. Whenever I try to enable parallel processing of the file, I encounter issues during the import, so I started digging into the GHLongIntBTree class. Do you know if this class is thread-safe? In the current version of the reader, several threads may be modifying this BTree in parallel. Thanks in advance for your help.
No, none of our code that consumes the OSM entities is thread-safe. (If so, only by accident.)
Only the parsing and decompressing of the data can be parallel. When the elements hit the consumers, it must be sequential again.
Great, thank you for the clarification. So the current version went too far, and I will have to ensure that a single thread consumes what comes out of the parallel stream. I will keep you posted once I have something ready.
Just to clarify, does sequential here imply some sort of ordering, or does it simply mean that we should only have one thread consuming what comes from the file?
we should only have one thread that consumes what comes from the file?
This.
Though the other may be true as well, e.g. by convention, the file contains first all the nodes, then the ways, then the relations.
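The decode-in-parallel, consume-sequentially pattern described above can be sketched with plain Java streams. This is a hypothetical illustration, not the actual Baremaps API: `decode` stands in for the CPU-heavy block decompression, and `forEachOrdered` guarantees the non-thread-safe consumer sees the elements one at a time and in encounter order.

```java
import java.util.stream.IntStream;

public class OrderedConsume {
    // Hypothetical stand-in for the CPU-heavy FileBlock decoding step.
    static String decode(int blockId) {
        return "block-" + blockId + " ";
    }

    // Decode blocks on many threads, but hand results to the single,
    // non-thread-safe consumer in file order via forEachOrdered.
    static String run() {
        StringBuilder consumed = new StringBuilder(); // deliberately not thread-safe
        IntStream.range(0, 5)
                .parallel()
                .mapToObj(OrderedConsume::decode)   // runs concurrently
                .forEachOrdered(consumed::append);  // runs one at a time, in order
        return consumed.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that `forEachOrdered` only serializes the terminal action; the upstream `mapToObj` still runs on the common fork-join pool.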
I have been able to execute some benchmarks this morning. The single-threaded Baremaps reader is a bit slower than the current OSM reader. Unfortunately, the multi-threaded (but ordered) Baremaps reader results in poor performance.
```java
// single-threaded version
OpenStreetMap.entityStream(osmFile.toPath(), false).forEach(handler);

// multi-threaded version (ordered)
OpenStreetMap.entityStream(osmFile.toPath(), true).forEachOrdered(handler);
```
I guess that the ordering requirement prevents us from benefiting from the parallel processing capabilities of the Stream API and comes with an overhead (probably due to the reordering of the stream before its ingestion). I will continue to investigate this issue, but I would be interested to know whether the ordering requirement could be relaxed when building the graph, and what the implications would be.
I have been able to execute some benchmarks this morning. The single threaded baremaps reader is a bit slower than the current osm reader.
I'd suggest benchmarking using JFR:
-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:StartFlightRecording=disk=true,dumponexit=true,filename=recording.jfr,maxsize=300M,settings=profile,path-to-gc-roots=true
The recording.jfr can be analyzed using JMC: https://adoptopenjdk.net/jmc.html
I guess that the ordering requirement prevents to benefit from the parallel processing capabilities of the Stream API and comes with a overhead (probably due to the reordering of the stream before its ingestion).
I will continue to investigate this issue, but I would be interested in knowing if the ordering requirement could be relaxed when building the graph and what would be the implications?
I'm not sure how to accomplish this using Java Streams but you'd have to ensure that the processing order of element types is enforced: nodes > ways > relations
We could even enforce the order within the PBF, similar to the current osm2pgsql 1.3.0 release:
Input files that are not ordered generate a warning. Future versions of osm2pgsql will not work any more with unordered files. If you have unordered files use "osmium sort" to order them
osmium sort manpage:
Objects are sorted by type, ID, and version. IDs are sorted negative IDs first, then positive IDs, both ordered by their absolute values. So the sort order for types and IDs is:
node -1, node -2, …, node 1, node 2, …, way -1, way -2, …, way 1, way 2, …, relation -1, relation -2, …, relation 1, relation 2,
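The sort order quoted above can be expressed as a comparator. This is a hypothetical sketch (the `Entity` record is made up for illustration, not osmium's or Baremaps' model): type first, then negative IDs before positive ones, both groups ordered by absolute value, then version.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OsmSortOrder {
    // Hypothetical minimal entity, just enough to illustrate the sort key.
    record Entity(int type /* 0=node, 1=way, 2=relation */, long id, int version) {}

    // The osmium-sort convention: type, then negative IDs (by absolute value)
    // before positive IDs (by absolute value), then version.
    static final Comparator<Entity> OSMIUM_ORDER = Comparator
            .comparingInt(Entity::type)
            .thenComparingInt((Entity e) -> e.id() < 0 ? 0 : 1)
            .thenComparingLong(e -> Math.abs(e.id()))
            .thenComparingInt(Entity::version);

    public static void main(String[] args) {
        List<Entity> entities = new ArrayList<>(List.of(
                new Entity(1, 1, 1),    // way 1
                new Entity(0, 2, 1),    // node 2
                new Entity(0, -1, 1),   // node -1
                new Entity(0, -2, 1))); // node -2
        entities.sort(OSMIUM_ORDER);
        // Expected order: node -1, node -2, node 2, way 1
        entities.forEach(System.out::println);
    }
}
```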
We should also try to reduce itemQueue.poll() usage within the OSMInputFile. Doing this on every loop iteration is quite costly, as it needs to acquire a lock on each invocation. It might be faster to fill a regular ArrayList using BlockingQueue.drainTo.
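The drainTo idea can be sketched as follows (hypothetical class and method names, not the actual OSMInputFile code): instead of paying for one lock acquisition per `poll()` call, a whole batch is drained into a plain `ArrayList` with a single lock acquisition and then iterated lock-free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DrainExample {
    // Drain every available element in one go: drainTo takes the queue's
    // lock once for the whole batch instead of once per element.
    static List<Integer> drainBatch(BlockingQueue<Integer> queue) {
        List<Integer> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        for (int i = 0; i < 10; i++)
            queue.add(i);
        // Iterate the batch without touching the queue's lock again.
        System.out.println("drained " + drainBatch(queue).size() + " elements");
    }
}
```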
see #2191
Thanks for the clarification regarding the order. I dug a bit further, found a way to fix the issue, improved the API, reused the drainTo approach introduced by @otbutz, and learnt a lot along the way ;)
The draft pull request now passes all the unit tests, and the performance seems to be on par with the master branch.
https://github.com/graphhopper/graphhopper/pull/2154
master:
```
2020-11-28 23:52:27,803 [main] INFO com.graphhopper.reader.osm.OSMReader - Starting to process OSM file: 'core/files/switzerland.osm.pbf'
2020-11-28 23:52:39,212 [main] INFO com.graphhopper.reader.osm.OSMReader - creating graph. Found nodes (pillar+tower):6 453 518, totalMB:1458, usedMB:1293
2020-11-28 23:52:46,240 [main] INFO com.graphhopper.reader.osm.OSMReader - 37 754 905, now parsing ways
2020-11-28 23:52:55,468 [main] INFO com.graphhopper.reader.osm.OSMReader - 42 215 093, now parsing relations
2020-11-28 23:52:55,787 [main] INFO com.graphhopper.reader.osm.OSMReader - finished way processing. nodes: 1143855, osmIdMap.size:6463340, osmIdMap:93MB, nodeFlagsMap.size:9822, relFlagsMap.size:384411, zeroCounter:9662 totalMB:1600, usedMB:358
2020-11-28 23:52:55,787 [main] INFO com.graphhopper.reader.osm.OSMReader - time pass1:11s, pass2:16s, total:27s
```
baremaps:
```
2020-11-28 23:59:23,989 [main] INFO com.graphhopper.reader.osm.OSMReader - Starting to process OSM file: 'core/files/switzerland.osm.pbf'
2020-11-28 23:59:35,966 [main] INFO com.graphhopper.reader.osm.OSMReader - creating graph. Found nodes (pillar+tower):6 453 518, totalMB:1427, usedMB:908
2020-11-28 23:59:53,266 [main] INFO com.graphhopper.reader.osm.OSMReader - finished way processing. nodes: 1143855, osmIdMap.size:6463340, osmIdMap:93MB, nodeFlagsMap.size:9822, relFlagsMap.size:384411, zeroCounter:9662 totalMB:1459, usedMB:1074
2020-11-28 23:59:53,267 [main] INFO com.graphhopper.reader.osm.OSMReader - time pass1:11s, pass2:17s, total:29s
```
Would love to have your opinion on this before going further and doing more polishing.
Interestingly, I think #2191 may have been the root cause of what I thought I was seeing when I created this ticket. E.g. I now can't see any more performance gain in not fully decoding the nodes in the first pass. (Perhaps we were burning cycles at the queue handover, not in the decoding.) Great work!!
@michaz @otbutz @karussell Yes, the producing side (read osm) got pretty optimal with #2193, really nice job! One remaining area of improvement would be the ability to build the graph in parallel (consuming side), but the effort is probably not worth it.
To me, this issue was a really good opportunity to improve the OSM reader in Baremaps (which was really in an alpha state) and to uncover some of the challenges associated with the Java Stream API, so thanks for your help! Do not hesitate to ping me back if you are interested in using the library at some point; #2154 demonstrates pretty well the simplifications this would enable.
I'm not quite sure about the wording in the OSM wiki but wouldn't this allow parallelization on the PrimitiveBlock level:
Each PrimitiveBlock is independently decompressable, containing all of the information to decompress the entities it contains. It contains a string table, it also encodes the granularity for both position and timestamps.
This should greatly reduce the node ID lookup mentioned in #2194
Haha, this is also what I thought when first reading the documentation of the PBF format. However, I failed to rely on this assumption for creating geometries when importing data into PostGIS. I guess this is the reason why imposm3 and osm2pgsql both build a node cache before importing the data.
A format where all the blocks are truly independently decompressable would indeed be awesome. Let's create a new standard ;)
I'm starting to wonder why our current approach with multiple threads works at all. The order on the queue is not guaranteed.
In my understanding, the PbfDecoder preserves the ordering through the lock mechanism used in the PbfBlobDecoderListener. Hence, only the decompression and decoding occur in parallel.
By the way, when I tried to relax the ordering with baremaps, most of the problems occurred in the tests of the web module, not in the tests of the reader module. My impression was that building the graph worked fine if the blocks were consumed without order and sequentially. However, the results returned when querying the graph seemed slightly different, hence the failing tests.
I assume that existing PBF files strictly follow the nodes, ways, relations order.
Interestingly enough, we could get away with a massive speedup if we drop the parallel processing. The current osmId->ghId lookup is really slow (#2194), especially for nodes which have been ignored by the preprocessing step. It's only necessary because we have no deterministic order. If we process the file sequentially and enforce that IDs are ordered, things change quite a bit. We could always keep a reference to the index of the previous node osmId in the lookup data structure and avoid costly searches.
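The "remember the previous index" idea can be sketched like this. It is a hypothetical illustration, not GraphHopper's GHLongIntBTree: because both the stored IDs and the queried IDs are ascending, a cursor can simply advance monotonically instead of performing a full search per lookup.

```java
public class SortedIdLookup {
    // Sketch: exploit the sorted-ID guarantee by resuming the scan from the
    // previous hit instead of searching the whole structure every time.
    private final long[] sortedOsmIds;
    private int cursor = 0;

    SortedIdLookup(long[] sortedOsmIds) {
        this.sortedOsmIds = sortedOsmIds;
    }

    /** Returns the internal index of osmId, or -1. Queries must be ascending. */
    int indexOf(long osmId) {
        while (cursor < sortedOsmIds.length && sortedOsmIds[cursor] < osmId)
            cursor++;
        if (cursor < sortedOsmIds.length && sortedOsmIds[cursor] == osmId)
            return cursor;
        return -1; // osmId was filtered out in the first pass
    }

    public static void main(String[] args) {
        SortedIdLookup lookup = new SortedIdLookup(new long[]{2, 5, 9});
        System.out.println(lookup.indexOf(5)); // prints 1
    }
}
```

The total work across all lookups is linear in the number of stored IDs, rather than a logarithmic (or worse) search per query.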
Let's do it. Keep parallelism restricted to decompression. Expect a sorted OSM file. (I think everyone does.) Exploit the order.
@bchapuis I also learned something about Java streams, e.g. that apparently the (only?) way to "merge" a parallel stream, if you want to avoid low-level stuff (which is the idea, right?), is by saying .forEachOrdered(...). Did you know that? I found that totally non-obvious and didn't know it when we were talking about "re-synchronizing" for the single-threaded consumer.
Yes, that is the idea, but so far my experience with forEachOrdered for re-synchronizing a parallel stream has been quite painful... I encountered both stack overflow errors (when the computation applied to the elements of the stream takes time) and out of memory errors (when the elements of the stream are big). I think it has not really been devised for the OSM use case, so I try to avoid it as much as I can...
Interesting.
Let's do it. Keep parallelism restricted to decompression. Expect a sorted OSM file. (I think everyone does.) Exploit the order.
Short roadmap:
A few other benefits:
I did not measure anything, but the OSM import looks like it spends a lot of time combing through the PBF file as if it were an XML file, partly because it wants to keep the XML reader and the PBF reader behind the same interface.
For example, in the first pass through the file, it looks like we're spending minutes just to wind past all the Nodes to the first Way, and we fully decode all the nodes while doing so, just to immediately throw them away.
Maybe worth some cleanup. Maybe worth considering if we still need the XML reader at all.
EDIT: Or maybe I'm misinterpreting something because it's multi-threaded and it fills a huge buffer before actually processing anything.