osmcode / pyosmium

Python bindings for libosmium
https://osmcode.org/pyosmium
BSD 2-Clause "Simplified" License
318 stars, 65 forks

implicit dict conversion is 2x slower than old method #210

Closed: morandd closed this issue 1 year ago

morandd commented 2 years ago

Issue #106 is fixed, but the fix is slow.

In my testing, the new conversion introduced in the fix for #106, d = dict(o.tags), is twice as slow as the old method, d = {tag.k: tag.v for tag in o.tags}.

This makes a big difference when processing large OSM datasets.
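
For anyone who wants to compare the two conversions side by side, here is a minimal timing sketch. FakeTag is a hypothetical stand-in, not a pyosmium class: it only mimics the two behaviours the snippets above rely on (k and v attributes, and unpacking into a key/value pair). The helper names convert_implicit, convert_comprehension and compare are also made up for illustration. Timings from the stand-in say nothing about pyosmium itself; to check the report, one would call compare() on a real o.tags from inside a handler callback.

```python
import timeit


class FakeTag:
    """Hypothetical stand-in for a tag: has .k/.v and unpacks as (k, v)."""

    __slots__ = ("k", "v")

    def __init__(self, k, v):
        self.k = k
        self.v = v

    def __iter__(self):
        # Lets dict() consume the tag as a (key, value) pair.
        yield self.k
        yield self.v


def convert_implicit(tags):
    """New style: rely on each tag unpacking into a pair."""
    return dict(tags)


def convert_comprehension(tags):
    """Old style: read the k and v attributes explicitly."""
    return {tag.k: tag.v for tag in tags}


def compare(tags, number=10_000):
    """Return the seconds each conversion takes over `number` repetitions."""
    return {
        "dict(tags)": timeit.timeit(lambda: convert_implicit(tags), number=number),
        "comprehension": timeit.timeit(lambda: convert_comprehension(tags), number=number),
    }


if __name__ == "__main__":
    sample = [FakeTag("highway", "residential"), FakeTag("name", "Main St")]
    for label, seconds in compare(sample).items():
        print(f"{label}: {seconds:.4f}s")
```

Both functions should return the same dictionary for the same tag list, so only the elapsed time differs between them.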

lonvia commented 2 years ago

Please provide details about how and what you tested. Preferably provide code that allows us to reproduce your issue.

morandd commented 2 years ago

It is hard to provide a simple test case, but the difference was ~60 sec (old method) vs ~130 sec (dict() method), and this was consistent over several runs and data files. My files were taking 90 seconds to visit and read lat/lon/tags into Python lists. That number dropped to 30 sec if I skipped the tags processing, and dropped further to 3 seconds if I also skipped the lat/lon copying. So essentially, copying data from libosmium into Python appears to be the slow part.

I would not be surprised if this issue is hard to fix. Optimizing it might need some low-level investigation of the Python interpreter and the Python/C++ binding. I mostly wanted to report this so others can gain a speedup by reverting to the old method.

libosmium seems to be the only(?) convenient way to create .osm.pbf files. Your pyosmium binding is wonderful and much appreciated, but it may be that for highest performance one needs to use libosmium natively in C++.