Status: closed by morandd 1 year ago.
Please provide details about what you tested and how. Ideally, provide code that lets us reproduce the issue.
It's hard to provide a simple test case, but the difference was ~60 s with the old method vs. ~130 s with the `dict()` method, and this was consistent over several runs and data files. My files took 90 seconds to visit and read lat/lon/tags into Python lists. That dropped to 30 seconds when I removed the tags processing, and to 3 seconds when I also removed the lat/lon copying. So copying data from libosmium into Python seems to be the expensive part.
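The staged measurement described above (full visit, then without tags, then without lat/lon copying) can be sketched as a small ablation harness. This is only an illustration of the method, not the reporter's script: `Node`, `Location` and `Tag` are hypothetical plain-Python stand-ins, because real pyosmium objects are only valid inside handler callbacks and would cross the C++/Python boundary, which these stand-ins do not. Absolute timings from it will therefore not reproduce the numbers in this thread.

```python
import time
from collections import namedtuple

# Hypothetical stand-ins for pyosmium node objects (NOT the pyosmium API).
Tag = namedtuple("Tag", "k v")
Location = namedtuple("Location", "lat lon")
Node = namedtuple("Node", "location tags")


def visit(nodes, copy_coords=True, copy_tags=True):
    """Copy the selected fields of every node into Python lists."""
    lats, lons, tags = [], [], []
    for n in nodes:
        if copy_coords:
            lats.append(n.location.lat)
            lons.append(n.location.lon)
        if copy_tags:
            # The "old method" from this issue: one pass over the tag objects.
            tags.append({t.k: t.v for t in n.tags})
    return lats, lons, tags


def profile_stages(nodes):
    """Time the full visit, then drop tags, then drop coordinates as well."""
    configs = {
        "coords+tags": dict(copy_coords=True, copy_tags=True),
        "coords only": dict(copy_coords=True, copy_tags=False),
        "visit only": dict(copy_coords=False, copy_tags=False),
    }
    timings = {}
    for name, kwargs in configs.items():
        start = time.perf_counter()
        visit(nodes, **kwargs)
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    sample = [
        Node(Location(59.9 + i * 1e-6, 10.7),
             [Tag("highway", "residential"), Tag("name", f"Street {i}")])
        for i in range(50_000)
    ]
    for stage, secs in profile_stages(sample).items():
        print(f"{stage:12s} {secs:.3f} s")
```

With real data, the per-stage differences are what point at the expensive step; here they only show the shape of the comparison.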
I wouldn't be surprised if this is hard to fix; optimizing it might need low-level investigation of the Python interpreter and the Python/C++ binding. I mainly wanted to report it so others can get a speedup by reverting to the old method.
libosmium seems to be the only(?) convenient way to create .osm.pbf files. Your pyosmium binding is wonderful and much appreciated, but for the highest performance it may be necessary to use libosmium natively in C++.
Issue #106 is fixed, but the fix is slow.
In my testing, the new conversion introduced in the fix for #106, `d = dict(o.tags)`, is twice as slow as the old method, `d = {tag.k: tag.v for tag in o.tags}`.
This makes a big difference when processing large OSM datasets.
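One general Python detail may help when comparing the two idioms: when the argument to `dict()` has a `keys()` method, the constructor calls `keys()` and then looks up each key with `obj[key]`, whereas the comprehension makes a single pass over the tag objects. Whether pyosmium's `TagList` actually takes that path is not established in this thread, so the sketch below uses a hypothetical, list-backed `ListTagList` (not pyosmium's class) whose `__getitem__` is a linear scan, purely to show how the mapping path can cost more than one pass. It also checks that both idioms produce the same dict.

```python
import timeit
from collections import namedtuple

Tag = namedtuple("Tag", "k v")


class ListTagList:
    """Hypothetical list-backed tag container (NOT pyosmium's TagList).

    Iteration yields Tag objects (for the comprehension), and
    keys()/__getitem__ make it usable with dict().
    """

    def __init__(self, pairs):
        self._tags = [Tag(k, v) for k, v in pairs]

    def __iter__(self):
        return iter(self._tags)

    def __len__(self):
        return len(self._tags)

    def keys(self):
        return [t.k for t in self._tags]

    def __getitem__(self, key):
        # Linear scan: each lookup walks the list until the key is found.
        for t in self._tags:
            if t.k == key:
                return t.v
        raise KeyError(key)


def via_dict(tags):
    # New idiom: dict() sees keys() and calls tags[key] for every key.
    return dict(tags)


def via_comprehension(tags):
    # Old idiom: a single pass over the Tag objects.
    return {tag.k: tag.v for tag in tags}


if __name__ == "__main__":
    tags = ListTagList((f"key{i}", f"value{i}") for i in range(20))
    assert via_dict(tags) == via_comprehension(tags)
    for fn in (via_dict, via_comprehension):
        secs = timeit.timeit(lambda: fn(tags), number=10_000)
        print(f"{fn.__name__:18s} {secs:.3f} s")
```

To measure the real cost, the same two functions could be called on `o.tags` inside a pyosmium handler callback instead of on the stand-in.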