osmcode / pyosmium

Python bindings for libosmium
https://osmcode.org/pyosmium
BSD 2-Clause "Simplified" License
318 stars 65 forks source link

Add add_locations method to SimpleWriter #229

Closed agrenott closed 1 year ago

agrenott commented 1 year ago

Hi, and thanks for pyosmium.

I've been relying on it for my fork of phyghtmap, used to build contour lines from HGT elevation files. This tool outputs elevation contour lines in OSM formats (PBF being preferred), relying on ways containing a LOT of location nodes. The current public API of SimpleWriter enforces a call from Python to C++ for each and every node, which is very resource consuming when adding a lot of them.

This PR adds an efficient method to add a lot of nodes containing only location and ID.

More context and basic benchmark are available here: https://github.com/agrenott/pyhgtmap/issues/15

agrenott commented 1 year ago

Hi @lonvia , any feedback on my proposal? Thanks

lonvia commented 1 year ago

I'm fine with adding such a add_locations() function in general. I do have doubts about adding a required dependency to numpy. Is it much of a performance issue to hand in a Python array of int-Coordinate tuples instead?

lonvia commented 1 year ago

I'm fine with adding such a add_locations() function in general. I do have doubts about adding a required dependency to numpy. Is it much of a performance issue to hand in a Python array of int-Coordinate tuples instead?

lonvia commented 1 year ago

I'm fine with adding such a add_locations() function in general. I do have doubts about adding a required dependency to numpy. Is it much of a performance issue to hand in a Python array of int-Coordinate tuples instead?

agrenott commented 1 year ago

According to pybind documentation (https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html) numpy is a runtime dependency. I understand it's needed only when calling the new method. I'll confirm with a test while fixing other pr issues. I'd rather keep the numpy array interface, which is much more performant and avoid massive date conversion.

agrenott commented 1 year ago

Looks like the "build-ubuntu" builds are failing for some external reason: I guess some change introduced in recent pybind changes, as those builds use latest version from github where "build-default' ones rely on versioned dependencies. I don't really understand why there are 2 different ways to handle builds dependencies, and feel using the latest version from GitHub is quite hazardous. How do you want to handle this?

agrenott commented 1 year ago

The change of behavior comes from this commit from pybind: https://github.com/pybind/pybind11/commit/f70165463328c218d118204efc13aac93783d17b

Starting this commit, many tests fail with such error:

    def __init__(self, parent: 'cosm.TagContainerProtocol') -> None:
        self._pyosmium_data = parent
>       self.iterator = self._pyosmium_data.tags_begin()
E       RuntimeError: return_value_policy = move, but type osmium::memory::CollectionIterator<osmium::Tag const> is neither movable nor copyable!

build/lib.linux-x86_64-cpython-310/osmium/osm/types.py:57: RuntimeError
agrenott commented 1 year ago

I've fixed the compatibility issue with recent pybind11.

agrenott commented 1 year ago

Was so focused on the latest version compatibility I forgot the current one's... Should be fixed now. BTW, is there any way to automate the execution of CI/CD on PR update? It would speed up the collaboration quite a lot.

agrenott commented 1 year ago

Now it seems Ubuntu 18.04 workers are deprecated... https://github.blog/changelog/2022-08-09-github-actions-the-ubuntu-18-04-actions-runner-image-is-being-deprecated-and-will-be-removed-by-12-1-22/

agrenott commented 1 year ago

Could you please approve the CI workflow?

agrenott commented 1 year ago

Hi, could you please approve the CI workflow?

joto commented 1 year ago

@agrenott Please stop pestering the maintainers. It doesn't help. We are all busy people here and sometimes things need time and sometimes other things have priority.

agrenott commented 1 year ago

Hi @joto , I'm sorry I hurt your feelings. I'm just trying to contribute to an open source project, going further than what I needed to fix the existing code base, and you can imagine from my perspective waiting for more than a week for one of the maintainers to click on a button is frustrating as well. But fine, if you don't have time to follow up on this project I guess I'll simply fork it and save time and frustration to everyone...

joto commented 1 year ago

@agrenott Nobodies feelings got hurt. And I understand your frustration. But can you understand the frustration of the maintainers who want nothing more than to work on their projects all the time but can't? The scarcest ressource in all Open Source development is maintainer time and making more work for maintainers is simply not okay. This is not about the time to press a button. This is about the time taken away by getting a notification, switching in your mind to that project, figure out what to do, do it, switching to the next notification and so on.

And btw: You don't need to wait for the maintainer to approve this, you can change the Github action setup so that it runs in your fork on your account.

lonvia commented 1 year ago

Sorry if I gave the impression that my comment on numpy was optional. I won't take a PR where a function can only be used with numpy. It's fine if it can handle numpy arrays next to Python arrays but if only numpy arrays are supported, it's a no-no.

The CI-related stuff should go on a different PR to not be tied to the design discussion around add_locations().

agrenott commented 1 year ago

That's fine, I'll handle what I need on the fork. You may of course re-use whatever part you may find useful.

lonvia commented 1 year ago

I've cherry-picked the pybind fix to master now.