rapidsai / cucim

cuCIM - RAPIDS GPU-accelerated image processing library
https://docs.rapids.ai/api/cucim/stable/
Apache License 2.0

Drop NumPy build dependency #751

Closed jakirkham closed 4 months ago

jakirkham commented 4 months ago

Partially addresses issue: https://github.com/rapidsai/build-planning/issues/82

Partially addresses issue: https://github.com/rapidsai/build-planning/issues/41

Even though cuCIM currently #includes <pybind11/numpy.h>, the actual C++ code appears not to use NumPy. So this attempts to drop the header and the NumPy build dependency.

jakirkham commented 4 months ago

Looks like this code might need some tweaks if we proceed further

https://github.com/rapidsai/cucim/blob/0e75c7676dc3dd818ae735d660d0fead88e523ba/python/pybind11/cucim_py.cpp#L448

Edit: Changes below rewrite this to use a memoryview instead

jakirkham commented 4 months ago

I have a question: does pybind11/numpy.h depend on the NumPy header or the NumPy library?

A few things to unpack here

NumPy is atypical in its setup. When building against NumPy, one only #includes the NumPy header. There is not a NumPy library that one links against in the typical sense

However, the symbols that the NumPy header names live in the Python shared objects that the NumPy package ships. Those symbols get loaded when calling import numpy (NumPy supplies a similar operation, import_array(), for use from the C API). This is how the symbols get resolved at runtime

Regardless, from a developer's perspective, building against NumPy always means using the header and the libraries. There isn't a way to pick just one or the other

Using pybind11 for NumPy support is not unique in this regard

If this change is about handling NumPy 2, would upgrading the pybind11 library to the latest version help? It looks like pybind11 handles the NumPy 2 case: https://github.com/pybind/pybind11/blob/master/include/pybind11/numpy.h#L187

Yes, it is true that pybind11 2.12.0 ships with NumPy 2 support. Building against that would be sufficient for NumPy 1 & 2 support (without other changes)

That said, relatively few cases strictly need the NumPy API, especially since the introduction of the Python Buffer Protocol. Many use cases (ours in cuCIM among them) simply need access to the underlying memory buffer of Python objects, NumPy arrays or otherwise. In these cases it is better to use the Python Buffer Protocol directly (as this code change does), which works not only with NumPy arrays but with any object that supports the protocol. This simplifies our dependencies, and the approach is more flexible and interoperable with other libraries
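To illustrate the point about the Python Buffer Protocol, here is a minimal Python sketch using only the standard library. It is not the cuCIM code from this change; the helper names describe_buffer and fill_zero are hypothetical and exist only for this example. It shows that memoryview can inspect and modify the memory of any buffer-exporting object without importing NumPy.

```python
import array


def describe_buffer(obj):
    """Summarize the memory exported by any buffer-protocol object.

    Hypothetical helper for illustration; not part of cuCIM.
    """
    with memoryview(obj) as view:
        return {
            "format": view.format,      # struct-style item type code
            "itemsize": view.itemsize,  # bytes per item
            "shape": view.shape,        # tuple of dimension lengths
            "nbytes": view.nbytes,      # total size in bytes
            "readonly": view.readonly,  # True for immutable exporters
        }


def fill_zero(obj):
    """Zero a writable, contiguous buffer in place (no copy is made).

    Hypothetical helper for illustration; not part of cuCIM.
    """
    # Reinterpret the exporter's memory as a flat run of unsigned bytes.
    flat = memoryview(obj).cast("B")
    # Slice assignment writes straight into the exporter's memory.
    flat[:] = bytes(flat.nbytes)


if __name__ == "__main__":
    samples = array.array("d", [1.0, 2.0, 3.0])
    print(describe_buffer(samples))
    fill_zero(samples)
    print(samples.tolist())
```

The same two helpers would accept a NumPy array, since NumPy arrays also export the buffer protocol, but nothing here imports or builds against NumPy.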

gigony commented 4 months ago

Thanks @jakirkham for the comprehensive explanation! It makes sense, and thank you for the update! 🙂

jakirkham commented 4 months ago

/merge

jakirkham commented 4 months ago

Thanks all! 🙏