Adding numpy as a dependency with custom lookup

rgommers commented 2 years ago

This is an issue to keep track of a feature I'd like to have: making dependency('numpy') work, similar to the examples in https://mesonbuild.com/Dependencies.html#dependencies-with-custom-lookup-functionality.

First, here is my current code for NumPy support in my SciPy meson branch - it works, but isn't pretty:

# NumPy include directory - needed in all submodules
incdir_numpy = run_command(py3,
  [
    '-c',
    'import os; os.chdir(".."); import numpy; print(numpy.get_include())'
  ],
  check: true
).stdout().strip()

inc_np = include_directories(incdir_numpy)

incdir_f2py = incdir_numpy / '..' / '..' / 'f2py' / 'src'
inc_f2py = include_directories(incdir_f2py)
fortranobject_c = incdir_f2py / 'fortranobject.c'

npymath_path = incdir_numpy / '..' / 'lib'
npymath_lib = meson.get_compiler('c').find_library('npymath', dirs: npymath_path)
npyrandom_path = incdir_numpy / '..' / '..' / 'random' / 'lib'
npyrandom_lib = meson.get_compiler('c').find_library('npyrandom', dirs: npyrandom_path)

Note that:

numpy.get_include() is currently the only correct way to obtain the main NumPy include directory (there's no solid pkg-config support - although there are some remnants, they are untested/unused AFAIK)
numpy.f2py.get_include() to do the same for F2PY include directories was added in numpy 1.21.1. The code above uses relative directories because it needs to support older NumPy versions.
libnpymath.a can be linked against, it's located at numpy/core/lib/libnpymath.a. It's quite old, can be assumed to always be present.
libnpyrandom can be linked against, it's located at numpy/random/lib/libnpyrandom.a. It was introduced in numpy 1.19.0 (2020).
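To make the directory layout in the notes above concrete, here is a minimal Python sketch that derives the F2PY and static-library locations from the NumPy include directory alone, mirroring the relative paths used in the Meson code. The function name `numpy_build_paths` and the dictionary keys are made up for illustration, and the layout is assumed to be the one described above (newer NumPy releases may differ).

```python
from pathlib import PurePosixPath


def numpy_build_paths(numpy_include: str) -> dict:
    """Derive F2PY and static-library locations from NumPy's include directory.

    Hypothetical helper: assumes the layout described above
    (numpy/core/include, numpy/f2py/src, numpy/core/lib, numpy/random/lib).
    """
    incdir = PurePosixPath(numpy_include)
    numpy_root = incdir.parent.parent  # .../numpy/core/include -> .../numpy
    f2py_src = numpy_root / "f2py" / "src"
    return {
        "include": incdir,
        "f2py_include": f2py_src,
        "fortranobject_c": f2py_src / "fortranobject.c",
        "npymath_dir": incdir.parent / "lib",            # numpy/core/lib
        "npyrandom_dir": numpy_root / "random" / "lib",  # numpy/random/lib
    }


paths = numpy_build_paths("/usr/lib/python3.10/site-packages/numpy/core/include")
for name, path in paths.items():
    print(f"{name}: {path}")
```

Nothing here imports numpy; the only input is the include directory, which is the property a built-in lookup would rely on.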

I think, based on reading the Meson docs, this is the desired support:

numpy_dep = dependency('numpy')
# Can then be used to build extensions like so:
dependencies: [py3_dep, numpy_dep]

# For a Fortran dependency using f2py (which also needs the main numpy include dirs):
f2py_dep = dependency('numpy', modules: 'f2py')
# Can then be used to build extensions like so:
dependencies: [py3_dep, f2py_dep]  # doesn't need numpy_dep, will be auto-added (?)

npymath_lib = dependency('numpy', modules: 'npymath')
npyrandom_lib = dependency('numpy', modules: 'npyrandom')
# Can then be used to build extensions like so (doesn't need numpy_dep, will be auto-added (?)):
dependencies: [py3_dep, npymath_lib]
dependencies: [py3_dep, npyrandom_lib]

Note that f2py, libnpymath and libnpyrandom will always be present. I haven't used modules: before, so I'm not 100% sure that's the right thing here.

Cc @eli-schwartz, who offered to support/review this new feature.

rgommers commented 2 years ago

Also perhaps useful info, here are the docs for how CMake does it: https://cmake.org/cmake/help/latest/module/FindPython3.html#findpython3.

find_package (Python3 COMPONENTS Interpreter NumPy)
target_link_libraries(... Python3::NumPy)

Variables:

Python3_NumPy_FOUND
Python3_NumPy_INCLUDE_DIRS
Python3_NumPy_VERSION

Version number will indeed be useful. CMake doesn't seem to have support for libnpymath and libnpyrandom.

eli-schwartz commented 2 years ago

modules seems like the correct approach here.

BTW if numpy could add pkg-config files for at least the case where it is built from source and installed as a system package that would be super convenient, we could use the numpy.get_include stuff as a fallback. ;)

Custom lookup dependencies can have subdependencies, i.e. I think it makes sense that numpy should always have a recursive dep on python?

rgommers commented 2 years ago

BTW if numpy could add pkg-config files for at least the case where it is built from source and installed as a system package that would be super convenient, we could use the numpy.get_include stuff as a fallback. ;)

Yes, that does seem like a good idea. I have thought about it, but would prefer to do it only once NumPy is switching over to Meson itself. Otherwise I'll have to implement it twice, and adding new features to numpy.distutils is not a very enjoyable experience.

Custom lookup dependencies can have subdependencies, i.e. I think it makes sense that numpy should always have a recursive dep on python?

Yes, that makes sense to me - the NumPy C API and the C code that f2py generates both depend on the CPython C API.

rgommers commented 2 years ago

@eli-schwartz I have a WIP branch here and could use a few pointers: https://github.com/mesonbuild/meson/compare/master...rgommers:meson:numpy-dependency?expand=1

Does that look in the right direction, or should it go into modules/python.py? How do I get at the python executable found by dependency('python')?

eli-schwartz commented 2 years ago

How do I get at the python executable found by dependency('python')?

The python module's .dependency() method is a bit different from the one that is a global dependency, for instance it doesn't allow specifying which python executable to use.

This actually plays into the issue of where to put it. There's two implementations of a python dependency that aren't necessarily in sync, and we should consolidate them in, say, dependencies/python.py. I wonder, can we sneak a PythonExternalProgram in as an optional value...

rgommers commented 2 years ago

This actually plays into the issue of where to put it. There's two implementations of a python dependency that aren't necessarily in sync

Aside from python vs. python3? The latter can probably be removed by now? There's also what looks like a leftover thing in dependencies/misc.py.

dcbaker commented 2 years ago

I don’t know if it ever landed, but I had code that made them all the same with a wrapper around the misc implementation (moved to a new module).

eli-schwartz commented 2 years ago

I'm pretty sure it did not land, because it's currently still a mess.

But this is what I've been working on today: https://github.com/eli-schwartz/meson/commits/python-dependency-refactor

The third commit is surely broken... it's half-finished work.

ahesford commented 2 years ago

I was bitten this weekend by the numpy detection logic when trying to move Void Linux from SciPy 1.8.1 to SciPy 1.9.0 with Meson. I don't have much to add from my experience:

Manually prepending a cross-toolchain build prefix (in our case, /usr/<triple>) to all of the paths returned by numpy.get_include() still leads find_library to find the native npymath (and, I presume, npyrandom) rather than the right version in the build prefix. I didn't dig too deeply into this.
Detecting the Python runtime may be problematic because we're trying to find the host Python, possibly with cross accommodations like custom PYTHONPATH and sysconfig data.
I tried forcing proper library detection using the existing logic by omitting NumPy from the host and installing it only in the build root; this would work for the pure-Python pybind11 but cannot work for NumPy because the build version of NumPy may not be importable by the host Python. (During import numpy, it wants to load shared objects that might be for the wrong architecture.)

I was just going to file an issue tagging @eli-schwartz for advice. As he's already thinking about this problem, I'll just say "me too" and watch what happens. We're sticking with the distutils build in Void for now.

eli-schwartz commented 2 years ago

It's not entirely clear to me how this works... You have the build machine python but with the cross-compile numpy and you want to build a cross-compile SciPy? If you were using the cross-compile python then I'd be hopeful that this works fine.

Of course, not having to run cross-compile tools via qemu-user is the exact reason why pkg-config is so much better than config-tool stuff (shakes fist at llvm-config) so there is that...

ahesford commented 2 years ago

In Void, we don't run the build Python when cross-compiling native Python extensions. Instead, we use the host Python but override the sysconfig and set custom prefix and path variables to allow the Python packaging tools to gather information (field sizes, shlib suffix, etc.) about the build platform rather than the host platform. This "mostly" works for setuptools/distutils although, for the very issues we're seeing with the move to meson in SciPy, we hack around some search paths when building things that link with NumPy. Yes, this can be fragile, but it's Python packaging...

Void also installs wrappers for a lot of tools (pkg-config comes to mind) ahead of the default system path to make sure that we are searching the right paths for build dependencies.

rgommers commented 2 years ago

Instead, we use the host Python but override the sysconfig and set custom prefix and path variables to allow the Python packaging tools to gather information (field sizes, shlib suffix, etc.) about the build platform rather than the host platform.

Yes, that's clearly not going to work with the way I currently implemented numpy dependency detection inside SciPy. When we have a builtin numpy dependency in Meson, it should work without having to run any Python code (host or build); the relevant numpy paths are deterministic given the host platform site-packages location. I could probably even get rid of that numpy.get_include() usage now, this is just the first actual bug report that motivates doing so.

@ahesford part of the things you have in your bullet list above need updating for Meson I think. One of the benefits of moving to Meson is proper cross-compiling support via a cross build definition file (https://mesonbuild.com/Cross-compilation.html). I'd be quite interested in making this work in SciPy itself (xref https://github.com/scipy/scipy/issues/14812) and having a SciPy CI job which cross-compiles (e.g. a two-stage GitHub Actions job, x86-64 to aarch64 build, then run basic tests in an aarch64 container - or whichever other platform combo makes sense). Would you want to collaborate on that? Or if not, can you open a SciPy issue with as much detail as possible (a lot of which you have above already) so I can add it to our tracking issue?

ahesford commented 2 years ago

@ahesford part of the things you have in your bullet list above need updating for Meson I think. One of the benefits of moving to Meson is proper cross-compiling support via a cross build definition file (https://mesonbuild.com/Cross-compilation.html). I'd be quite interested in making this work in SciPy itself (xref scipy/scipy#14812) and having a SciPy CI job which cross-compiles (e.g. a two-stage GitHub Actions job, x86-64 to aarch64 build, then run basic tests in an aarch64 container - or whichever other platform combo makes sense). Would you want to collaborate on that? Or if not, can you open a SciPy issue with as much detail as possible (a lot of which you have above already) so I can add it to our tracking issue?

Sorry for the delayed response. I've opened https://github.com/scipy/scipy/issues/16783 to document SciPy specifics and get people thinking about possible resolutions. I'm happy to collaborate on a solution; SciPy and NumPy are critical tools for me (which is why I maintain the major parts of the Python numeric stack for Void) and my increasing familiarity with meson convinces me that it's more pleasant than pretty much all other build systems.

rgommers commented 1 year ago

In my initial issue description here, I missed that python.find_installation already has a modules keyword. The release note for that feature contains:

py = import('python').find_installation('python3', modules : ['numpy'])

Making numpy a separate dependency is kind of a pain, because we then don't know to which Python interpreter it belongs - which starts to matter very quickly when cross-compiling. We may need to run numpy.f2py as a code generator, which needs NumPy from the native Python (see https://github.com/scipy/scipy/blob/main/tools/generate_f2pymod.py#L278-L283 for SciPy), and we also need its include dir from the host Python. Likewise, numpy.get_include(), which locates the headers needed to use the NumPy C API, has to come from the cross Python. See https://github.com/scipy/scipy/blob/main/scipy/meson.build#L39-L84

crossenv appears to have a bunch of hacks to work around the issues, but it seems to me like what we really need is to look for two Python interpreters in general. For SciPy in particular:

py_mod = import('python')  # we just have one of these ....
# the host Python
py = py_mod.find_installation(
  modules : ['numpy', 'pybind11'],
  pure: false)
py_dep = py.dependency()

# We need the native Python from the build machine to run some code generators:
py_native = py_mod.find_installation(
  native: true,  # FIXME: this keyword does not exist
  modules: ['numpy', 'pythran'],  # code generation tools
  pure: false,
)

Looking at https://mesonbuild.com/Python-module.html, there is no way to actually ask for the native Python during a cross build. It's not clear to me if this is the best way, or this is supposed to work differently. @eli-schwartz WDYT?

eli-schwartz commented 1 year ago

I assume the code generation tools can run on either the native or the cross compile python, and still produce the same outputs.

Do you need to guarantee that exactly the same version of each is used? Do the versions not particularly matter? Do they matter, but only a minimum version is necessary, not a maximum version?

rgommers commented 1 year ago

I assume the code generation tools can run on either the native or the cross compile python, and still produce the same outputs.

Yes, I think that is true. Except that one doesn't want to require QEMU to actually run them, hence the desire to allow specifying that they come from the native Python.

Do you need to guarantee that exactly the same version of each is used? Do the versions not particularly matter? Do they matter, but only a minimum version is necessary, not a maximum version?

Not always needed to use the same version, but it should be possible. However, I think that's up to the user to set up the build environment - probably no need for Meson to enforce that. I think that's the answer for all of these. meson-python already uses a native file to control the exact Python interpreter selection. I'm not sure anything is needed beyond that.

rgommers commented 1 year ago

https://conda-forge.org/docs/maintainer/knowledge_base.html#cross-compilation already makes the two sets of dependencies explicit when cross-compiling, and distinguishes between python as a build-only dependency (for code generators I guess) and a runtime dependency: https://conda-forge.org/docs/maintainer/knowledge_base.html#build-matrices.

eli-schwartz commented 1 year ago

In that case, I wonder if it makes sense to just use find_program('pythran', native: true) and find_program('f2py', native: true)?

Inside of build isolation, the $PATH is now set up to actually find those properly, which required a fix in pip that @dnicolodi made. This is important for meson-python for finding Meson itself, although only if Meson isn't pre-installed at the OS level before performing a build.

(People doing cross builds are probably not using build isolation 😜 and $PATH is generally guaranteed to have native executables. But you can also use the machine file to define those executables too, of course.)

dnicolodi commented 1 year ago

I'm not sure I completely understand what this issue is proposing, however, I think that there must be a clear distinction between code generation tools and libraries. For code generation tools, find_program('tool') needs to work. As these are code generation tools, for cross compilation it does not make much sense to have these tools defined in the cross file, thus native: true should not even be necessary. If find_program('tool') does not work, it is an issue with how the tool is deployed on the build system or a more general issue with the tool itself. Working around it in Meson does not seem to be a good or sustainable strategy. I think that cython, pythran and f2py all work in this way (although they need to be installed for the host Python).

Libraries are another matter. Unfortunately, AFAIK there isn't a standardized way for Python packages to expose this information. What NumPy does is not optimal because it requires executing Python code, which complicates cross compilation (in general the cross compilation story for Python packages is not great). One possible solution to investigate could be to add the required information to the wheel metadata (IIUC the wheel metadata can be extended beyond what is defined in the PEPs). Doing so would possibly open the door to reading the metadata fields without running target system code.

dnicolodi commented 1 year ago

incdir_numpy = run_command(py3,
  [
    '-c',
    'import os; os.chdir(".."); import numpy; print(numpy.get_include())'
  ],
  check: true
).stdout().strip()

inc_np = include_directories(incdir_numpy)

I've seen this code replicated in a few projects, but it is problematic. First, it requires the NumPy header directory to be a subdirectory of the Meson project, which is anything but guaranteed. I think

np_dep = declare_dependency(include_directories: incdir_numpy)

is the way to go. Second, even in the case the NumPy includes are installed in a subdirectory of the Meson project, the code executed to get the header needs to return a relative path, and it does not, at least on my system. This code does:

incdir_numpy = run_command(py3,
  [
    '-c',
    'import os, numpy; print(os.path.relpath(numpy.get_include()))'
  ],
  check: true
).stdout().strip()

eli-schwartz commented 1 year ago

I've seen this code replicated in a few projects, but it is problematic. First, it requires the NumPy header directory to be a subdirectory of the Meson project, which is anything but guaranteed. I think

The chdir definitely doesn't say anything about it being or not being a subdirectory of the meson project. I think it's an attempt to evade python's default PYTHONPATH allowing the project source code to be "accidentally imported" when you don't want that.

As for relative vs. absolute, you're "supposed to" have your virtualenv with numpy installed be somewhere other than inside your source tree. Because get_include() returns an absolute path, and that's completely okay if that path is, say, /usr/lib/python3.10/site-packages/numpy/core/include

eli-schwartz commented 1 year ago

Aside: IMO the proper solution for pybind11 is to handle it like a config-tool, and I actually have that all planned out as soon as my python dependency refactor PR is merged.

dnicolodi commented 1 year ago

The chdir definitely doesn't say anything about it being or not being a subdirectory of the meson project. I think it's an attempt to evade python's default PYTHONPATH allowing the project source code to be "accidentally imported" when you don't want that.

I'm not sure this is the case: it would matter only inside a project that has a numpy top level directory that is also a Python package, and this is very unlikely to be the case for anything other than NumPy itself. But it is a bit silly for NumPy to look up its own headers by importing numpy. I've no idea what the chdir() is trying to accomplish there.

As for relative vs. absolute, you're "supposed to" have your virtualenv with numpy installed be somewhere other than inside your source tree. Because get_include() returns an absolute path, and that's completely okay if that path is, say, /usr/lib/python3.10/site-packages/numpy/core/include

Right (I always find the error message Meson spits out when the path is absolute but the directory is inside the source tree confusing). But using declare_dependency() the requirement for the include path to be relative if inside the source tree and absolute when outside is relaxed, thus there is no requirement on how the virtualenv is set up. I always have the virtual env for a project instantiated inside the project source directory. Imposing a specific layout for the virtual envs is a bit annoying.
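For readers following the relative vs. absolute discussion, the rule being described (relative when the directory is inside the source tree, absolute when it is outside) can be sketched in Python as below. This is only an illustration of the path logic, not code from SciPy or Meson; the function name `include_path_for_meson` and the example paths under /home/user/proj are hypothetical.

```python
import posixpath


def include_path_for_meson(incdir: str, source_root: str) -> str:
    """Return incdir relative to source_root if it lies inside it, else absolute.

    Hypothetical helper; both arguments are expected to be absolute POSIX paths.
    """
    incdir = posixpath.normpath(incdir)
    source_root = posixpath.normpath(source_root)
    # commonpath compares whole path components, so /home/user/proj2 is not
    # mistaken for a subdirectory of /home/user/proj.
    if posixpath.commonpath([incdir, source_root]) == source_root:
        return posixpath.relpath(incdir, source_root)
    return incdir


inside = include_path_for_meson(
    "/home/user/proj/.venv/lib/python3.10/site-packages/numpy/core/include",
    "/home/user/proj",
)
outside = include_path_for_meson(
    "/usr/lib/python3.10/site-packages/numpy/core/include",
    "/home/user/proj",
)
print(inside)
print(outside)
```

With a virtualenv inside the source tree the first call yields a path relative to the project root, while the system-wide install in the second call keeps its absolute path.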

rgommers commented 1 year ago

For absolute vs. relative path, please see https://github.com/scipy/scipy/issues/16312. Neither is great, but as it stands absolute paths are preferred. I suggest keeping the discussion on that particular code construct on that SciPy issue.

I'm not sure I completely understand what this issue is proposing, however, I think that there must be a clear distinction between code generation tools and libraries. For code generation tools, find_program('tool') needs to work.

Okay thanks, I think this sounds good. You are both saying the same thing here - all code generation tools should be using find_program, so there's no need to get explicitly at the native interpreter within meson.build files.

We still have to choose between:

py = import('python').find_installation('python', modules : ['numpy'])

and

numpy_dep = dependency('numpy')
npymath_lib = dependency('numpy', modules: 'npymath')
npyrandom_lib = dependency('numpy', modules: 'npyrandom')

If the former is preferred, that still leaves us with how to get at the include dirs without running the non-native interpreter. Maybe this will work:

py = import('python').find_installation('python', modules : ['numpy'])
py_dep = py.dependency()
numpy_incdir = py.get_path('platlib') / 'numpy' / 'core' / 'include'

but that's still very constraining, and hardcodes paths that the user ideally wouldn't know about. I'm not sure if that can be improved upon. dependency('numpy') is a lot more flexible - I just had the implementation issue that the dependency provider for numpy should get at the detected Python install from import('python').find_installation.

rgommers commented 1 year ago

Libraries are another matter. Unfortunately, AFAIK there isn't a standardized way for Python packages to expose this information.

Indeed.

What NumPy does is not optimal because it requires executing Python code, which complicates cross compilation (in general the cross compilation story for Python packages is not great).

100% agreed, but I'll note that this issue aims to sidestep that requirement. I know that numpy.get_include() returns path-to-numpy/core/include, so I want to encode that knowledge into Meson itself so there is no longer a need to either run the interpreter or to hardcode the path in packages that depend on numpy.

One possible solution to investigate could be to add the required information to the wheel metadata (IIUC the wheel metadata can be extended beyond what is defined in the PEPs). Doing so would possibly open the door to reading the metadata fields without running target system code.

That'd be a nice enhancement in the future - but not required for this issue.

dnicolodi commented 1 year ago

I don't think that my proposal for how to solve absolute vs relative include path passed to include_directories() vs using declare_dependency() is unrelated to the topic of this issue. You are proposing to have dependency('numpy') be a dependency with a custom lookup functionality. This can only return the same type of object that declare_dependency() gives you. If dependency('numpy') is a solution, declare_dependency(include_directories: numpy_incdir) is at least going in the right direction.

I just had the implementation issue that the dependency provider for numpy should get at the detected Python install from import('python').find_installation.

I don't think this is going to fly as there can be more than one Python installation obtained from Meson's python module (i.e. a python2 and a python3 installation). The only other interface I can think of that may solve the issue is:

py = import('python').find_installation()
python_with_numpy_dep = py.dependency(modules: 'numpy')

where a modules argument is added to the python_installation object dependency() method. When the modules argument is specified, the dependency object returned is augmented with the information relative to the specified modules. This could also allow:

python_with_numpy_and_libnpymath = py.dependency(modules: 'numpy.core.npymath')

or something similar.

However, this seems like a lot of work (and a lot of not really immediate interfaces to document and remember) just to avoid users having to spell

py.get_path('platlib') / 'numpy' / 'core' / 'include'

rgommers commented 1 year ago

If dependency('numpy') is a solution, declare_dependency(include_directories: numpy_incdir) is at least going in the right direction.

I will give this another try. I think I tried all permutations at the time, but maybe it will work with relative paths. I'll note that the chdir is to avoid having scipy's signal and io submodules shadowing the stdlib modules of the same name.

However, this seems like a lot of work (and a lot of not really immediate interfaces to document and remember) just to avoid users having to spell
py.get_path('platlib') / 'numpy' / 'core' / 'include'

I'll note that there's a whole mess here that I didn't get into, with Python packages possibly being installed not under the regular site-packages (/ platlib) but in the user dir or another dir on sys.path. When running Python code inside Meson, one can handle all that and check that the returned numpy/core/include directory actually exists.
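As a rough sketch of the "handle all that and check that the directory actually exists" idea, the Python below walks a list of candidate directories and returns the first one containing numpy/core/include, without importing numpy. The names `find_numpy_include` and `default_search_dirs` are hypothetical, and the candidate order (platlib first, then the other sys.path entries) is an assumption rather than anything Meson or SciPy does. It also inspects the interpreter that runs it, so for a cross build the host directories would have to be passed in explicitly.

```python
import os
import sys
import sysconfig
import tempfile


def find_numpy_include(search_dirs):
    """Return the first existing <dir>/numpy/core/include, or None if absent."""
    for base in search_dirs:
        candidate = os.path.join(base, "numpy", "core", "include")
        if os.path.isdir(candidate):
            return candidate
    return None


def default_search_dirs():
    """Assumed order: platlib first, then the remaining non-empty sys.path entries."""
    platlib = sysconfig.get_path("platlib")
    return [platlib] + [p for p in sys.path if p and p != platlib]


# Demonstration against a throwaway directory tree instead of a real install.
with tempfile.TemporaryDirectory() as fake_site:
    os.makedirs(os.path.join(fake_site, "numpy", "core", "include"))
    found = find_numpy_include(["/nonexistent/site-packages", fake_site])
    missing = find_numpy_include(["/nonexistent/site-packages"])

print(found)
print(missing)
```

Calling `find_numpy_include(default_search_dirs())` would cover installs in the user directory or elsewhere on sys.path, at the cost of depending on the running interpreter's configuration.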

rgommers commented 1 year ago

I will give this another try.

Done in https://github.com/scipy/scipy/pull/18006. It's not pretty, but it seems to work.

sampotter commented 1 year ago

I can't say I follow all the inside baseball in this thread, but I thought I'd ask you experts what the current recommended approach to using NumPy's Array API from meson is.

Specifically, I'm working on using meson-python to build a Python wrapper for a C library. I want to use the Array API to percolate some stuff back up from C to Python, hence need to #include <numpy/arrayobject.h>.

My hope was that I'd be able to numpy_dep = dependency('numpy'), but it appears that something like this feature is still in the works.

rgommers commented 1 year ago

@sampotter yes, dependency('numpy') is still in the works. For now I recommend doing what SciPy does: https://github.com/scipy/scipy/blob/main/scipy/meson.build#L30-L73

ashwinvis commented 7 months ago

It seems to me that #12799 fixed this with support for numpy-config

rgommers commented 7 months ago

Good point, I forgot to link this issue. As commented on gh-12799, it'd be nice to also make things work for older NumPy versions though, and then close this issue.

mesonbuild / meson

Adding numpy as a dependency with custom lookup #9598