mesonbuild / wrapdb

New wrap requests
https://mesonbuild.com/Adding-new-projects-to-wrapdb.html
MIT License
69 stars 174 forks source link

Add nanobind #1556

Open WillAyd opened 2 weeks ago

WillAyd commented 2 weeks ago

cc @wjakob

eli-schwartz commented 2 weeks ago

IIRC releases.json has to be in alphabetical order, nanobind is right before nanoarrow but should instead be right after.

WillAyd commented 2 weeks ago

That makes sense. Looks like some of the build failures have to do with the robin-map depdendency version as well - may need to update that as a precursor

wjakob commented 2 weeks ago

@eli-schwartz @WillAyd : If I understand correctly, the goal of this PR is to build using pure Meson without depending on CMake? This would be awesome, but the recipe here looks somehow too easy to me.

Please take a look at some of the steps explained in the comment at the top of https://github.com/wjakob/nanobind/blob/master/src/nb_combined.cpp. And this is the "ground truth" reference for the extension build process: https://github.com/wjakob/nanobind/blob/master/cmake/nanobind-config.cmake

For example, does this wrapper

  1. Avoid linking against a specific libpython.so/dylib? Basically, we don't want to bake a libpython path into the compiled shared library—the interpreter loading the extension will already have all of the libpython symbols.
  2. Compute the extension suffix suitable for the desired Python ABI and stable/non-stable convention?
  3. Set compilation flags required for correctness (-fno-strict-aliasing) in the static library part only?
  4. Support stable ABI builds?
  5. Set flags that are a good idea to obtain small binaries when making release builds ('-DNDEBUG' and '-DNB_COMPACT_ASSERTIONS', '-ffunction-sections/-fdata-sections/--gc-sections', various stripping flags -- see nanobind's CMake build system for details on these steps)
  6. Use the chained fixups optimization for library builds on macOS? Nanobind ships a custom linker response file to enable this.
  7. Support isolating extensions from others by specifying the NB_DOMAIN parameter? This is in general something that almost every extension should do to avoid cross-contamination.

Quite a lot of work has gone into the CMake interface produce nice output—I would be concerned about alternatives that produce subpar extensions by missing some of these steps. A good recipe that helped to reproduce the behavior in the Bazel build system (https://github.com/nicholasjng/nanobind-bazel) has been to run nanobind's CMake build sytem on a simple extension in verbose mode and then try to exactly reproduce all of the compiler and linker invocations with their flags on macOS+Windows+Linux (some subtle differences there as well).

I'm happy to provide further detail on anything with the caveat that I'll be traveling in July.

eli-schwartz commented 2 weeks ago

For example, does this wrapper

  1. Avoid linking against a specific libpython.so/dylib? Basically, we don't want to bake a libpython path into the compiled shared library—the interpreter loading the extension will already have all of the libpython symbols.
  2. Compute the extension suffix suitable for the desired Python ABI and stable/non-stable convention?

These questions are related.

That's something every python module already has to handle. Setuptools handles this automatically, as does maturin. Meson handles it automatically too! CMake doesn't support it, which is why scikit-build and I assume nanobind as well, provide their own cmake modules that define functions which override cmake add_library.

Meson calls this pymod.extension_module().

  1. Set compilation flags required for correctness (-fno-strict-aliasing) in the static library part only?

This is a legacy of python 2 which violates the C/C++ language standards. It's not valid code if it has to compile with disabling the mandatory aliasing rules of the C++ language.

Since many people get aliasing wrong even though it breaks the language (and so did CPython 2.7!) -- and because sometimes the fact that your code breaks the requirements is difficult to spot -- compiler vendors offer a compiler extension that tells the compiler to cripple its codegen and produce really slow code that tiptoes around aliasing, adding redundant checks here there and everywhere to check at runtime whether the code actually followed the aliasing rules, and recover if it didn't follow the rules.

Aliasing violations can't come from CPython since CPython fixed that. They can still come from projects' own code, or from their nanobind dependency.

I think you're saying that nanobind needs this extra protection as the cpp_args kwarg to the library() call for nanobind itself? That can be added.

  1. Support stable ABI builds?

Meson handles this automatically as part of pymod.extension_module(). If nanobind needs to be built with the limited API define then it should be manually calculated based on the -Dpython.allow_limited_api=true builtin option, as meson doesn't handle this for static libraries that use CPython and get linked into extension modules.

It may make sense to extend the meson language to handle static libraries here somehow. I'll think about this...

  1. Set flags that are a good idea to obtain small binaries when making release builds ('-DNDEBUG' and '-DNB_COMPACT_ASSERTIONS', '-ffunction-sections/-fdata-sections/--gc-sections', various stripping flags -- see nanobind's CMake build system for details on these steps)

This can be added to the declare_dependency() function call in this PR. Meson does have a builtin -Db_ndebug=true option which should be respected, though.

  1. Use the chained fixups optimization for library builds on macOS? Nanobind ships a custom linker response file to enable this.
  2. Support isolating extensions from others by specifying the NB_DOMAIN parameter? This is in general something that almost every extension should do to avoid cross-contamination.

Those are interesting points. The linker response file can probably be handled just like point 5. I'm not sure how to handle point 7 (users could I suppose add that to their own extensions).

Quite a lot of work has gone into the CMake interface produce nice output—I would be concerned about alternatives that produce subpar extensions by missing some of these steps. A good recipe that helped to reproduce the behavior in the Bazel build system (https://github.com/nicholasjng/nanobind-bazel) has been to run nanobind's CMake build sytem on a simple extension in verbose mode and then try to exactly reproduce all of the compiler and linker invocations with their flags on macOS+Windows+Linux (some subtle differences there as well).

I'm happy to provide further detail on anything with the caveat that I'll be traveling in July.

Thanks for your insights!

WillAyd commented 2 weeks ago

9. Set flags that are a good idea to obtain small binaries when making release builds ('-DNDEBUG' and '-DNB_COMPACT_ASSERTIONS', '-ffunction-sections/-fdata-sections/--gc-sections', various stripping flags -- see nanobind's CMake build system for details on these steps)

This can be added to the declare_dependency() function call in this PR. Meson does have a builtin -Db_ndebug=true option which should be respected, though.

Does Meson not set most of these in the first place for a release build? I added -DNB_COMPACT_ASSERTIONS but from previous testing the rest were automatically added - so I don't think need to try and manually add here?

WillAyd commented 2 weeks ago

Ignore my previous comment - those flags probably came from the CMake wrap I tried before. Will take another look

wjakob commented 2 weeks ago

Two quick responses to @eli-schwartz:

CMake doesn't support it, which is why scikit-build and I assume nanobind as well, provide their own cmake modules that define functions which override cmake add_library.

Incorrect -- CMake handles it too (via Python_add_library). But the way that CMake does this on macOS (by specifying the legacy -undefined dynamic_lookup linker flag) is incompatible with the recently added chained fixups linker optimization, so nanobind uses a custom solution. There is lots of discussion about it here.

Aliasing violations can't come from CPython since CPython fixed that.

To my knowledge (and I would be really curious to be pointed to something saying otherwise) Core CPython C code to the cannot be compiled as regular C++ code. It heavily relies on macros that cast and reinterpret data structures that aren't related to each other through inheritance, which violates C++ type punning rules. The -fno-strict-aliasing is therefore required when compiling CPython-style C code in a C++ context. The reason why nanobind specifies this flag to the "libnanobind" utility library component is because it is written like regular CPython code, making plentiful use of such data structure reinterpretation that would be suspect in C++. But this flag isn't imposed on the remainder of the (user-provided) bindings, so aliasing analysis remains useful there.

thesamesam commented 2 weeks ago

To my knowledge (and I would be really curious to be pointed to something saying otherwise) Core CPython C code to the cannot be compiled as regular C++ code. It heavily relies on macros that cast and reinterpret data structures that aren't related to each other through inheritance, which violates C++ type punning rules.

I'm not sure about this. C of course has its own type punning rules. CPython 2.x did have an infamous problem with it (https://peps.python.org/pep-3123/). I'm not aware of any issue in modern CPython.

numpy also has a bunch of C++ which is built without -fno-strict-aliasing (discussed at https://github.com/numpy/numpy/issues/25004) and nothing has gone wrong there.

Further, I'm not aware of any construct used in CPython which is legal under C aliasing rules but suddenly illegal under C++ rules. Could you point me to it?

SoapGentoo commented 2 weeks ago

@wjakob please link where in the Python documentation C++ compilation requires -fno-strict-aliasing. C and C++ literally have pretty much the exact same strict aliasing rules. Specifically, the rules that PEP 3123 relies are the same in C++, so your claims require actual evidence.

eli-schwartz commented 1 week ago

I would use the string method .version_compare('<3.12')

WillAyd commented 1 week ago

Any guidance on how to handle the MSVC errors about a debug build with a release Python interpreter?

nanobind| WARNING: Using a debug build type with MSVC or an MSVC-compatible compiler nanobind| when the Python interpreter is not also a debug build will almost nanobind| certainly result in a failed build. Prefer using a release build nanobind| type or a debug Python interpreter.

Should I just force a release build on msvc?

eli-schwartz commented 1 week ago

Setting the release buildtype in ci_config.json is definitely a reasonable way to deal with this.

eli-schwartz commented 1 week ago

The alpine failure is probably because alpine uses -dev packages just like Ubuntu.

WillAyd commented 1 week ago

Have a feeling the VisualStudio CI builds need something else besides just Python installed for the import to work. I couldn't find a -dev package in choco - any ideas?

benoit-pierre commented 1 week ago

After disabling fail-fast for the VisualStudio matrix (cf. #1561), the x64 job fails too.

WillAyd commented 1 week ago

Seems like the current VisualStudio x64 failure is reported upstream and a bug with the latest msvc compiler version

https://github.com/wjakob/nanobind/issues/613

benoit-pierre commented 1 week ago

And for the x32 job: https://github.com/mesonbuild/wrapdb/pull/1562. With that get similar errors.

WillAyd commented 1 week ago

Thanks @benoit-pierre . My assumption is those two jobs will continue to fail until a newer msvc release comes out (or we can try to test against an older compiler version)