mesonbuild / meson

The Meson Build System
http://mesonbuild.com
Apache License 2.0
5.54k stars 1.61k forks source link

Cython support issue with generated pyx when touching cimported pxd: ninja needs to be run twice to fully build the project? #13212

Open lesteve opened 4 months ago

lesteve commented 4 months ago

Describe the bug When touching a pxd cimported in a generated Cython pyx files, ninja needs to be run twice to fully build the project. This was originally seen in scikit-learn see https://github.com/scikit-learn/scikit-learn/issues/28837. Seems like something maybe similar/related was reported in https://github.com/mesonbuild/meson-python/issues/589 where @dnicolodi investigated and mentioned a work-around. In our case we don't have generated pxi but generated pyx.

To Reproduce A simpler reproducer is available in https://github.com/lesteve/meson-partial-build

Here is a summary to give the gist of it

Package layout:

package
├── __init__.py
├── lib
│   ├── __init__.py
│   ├── lib.pyx
│   ├── lib.pyx.in
│   └── meson.build
├── meson.build
└── utils
    ├── __init__.py
    ├── meson.build
    └── utils.pxd

package/lib/meson.build

lib_cython_tree = [fs.copyfile('__init__.py')]

# In scikit-learn this would actually use tempita on a .pyx.tp file, but
# simpler to reproduce with only copyfile
lib_pyx = fs.copyfile('lib.pyx.in', 'lib.pyx')

py.extension_module(
  'lib',
  lib_pyx,
  lib_cython_tree,
  root_cython_tree,
  utils_cython_tree,
  cython_args: cython_args,
  install: true,
  subdir: 'package/lib',
)

package/utils/meson.build

utils_cython_tree = [
  fs.copyfile('__init__.py'),
  fs.copyfile('utils.pxd')
]

Initial Setup + build:

meson setup build
ninja -C build

First build:

touch package/utils/utils.pxd
# This does not rebuild package/lib/lib shared library
ninja -C build -d explain -v

Output:

ninja: Entering directory `build'
ninja explain: output meson-test-prereq of phony edge with no inputs doesn't exist
ninja explain: meson-test-prereq is dirty
ninja explain: output meson-benchmark-prereq of phony edge with no inputs doesn't exist
ninja explain: meson-benchmark-prereq is dirty
ninja explain: restat of output package/utils/utils.pxd older than most recent input ../package/utils/utils.pxd (1713451993361700259 vs 1713452006018287455)
ninja explain: package/utils/utils.pxd is dirty
[1/1] /home/lesteve/micromamba/envs/scikit-learn-dev/bin/meson --internal copy ../package/utils/utils.pxd package/utils/utils.pxd

Second build still do things:

# The second time this builds package/lib/lib shared library
ninja -C build -d explain -v

Output:

ninja: Entering directory `build'
ninja explain: output meson-test-prereq of phony edge with no inputs doesn't exist
ninja explain: meson-test-prereq is dirty
ninja explain: output meson-benchmark-prereq of phony edge with no inputs doesn't exist
ninja explain: meson-benchmark-prereq is dirty
ninja explain: restat of output package/lib/lib.cpython-312-x86_64-linux-gnu.so.p/package/lib/lib.pyx.c older than most recent input /home/lesteve/dev/meson-partial-build/build/package/utils/utils.pxd (1713452000454989082 vs 1713452006018287455)
ninja explain: package/lib/lib.cpython-312-x86_64-linux-gnu.so.p/package/lib/lib.pyx.c is dirty
ninja explain: package/lib/lib.cpython-312-x86_64-linux-gnu.so.p/package/lib/lib.pyx.c is dirty
ninja explain: package/lib/lib.cpython-312-x86_64-linux-gnu.so.p/meson-generated_package_lib_lib.pyx.c.o is dirty
ninja explain: package/lib/lib.cpython-312-x86_64-linux-gnu.so is dirty
[1/3] cython -M --fast-fail -3 --include-dir /home/lesteve/dev/meson-partial-build/build package/lib/lib.pyx -o package/lib/lib.cpython-312-x86_64-linux-gnu.so.p/package/lib/lib.pyx.c
[2/3] ccache cc -Ipackage/lib/lib.cpython-312-x86_64-linux-gnu.so.p -Ipackage/lib -I../package/lib -Ipackage -Ipackage/utils -I/home/lesteve/micromamba/envs/scikit-learn-dev/include/python3.12 -fvisibility=hidden -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c99 -O2 -g -fPIC -MD -MQ package/lib/lib.cpython-312-x86_64-linux-gnu.so.p/meson-generated_package_lib_lib.pyx.c.o -MF package/lib/lib.cpython-312-x86_64-linux-gnu.so.p/meson-generated_package_lib_lib.pyx.c.o.d -o package/lib/lib.cpython-312-x86_64-linux-gnu.so.p/meson-generated_package_lib_lib.pyx.c.o -c package/lib/lib.cpython-312-x86_64-linux-gnu.so.p/package/lib/lib.pyx.c
[3/3] cc  -o package/lib/lib.cpython-312-x86_64-linux-gnu.so package/lib/lib.cpython-312-x86_64-linux-gnu.so.p/meson-generated_package_lib_lib.pyx.c.o -Wl,--as-needed -Wl,--allow-shlib-undefined -shared -fPIC

Expected behavior After touching utils.pxd the project is fully built in one go. Calling ninja again says "No work to do"

system parameters

dnicolodi commented 4 months ago

I am not sure whether the solution I described in https://github.com/mesonbuild/meson-python/issues/589 is a work-around or the intended way of coding the dependency. The solution is taken directly from the Meson's test for Cython support, thus I tend to think that it is the intended way for thing to work. Why do you think it is a work-around?

I have the impression that the issue reported here is due to the inclusion of a .py file in the sources but I need to look further into it.

dnicolodi commented 4 months ago

The __init__.py file in the test package linked in the issue is empty. Why is it passed along with the Cython source code to py.extension_module()?

lesteve commented 4 months ago

Thanks for taking a look!

Why is it passed along with the Cython source code to py.extension_module()?

This is needed for Cython cimports to work, you generally need __init__.py and cimported .pxd copied to the build directory, so that Cython detects this as a package and can cimport the relevant files. Maybe https://github.com/scikit-learn/scikit-learn/pull/28040#discussion_r1442942702 has a tiny bit more context about this point. You also need the utils __init__.py to be copied before lib is built and I believe adding the __init__.py and .pxd to the py.extension_module achieves this through an order-only dependency in the build.ninja.

I am not sure whether the solution I described in https://github.com/mesonbuild/meson-python/issues/589 is a work-around or the intended way of coding the dependency

I don't know either. If there was a similar work-around for my stand-alone example repo, it would be fine. At the same time, as a naive user, it feels quite convoluted to have to create a dependency from files rather than pass directly files to the py.extension_module but I still have plenty to learn about Meson, so I am probably missing some subtleties here ...

Edit: I should have mentioned that https://github.com/mesonbuild/meson-python/issues/589 may be slightly different and happens with generated .pxi where this issue happens with generated .pyx. Not a Cython expert, but I would say that .pxi are bit more weird/special than .pyx.

lesteve commented 4 months ago

cc @rgommers mostly for a FYI, not sure whether there is a better person to ping in the Meson team for Cython-specific things.

Summary: in some cases when updating a .pxd you need to run ninja twice to fully build the project. I have a standalone repo reproducing the issue https://github.com/lesteve/meson-partial-build. This was originally seen in scikit-learn https://github.com/scikit-learn/scikit-learn/issues/28837.

rgommers commented 4 months ago

The reason I think has to do with depfile support in Cython not knowing about files that are copied with fs.copyfile:

$ cd build/package/lib/lib.cpython-311-x86_64-linux-gnu.so.p/package/lib/
$ cat lib.pyx.c.dep
package/lib/lib.cpython-311-x86_64-linux-gnu.so.p/package/lib/lib.pyx.c: \
  /home/rgommers/code/tmp/meson-partial-build/build/package/utils/utils.pxd \
  package/lib/lib.pyx

That explains why it works when invoking ninja twice: the first time the copyfile is executed, and the second time around the build/package/utils/utils.pxd (which is tracked in the depfile) has changed, so now the extension module rebuilds.

If you're already running Tempita as a Meson generator, the easiest fix is probably to use the depends: keyword to that generator and make that depend on the .pxd files. Something like: https://github.com/scipy/scipy/blob/74c13e4406ae7a95b4509cb436e8fee5b34b2868/scipy/special/meson.build#L119-L122

it feels quite convoluted to have to create a dependency from files rather than pass directly files to the py.extension_module but I still have plenty to learn about Meson, so I am probably missing some subtleties here ...

Yes, we'd like it to work automatically whenever possible. It's already pretty solid after we added depfile support to Cython. But this templating use case is the exception I believe. I don't think it comes up for other languages, because there isn't a need to manually go copy header files around.

The missing link is in here somewhere (from build/build.ninja):

build package/utils/utils.pxd: CUSTOM_COMMAND ../package/utils/utils.pxd
 COMMAND = /home/rgommers/mambaforge/envs/scipy-dev/bin/meson --internal copy ../package/utils/utils.pxd package/utils/utils.pxd
 description = Copying$ file$ package/utils/utils.pxd

build package/lib/__init__.py: CUSTOM_COMMAND ../package/lib/__init__.py
 COMMAND = /home/rgommers/mambaforge/envs/scipy-dev/bin/meson --internal copy ../package/lib/__init__.py package/lib/__init__.py
 description = Copying$ file$ package/lib/__init__.py

build package/lib/lib.pyx: CUSTOM_COMMAND ../package/lib/lib.pyx.in
 COMMAND = /home/rgommers/mambaforge/envs/scipy-dev/bin/meson --internal copy ../package/lib/lib.pyx.in package/lib/lib.pyx
 description = Copying$ file$ package/lib/lib.pyx
build package/lib/lib.cpython-311-x86_64-linux-gnu.so.p/meson-generated_package_lib_lib.pyx.c.o: c_COMPILER package/lib/lib.cpython-311-x86_64-linux-gnu.so.p/package/lib/lib.pyx.c || package/__init__.py package/lib/__init__.py package/utils/__init__.py package/utils/utils.pxd
 DEPFILE = package/lib/lib.cpython-311-x86_64-linux-gnu.so.p/meson-generated_package_lib_lib.pyx.c.o.d
 DEPFILE_UNQUOTED = package/lib/lib.cpython-311-x86_64-linux-gnu.so.p/meson-generated_package_lib_lib.pyx.c.o.d

cc @rgommers mostly for a FYI, not sure whether there is a better person to ping in the Meson team for Cython-specific things.

Thanks for the ping. There's a few folks that try to be helpful with Cython stuff, I'm one of them. I think I have a decent feel for all the ways to shoot oneself in the foot with Cython's path-dependent behavior by now, so feel free to ping me indeed:)

eli-schwartz commented 4 months ago

package/utils/utils.pxd is a builddir relative path to a fs.copyfile() output, hardcoded as a header dependency rule in build.ninja

/home/rgommers/code/tmp/meson-partial-build/build/package/utils/utils.pxd is an absolute path to the same exact file, inferred via a depfile as a header dependency rule.

The first rule guarantees that fs.copyfile() is correctly run before cythoning that file. The issue is apparently that the depfile makes ninja think it now depends on another instance of that file, ninja doesn't dedupe and realize both are the same path, and ninja erroneously thinks the cythoned file is out of date?

This is either a ninja bug or a cython depfile support bug depending on how you look at it. Possibly it's a "both sides are buggy" bug.

lesteve commented 4 months ago

Thanks a lot for your inputs!

If you're already running Tempita as a Meson generator, the easiest fix is probably to use the depends: keyword to that generator and make that depend on the .pxd files. Something like: scipy/scipy@74c13e4/scipy/special/meson.build#L119-L122

I'll have a closer look at using a Meson generator, it looks like indeed a reasonable work-around.

rgommers commented 4 months ago

The issue is apparently that the depfile makes ninja think it now depends on another instance of that file, ninja doesn't dedupe and realize both are the same path, and ninja erroneously thinks the cythoned file is out of date?

Good points. I verified that if I edit the depfile by hand, changing the absolute path to a relative one, the rebuild does work as expected. So if we can avoid Cython writing absolute paths, things should start working as intended it looks like.

lesteve commented 4 months ago

FWIW, by looking a bit further, it looks like another work-around is to use the depends argument in the tempita custom_target

commit 3e13e9fff6bbefc9c68510dbe2653f4d79196d72 (HEAD -> fix, origin/fix)
Author: Loïc Estève <loic.esteve@ymail.com>
Date:   Thu May 30 14:34:29 2024 +0200

    Fix by adding depends to custom_target

diff --git a/package/lib/meson.build b/package/lib/meson.build
index 961ebc8..83651b5 100644
--- a/package/lib/meson.build
+++ b/package/lib/meson.build
@@ -5,14 +5,12 @@ lib_pyx = custom_target(
   output: 'lib.pyx',
   input: 'lib.pyx.in',
   command: [py, tempita, '@INPUT@', '-o', '@OUTDIR@'],
+  depends: [lib_cython_tree, root_cython_tree, utils_cython_tree]
 )

 py.extension_module(
   'lib',
   lib_pyx,
-  lib_cython_tree,
-  root_cython_tree,
-  utils_cython_tree,
   cython_args: cython_args,
   install: true,
   subdir: 'package/lib',

This is probably not as exact as the cython generator (the dependencies are actually indeed for the .pyx -> c step and not the pyx.in -> pyx step) but it seems to have the same result and it's probably a bit easier given how scikit-learn meson.build are currently written.

rgommers commented 4 months ago

@lesteve yes definitely, that is the right workaround for now. It also was what I meant with the suggestion in my first comment ("... the easiest fix is probably to use the depends: keyword to that generator and make that depend on the .pxd files ...").

The depfile discussion is the structural fix longer-term, but it needs a Cython fix and then a Cython release.

lesteve commented 3 weeks ago

FYI I have opened https://github.com/cython/cython/pull/6345 to use relative paths in Cython depfile.