mesonbuild / meson

The Meson Build System
http://mesonbuild.com
Apache License 2.0
5.33k stars 1.53k forks source link

1st class Cython language support #8706

Closed dcbaker closed 2 years ago

dcbaker commented 3 years ago

Okay, I stopped thinking about it and started writing code. This is not complete, there's a fair number of things not done (listed below). Currently, this is just enough to build one trivial test case, but I think it touches all the places that need to be changed, even if there is more work to do. I'm not sure how much more time I'll be able to spend on this, but here's a starting point if someone else wants to take it up.

TODO:

lgtm-com[bot] commented 3 years ago

This pull request introduces 1 alert when merging 33e06a7c4b27660db4d5b2a1bad7d9d6f5aaee56 into a493c950a73c5924e1e2f9454a5c13d0c7bc93bc - view on LGTM.com

new alerts:

dcbaker commented 3 years ago

For C++ I'm think using a -Dcython_language=(c|cpp) might be best, then if you want to override it, you can do something like:

cpp_files = static_library(
  'static_library
  ['foo.pyx'],
  override_options : ['cython_language=cpp'],
)

python.extension_module(..., link_with : static_lib)
lgtm-com[bot] commented 3 years ago

This pull request introduces 1 alert when merging a5bb3ebcd42032a4b4876a5da8b9f4f3fed1abd3 into a493c950a73c5924e1e2f9454a5c13d0c7bc93bc - view on LGTM.com

new alerts:

rgommers commented 3 years ago

I've been testing this PR in https://github.com/rgommers/scipy/tree/meson, which starts porting SciPy submodules to Meson. So far everything works as expected.

dcbaker commented 3 years ago

I think we'll need to solve https://github.com/mesonbuild/meson/issues/8725 to completely make cython a 1st class language, but I'm glad this is step in the right direction.

jpakkane commented 3 years ago

How useful would this be in the main tree without the generated code bits? I.e. could that be dropped from this MR's scope and added afterwards?

dcbaker commented 3 years ago

I could fix the generated code bits to at least be on par with Rust pretty easily, ie you could have a purely generated or purely static build, but not a mix. I would be happy to land this at that point, and we can finish the structured input stuff separately. Let me update this today.

dcbaker commented 3 years ago

This is now based on #8735

dcbaker commented 3 years ago

So, the only things that don't work currently that probably should are: 1) we always output C not C++ 2) we always compile in python3 mode, with no way to get python2 mode.

lgtm-com[bot] commented 3 years ago

This pull request introduces 1 alert when merging ef5327f3d51e478cd5710da739f7e8f59f7b779e into 4ca9a16288f51cce99624a2ef595d879acdc02d8 - view on LGTM.com

new alerts:

scoder commented 3 years ago

So, the only things that don't work currently that probably should are:

  1. we always output C not C++

This is something that the directive # distutils: language=c++ would solve, which the cythonize frontend picks up. The cython frontend is intentionally dumb and requires this option on the command line. (Although there is a bit of room for debate on this difference.)

2. we always compile in python3 mode, with no way to get python2 mode.

The can always be changed per-file with the language_level directive. (If I understand correctly that that's what you are referring to here.)

dcbaker commented 3 years ago

I've added the an option for setting the python target (2 vs 3). I've called it cython_language, but I'm opened to painting the bike some other color.

dcbaker commented 3 years ago

For C vs C++ I have something that works, but I have some annoying corners to sort out.

dcbaker commented 3 years ago

@rgommers, for y'all how important is C++ support? I.e, can we land support for cython without it and get back to that later?

rgommers commented 3 years ago

@rgommers, for y'all how important is C++ support? I.e, can we land support for cython without it and get back to that later?

It's a small percentage of all files, so seems fine to land without it - we can still use custom_target for the .pyx files that target C++.

Here is the list of files containing "distutils: language=c++". Note that most are in .pxd files, which we don't need special handling for (they simply get used when you run cython --cplus on some .pyx file).

``` $ rg "distutils: language" scipy/stats/_qmc_cy.pyx 6:# distutils: language = c++ scipy/stats/biasedurn.pyx.templ 1:# distutils: language = c++ scipy/spatial/ckdtree.pyx 7:# distutils: language = c++ scipy/stats/_boost/include/code_gen.py 60: # distutils: language = c++ scipy/stats/_boost/include/templated_pyufunc.pxd 1:# distutils: language = c++ scipy/special/_generate_pyx.py 1329: f.write("\n# distutils: language = c++\n") scipy/optimize/_highs/cython/src/HighsLp.pxd 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/_highs_wrapper.pyx 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/HConst.pxd 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/HighsModelUtils.pxd 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/HighsOptions.pxd 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/Highs.pxd 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/HighsIO.pxd 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/highs_c_api.pxd 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/HighsStatus.pxd 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/_highs_constants.pyx 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/SimplexConst.pxd 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/HighsRuntimeOptions.pxd 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/HighsLpUtils.pxd 1:# distutils: language=c++ scipy/optimize/_highs/cython/src/HighsInfo.pxd 1:# distutils: language=c++ ```

Here is the list of all our Cython files:

``` $ find . -type f -name "*.pyx" ./cluster/_vq.pyx ./cluster/_optimal_leaf_ordering.pyx ./cluster/_hierarchy.pyx ./ndimage/src/_cytest.pyx ./ndimage/src/_ni_label.pyx ./linalg/cython_lapack.pyx ./linalg/_matfuncs_sqrtm_triu.pyx ./linalg/_decomp_update.pyx ./linalg/_solve_toeplitz.pyx ./linalg/cython_blas.pyx ./optimize/_trlib/_trlib.pyx ./optimize/_group_columns.pyx ./optimize/_highs/cython/src/_highs_constants.pyx ./optimize/_highs/cython/src/_highs_wrapper.pyx ./optimize/_lsq/givens_elimination.pyx ./optimize/cython_optimize/_zeros.pyx ./optimize/_bglu_dense.pyx ./io/matlab/mio_utils.pyx ./io/matlab/streams.pyx ./io/matlab/mio5_utils.pyx ./_lib/_ccallback_c.pyx ./_lib/_test_deprecation_def.pyx ./_lib/_test_deprecation_call.pyx ./_lib/messagestream.pyx ./special/_ufuncs_cxx.pyx ./special/cython_special.pyx ./special/_ellip_harm_2.pyx ./special/_comb.pyx ./special/_test_round.pyx ./special/_ufuncs.pyx ./fftpack/convolve.pyx ./interpolate/interpnd.pyx ./interpolate/_bspl.pyx ./interpolate/_ppoly.pyx ./sparse/csgraph/_shortest_path.pyx ./sparse/csgraph/_traversal.pyx ./sparse/csgraph/_flow.pyx ./sparse/csgraph/_tools.pyx ./sparse/csgraph/_matching.pyx ./sparse/csgraph/_reordering.pyx ./sparse/csgraph/_min_spanning_tree.pyx ./sparse/_csparsetools.pyx ./spatial/ckdtree.pyx ./spatial/_voronoi.pyx ./spatial/_hausdorff.pyx ./spatial/transform/rotation.pyx ./spatial/qhull.pyx ./signal/_max_len_seq_inner.pyx ./signal/_peak_finding_utils.pyx ./signal/_upfirdn_apply.pyx ./signal/_spectral.pyx ./signal/_sosfilt.pyx ./stats/_boost/src/beta_ufunc.pyx ./stats/_boost/src/nbinom_ufunc.pyx ./stats/_boost/src/binom_ufunc.pyx ./stats/_stats.pyx ./stats/_qmc_cy.pyx ./stats/_sobol.pyx ./stats/biasedurn.pyx ```
scoder commented 3 years ago

files containing "distutils: language=c++". Note that most are in .pxd files

The distutils directive isn't looked at in .pxd files, simply because they are not handled by distutils.

scoder commented 3 years ago

how important is C++ support? I.e, can we land support for cython without it and get back to that later?

Is there a reason not to use the cythonize frontend? For the simple case of compiling one source file, it's just as good as the cython frontend, but it considers the #distutils directive. If you pass --force, it will also ignore the file timestamps and always overwrite what's there.

rgommers commented 3 years ago

Is there a reason not to use the cythonize frontend? For the simple case of compiling one source file, it's just as good as the cython frontend,

Not sure I agree with that. cythonize invokes setuptools under the hood right? Invoking a second build system from within another build system is not a good idea, you're going to get different compile flags, issues with caching, and all sorts of hard to debug issues down the line.

scoder commented 3 years ago

cythonize invokes setuptools under the hood right?

Only if you ask it to build a complete extension module for you. If you just invoke it like cythonize module.pyx, without -i or -b, then it will just generate the .c/.cpp file for it and be done.

rgommers commented 3 years ago

If you just invoke it like cythonize module.pyx, without -i or -b, then it will just generate the .c/.cpp file for it and be done.

Ah, didn't realize that. It seems to be missing flags though, like --output-file (this one is a hard necessity, because Meson doesn't allow in-tree builds) and --include-dir. It's not a superset of cython it seems, just a different thing altogether?

scoder commented 3 years ago

It seems to be missing flags though, like --output-file

Oh, yeah, right. That sort-of goes against the idea of a "here's a bunch of files, do the right thing" kind of tool. Single file compilation is the normal case for cython, but more of a special case for cythonize. All of these settings go into generic options.

For the include path, there's -s include_path=…, for the output directory -s output_dir=…, for a single output file -s output_file=….

dcbaker commented 3 years ago

Yes, there's a bunch of good reasons to not use cythonzie and invoke cython directly:

1) we already have the infastructure in place 2) cythonize goes and does stuff behind our backs, that's not nice 3) we want to apply the CFLAGS or CPPFLAGS to the C or C++ files generated 4) we want to ccache the generated files if possible 5) we want to control the C or C++ compiler that's being used, including arguments passed (this is critical for cross compiling)

dcbaker commented 3 years ago

It does appear that I haven't wired up include directories though... I need to fix that.

scoder commented 3 years ago

Sorry. All of this sounds like FUD to me.

Yes, there's a bunch of good reasons to not use cythonzie and invoke cython directly:

1. we already have the infastructure in place

It's just a different command that you call. I don't see any infrastructure involved here.

2. cythonize goes and does stuff behind our backs, that's not nice

I don't know what you are referring to here. Maybe that it won't regenerate a target file if it knows that it doesn't have to (unless you --force it to)? Sounds more like a feature than a drawback to me. Or the fact that it knows whether the user wants C or C++ mode? Isn't that what you're after?

3. we want to apply the CFLAGS or CPPFLAGS to the C or C++ files generated

Those are environment variables. They get passed through if you ask cythonize to do the build for you. They don't matter to cythonize if you don't request that.

4. we want to ccache the generated files if possible

That's ok, cythonize doesn't care. Only the invoked C compiler does, whether called by cythonize or not.

5. we want to control the C or C++ compiler that's being used, including arguments passed (this is critical for cross compiling)

From what I understand, you do not want cythonize to invoke the C compiler for you. Then just don't instruct it to.

eli-schwartz commented 3 years ago

meson wants to be the coordinator for all compile stages. meson wants to generate ninja rules, just like any Makefile generator would, which:

using meson's choice of $CC which is not actually available as an environment variable because Meson Does Not Do Environment Variables™ (at build time at least, and reluctantly at configure time). In part because, well, ninja doesn't really support them. And wrapping commands via /usr/bin/env VAR=VAL cmd and meson --internal basically-env-but-cross-platform.py VAR=VAL cmd is complicated and introduces overhead. And passing them as cythonize --cc="the-chosen-cc" --cflags="the-chosen-cflags" is not particularly greater and has already caused problems for e.g. gtkdoc and stuff which misparse command lines.

And none of this solves the problem where ninja should be able to report job progress and tell you when stuff is out of date without running the cythonize tool, which is important to meson because the point of meson using ninja is in part to guarantee no-op builds are instant, and running cythonize to check is not instant and anyways is recorded as a running build job, not "everything is up to date".

If cythonize can be told to not build, only generate the .c files, then I see no problem. But it's not clear to me what it adds over the cython tool. If this is about detecting whether to generate .c or .cpp output files, then meson still needs to somehow know which one to generate output rules for, which is configure-time data. So this would need to either be:

dcbaker commented 3 years ago

As I said, I do have a branch that makes the C vs C++ selection work, but there's some corner cases I still haven't worked out.

Another thing to consider here, is that meson considers the target to be the lowest level of granularity, not the file. Meson does not, by design, allow you to pass specific arguments only to one file in a target, all files in the target are built with the same arguments. That means that if you have a target with two pyx files, they most both be compiled as C or both as C++, or you need to use an intermediate target. So if cythonize reads # distutils: language=c++ out of one file in a target and creates a .cpp file then the compilation will fail, as Meson expects a C file. Changing that design would a monumental task, and would probably not be accepted anyway.

rgommers commented 3 years ago

Something I just ran into with this branch: when the cython invocation fails, the ninja build silently continues and no Cython errors are shown. This makes it quite hard to debug.

An easy way to reproduce is to add a line:

from . cimport doesnt_exist

in a generated .pyx file.

The actual code I was working on (from https://github.com/rgommers/scipy/blob/meson/scipy/linalg/meson.build):

# _decomp_update
_decomp_update_pyx = custom_target('_decomp_update',
  build_by_default : true,
  output : '_decomp_update.pyx',
  input : '_decomp_update.pyx.in',
  command : [py3, tempita, '@INPUT@', '-o', '@OUTDIR@'],
)

py3.extension_module('_decomp_update',
  _decomp_update_pyx,
  dependencies : py3_dep,
  include_directories : inc_np,
  install : true,
  subdir : 'scipy/linalg')
eli-schwartz commented 3 years ago

Your project doesn't reproduce this issue since it fails immediately with

ninja: error: '_decomp_update.pyx', needed by 'scipy/linalg/_decomp_update.cpython-39-x86_64-linux-gnu.so.p/_decomp_update.pyx.c', missing and no known rule to make it

This is being caused by the rules:

build scipy/linalg/_decomp_update.pyx: CUSTOM_COMMAND ../scipy/linalg/_decomp_update.pyx.in | ../scipy/_build_utils/tempita.py /usr/bin/python3
 COMMAND = /usr/bin/python3 ../scipy/_build_utils/tempita.py ../scipy/linalg/_decomp_update.pyx.in -o scipy/linalg
 description = Generating$ _decomp_update$ with$ a$ custom$ command

build scipy/linalg/_decomp_update.cpython-39-x86_64-linux-gnu.so.p/_decomp_update.pyx.c: cython_COMPILER _decomp_update.pyx
 ARGS = --fast-fail -3 -I/usr/lib/python3.9/site-packages/numpy/core/include -o scipy/linalg/_decomp_update.cpython-39-x86_64-linux-gnu.so.p/_decomp_update.pyx.c
rgommers commented 3 years ago

Your project doesn't reproduce this issue

Argh, I was picking up an older version of this branch, sorry for the noise.

This is being caused by the rules:

Thanks. Am I doing something wrong then? It looks to me like _decomp_update.pyx.c: cython_COMPILER _decomp_update.pyx in the rule you show above is incorrect. Since _decomp_update.pyx is a generated file, I believe the rule should be:

_decomp_update.pyx.c: cython_COMPILER scipy/linalg/_decomp_update.pyx

As far as I can tell, I'm doing the same thing as the ct_ext test case in cython/2 generated sources/meson.build here (unless using @OUTDIR@ instead of @OUTPUT@ matters, but it shouldn't).

If I manually edit ninja.build to update that rule, I do get the expected Cython failure:

[71/118] Compiling Cython source scipy/linalg/_decomp_update.pyx
FAILED: scipy/linalg/_decomp_update.cpython-39-x86_64-linux-gnu.so.p/_decomp_update.pyx.c 
cython --fast-fail -3 -I/home/rgommers/anaconda3/envs/scipy-meson/lib/python3.9/site-packages/numpy/core/include -o scipy/linalg/_decomp_update.cpython-39-x86_64-linux-gnu.so.p/_decomp_update.pyx.c scipy/linalg/_decomp_update.pyx

Error compiling Cython file:
------------------------------------------------------------
...
# and object. In this file, only NULL is passed to these parameters.
cdef extern from *:
    cnp.ndarray PyArray_CheckFromAny(object, void*, int, int, int, void*)
    cnp.ndarray PyArray_FromArray(cnp.ndarray, void*, int)

from . cimport cython_blas as blas_pointers
eli-schwartz commented 3 years ago

Meson doesn't try to parse your command and always assumes the output file is in the current_build_dir() and the filename. It's very clear that this is a bug in the PR as it probably doesn't correctly handle generated outputs and the test case accidentally works because it doesn't use subdir()

eli-schwartz commented 3 years ago

That being said it would be easier to evaluate the bug you wanted to report rather than the bug you found along the way, if you had a minimal reproducer meson.build that didn't require editing build.ninja to view its effects.

eli-schwartz commented 3 years ago

Something I just ran into with this branch: when the cython invocation fails, the ninja build silently continues and no Cython errors are shown. This makes it quite hard to debug.

I'm confused, your later comment shows a cython error occurring. Is this not what you meant?

rgommers commented 3 years ago

Thanks for the explanation @eli-schwartz.

I'm confused, your later comment shows a cython error occurring.

Yes, that's expected given the state of my branch - the code was using a cimport of another module that I had not yet added support for. There's no bug there.

It's very clear that this is a bug in the PR

That's what I suspected, but wasn't sure. Once that bug is fixed in this PR, I'm unblocked.

eli-schwartz commented 3 years ago

Okay, thanks for clarifying! Was just trying to figure out if I should keep investigating your original suspicion that there were cases of failure-and-silently-continuing.

scoder commented 3 years ago

whether to generate .c or .cpp output files, then meson still needs to somehow know

That's reasonable. Seems like it doesn't matter which you choose then, and cython wins by the simpler interface.

Regarding dependency collection, Cython does this in the Cython.Build.Dependencies module, in DependencyTree.all_dependencies(). Probably something that could be exposed in a simpler way.

dcbaker commented 3 years ago

I've updated the generated file support so it should handle generated targets correctly when they're in a subfolder. I get bit by that every time :(

dcbaker commented 3 years ago

@scoder one of the things that would make everything much better is to teach cython to output a make compatible dependency file. I looked at DependencyTree.all_dependencies() but I'm not sure how to translate the information in the DependencyTree into something ninja can consume.

rgommers commented 3 years ago

I've updated the generated file support so it should handle generated targets correctly when they're in a subfolder.

I gave this a quick try, this gives me:

File "[...]/site-packages/mesonbuild/backend/ninjabackend.py", line 1593, in generate_cython_transpile
    ssrc = os.path.join(gen.subdir, ssrc)
AttributeError: 'CustomTargetIndex' object has no attribute 'subdir'

when running meson setup build and using a custom_target with multiple outputs, one of which is a .pyx file, as documented at https://mesonbuild.com/Generating-sources.html#generating-multiple-files-at-a-time

If that doesn't ring any immediate bells, I can try to turn the offending code into a new test case tomorrow.

dcbaker commented 3 years ago

@rgommers fixed in the lastest push.

CustomTargetIndex doesn't have a subdir attribute, but it does have a get_subdir method.

rgommers commented 3 years ago

That worked, thanks!

scoder commented 3 years ago

cython to output a make compatible dependency file

Yeah, there's even one of those really old feature requests for this: https://github.com/cython/cython/issues/1214 Looks like we never implemented this. Definitely useful, and shouldn't be difficult from the result of all_dependencies().

jpakkane commented 3 years ago

What's the current plan on getting this merged?

dcbaker commented 3 years ago

I need to get the CI passing, then it's ready from my point of view

jpakkane commented 2 years ago

@rgommers is this looking good for you?

rgommers commented 2 years ago

@rgommers is this looking good for you?

Yes, I'm pretty happy with this. It works for all the simpler cases I'd expect it to work on. And the more complicated ones, like codegen of multiple files that depend on each other and other files in the tree, is the separate "structured inputs" PR gh-8775.

rgommers commented 2 years ago

Awesome to have this merged. Thanks @dcbaker, all!