pypa / setuptools

Official project repository for the Setuptools build system
https://pypi.org/project/setuptools/
MIT License

how to handle numpy.distutils and setuptools interaction #2372

Open rgommers opened 3 years ago

rgommers commented 3 years ago

Hi setuptools devs, I'd like to use this issue to summarize issues in the interaction between setuptools and numpy.distutils, and then see if we can find a way to resolve those structurally.

The setuptools 50.0 release gave NumPy and SciPy (both heavy numpy.distutils users) a number of concrete problems:

All of the above are resolvable. The bigger issue here is the interaction between distutils, numpy.distutils and setuptools. Given the setuptools release cadence, such breaks will keep happening because of incompatibilities between numpy.distutils and setuptools. Let me summarize the issue, and then some options for dealing with it.

Old situation, for setuptools <50.0:

  1. numpy.distutils extends and monkeypatches distutils.
  2. the import order in NumPy and SciPy setup.py is:
    import setuptools  # unused, just to let setuptools do its monkeypatching dance first
    from numpy.distutils.core import setup
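As a toy illustration of why the import order determines how the patches stack, the sketch below layers two monkeypatches on a stand-in object. Nothing here touches the real distutils, setuptools or numpy.distutils; `make_fake_distutils`, `monkeypatch` and `layers_after_imports` are hypothetical names used only for this example.

```python
import types


def make_fake_distutils():
    """Return a stand-in object playing the role of the distutils module."""
    return types.SimpleNamespace(build=lambda: ["distutils"])


def monkeypatch(module, label):
    """Wrap module.build so it appends `label` after the previous behaviour."""
    previous = module.build

    def patched():
        return previous() + [label]

    module.build = patched


def layers_after_imports(import_order):
    """Apply one patch per imported package, in import order."""
    module = make_fake_distutils()
    for package in import_order:
        monkeypatch(module, package)
    return module.build()


# Importing setuptools first means the numpy.distutils layer ends up on top.
print(layers_after_imports(["setuptools", "numpy.distutils"]))
# prints ['distutils', 'setuptools', 'numpy.distutils']
```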

This situation worked reasonably well, because:

New situation, for setuptools 50.0:

(default situation, no env var set)

  1. setuptools replaces distutils with its vendored setuptools._distutils, even if plain distutils is imported first
  2. numpy.distutils is unchanged, so it still extends and monkeypatches distutils - which is now setuptools._distutils

So numpy.distutils finds itself monkeypatching setuptools code all of a sudden, and that setuptools code includes patches from Python 3.9-dev that are therefore now released into the wild with new setuptools releases, without any alpha/beta/QA trajectory like we had before.
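One way to see which copy of distutils a build is actually using is to look at where the imported module lives on disk. The following is a minimal sketch; `distutils_origin` is a hypothetical helper, and it assumes the vendored copy sits under a `setuptools/_distutils` directory, as the module name `setuptools._distutils` suggests.

```python
def distutils_origin(module_file):
    """Guess the origin of a distutils module from its file path."""
    # Normalise Windows separators so the check works on any platform.
    parts = module_file.replace("\\", "/").split("/")
    if "setuptools" in parts and "_distutils" in parts:
        return "setuptools._distutils"
    return "stdlib or other"


if __name__ == "__main__":
    try:
        import setuptools  # noqa: F401  (imported first, as in setup.py)
        import distutils
    except ImportError:
        print("setuptools and/or distutils are not importable here")
    else:
        print(distutils_origin(getattr(distutils, "__file__", "") or ""))
```

The result depends on the installed setuptools version and on the environment, so no particular output is implied.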

Releasing new patches quickly without testing will probably remain an issue for numpy.distutils, even if the SETUPTOOLS_USE_DISTUTILS="local" behaviour gets reverted for the time being (which seems likely, since there are a ton of other issues right now, like the Debian breakage).

What now?

Longer-term I don't think it's feasible for numpy.distutils to work as it does today and extend setuptools; the release cycle mismatch will be too much of a problem. I'm not quite sure what the best solution for that is though. Options could include:

Let me emphasize that I do see the upsides in merging distutils into setuptools, both design- and maintenance-wise. And we (NumPy & scientific packages) have been burned in the past by distutils patches going unmerged for ages, so setuptools being better maintained is great.

Also, for context, it may be useful to list what numpy.distutils adds besides carrying around needed distutils patches:

Looking forward to hearing your thoughts on this topic.

bashtage commented 3 years ago

It also resulted in a build failure on Windows for statsmodels, see statsmodels/statsmodels#7016.

zooba commented 3 years ago

Just to add some extra info (which I already posted on one of the related threads, but it belongs here):

The distutils/setuptools merge was done with the full blessing of the core CPython team, and we plan to deprecate (in 3.10) and remove (in 3.12) distutils from the standard library completely. The specific versions may vary (I'm writing the PEP now), but the overall plan is uncontroversial.

I don't think we'd have any concerns if numpy.distutils also took a copy of the current distutils code, or one of the other options. Bear in mind that it should be nearly feasible to put up the build tool as its own package and use PEP 517 to bring it in, which could remove setuptools completely from your equation (though I am very much aware of the other issues that numpy et al. face with pip's implementation of PEP 518).

Best of luck sorting this out! Sorry that it showed up as failures like this.

jaraco commented 3 years ago

I have not looked into the details, but my instinct is that a combination of some options would be best:

To be sure, the plan is for setuptools not to expose 'distutils' long-term. Soon after it can safely own the code, it will deprecate imports of distutils and present its own imports of the needed interfaces (i.e. distutils.core.setup -> setuptools.setup, etc), so whatever we can do to support building numpy/scipy through long-term interfaces would be preferable.

* [numpy/numpy#17209](https://github.com/numpy/numpy/pull/17209), CI break in NumPy, `from distutils import sysconfig` broken on TravisCI

I don't understand the failure here. Would you consider filing a bug with this, either with setuptools or pypa/distutils, especially if you have a way to replicate the failure?

rgommers commented 3 years ago

I don't think we'd have any concerns if numpy.distutils also took a copy of the current distutils code, or one of the other options. Bear in mind that it should be nearly feasible to put up the build tool as its own package and use PEP 517 to bring it in, which could remove setuptools completely from your equation (though I am very much aware of the other issues that numpy et al. face with pip's implementation of PEP 518).

I'm not worrying too much about how to bring the build tool in. We will have to support pip install <package_name> and python setup.py develop at least (pip's editable installs don't quite cut it). And we have thousands of downstream users of numpy.distutils, so we can't just switch to a completely different method like scikit-build or Bento.

Hence the choice is indeed to either vendor distutils, or have a dependency on setuptools. I'd prefer the latter, because the non-distutils part of setuptools does offer some functionality that people want and rely on, and syncing distutils patches between setuptools and our vendored copy would also be a pain.

I have not looked into the details, but my instinct is that a combination of some options would be best:

Makes sense.

* adapt distutils patches to be generally useful (or selectively enabled) and contribute those to pypa/distutils or setuptools.

I assume you're not interested in adopting any of the main features of numpy.distutils I listed, except for better MinGW support?

Maybe better CPU feature detection makes sense too?

* provide setuptools extensions to implement additional functionality where appropriate; setuptools could add extendable hooks if that helps.

* rely on PEP-517 to declare supported versions of Setuptools to govern the speed of adopting changes.

The one annoyance there is that, unless we split out numpy.distutils into its own package, we don't want to add a runtime dependency on setuptools, hence declaring those supported versions will be a matter of putting it in the NumPy release notes and manually adding it to pyproject.toml of every downstream user. Maybe that is a good reason to do that splitting off into a separate package.

To be sure, the plan is for setuptools not to expose 'distutils' long-term. Soon after it can safely own the code, it will deprecate imports of distutils and present its own imports of the needed interfaces (i.e. distutils.core.setup -> setuptools.setup, etc), so whatever we can do to support building numpy/scipy through long-term interfaces would be preferable.

I think the details of that plan will be very useful to figure out what to do here. For example:

I don't understand the failure here. Would you consider filing a bug with this, either with setuptools or pypa/distutils, especially if you have a way to replicate the failure?

Looks like the cause is distutils.sysconfig having moved to sysconfig in the stdlib, but not completely - so now we need pieces of both. Work ongoing in https://github.com/numpy/numpy/pull/17223 to sort it out on the NumPy end, we'll open an issue if there's a problem left after that's done.
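For the parts that did move, the stdlib `sysconfig` module can answer some of the questions that build scripts used to ask `distutils.sysconfig`. The sketch below uses only `sysconfig.get_path` and `sysconfig.get_config_var`; the helper names are hypothetical, and this is not the approach taken in the NumPy pull request, just an illustration of the stdlib side.

```python
import sysconfig


def python_include_dir():
    """Directory holding Python.h, roughly what get_python_inc() reported."""
    return sysconfig.get_path("include")


def extension_suffix():
    """Filename suffix for compiled extension modules, or None if unknown."""
    return sysconfig.get_config_var("EXT_SUFFIX")


print(python_include_dir())
print(extension_suffix())
```

Compiler-related helpers such as `distutils.sysconfig.customize_compiler` have no counterpart in stdlib `sysconfig`, which fits the "pieces of both" observation above.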

jaraco commented 3 years ago

I assume you're not interested in adopting any of the main features of numpy.distutils I listed, except for better MinGW support?

Maybe better CPU feature detection makes sense too?

If the behaviors are generally valuable and can be implemented in a way that's not disruptive of supported use-cases, I've no objection to incorporating any number of features.

jaraco commented 3 years ago

I think the details of that plan will be very useful to figure out what to do here.

I agree these are good questions. My plan was to address issues like these incrementally, as needed. First step will be creating suitably-compatible versions of public interfaces entirely in the setuptools namespace and weaning users and packages off of import distutils*.
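To get a feel for how much of a code base still does `import distutils*`, a small scanner based on the stdlib `ast` module can list the offending imports. This is only a sketch; `find_distutils_imports` is a hypothetical helper and is not part of setuptools.

```python
import ast


def find_distutils_imports(source):
    """Return sorted (line number, module) pairs for top-level distutils imports."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            # numpy.distutils does not match: only the plain distutils package.
            if name == "distutils" or name.startswith("distutils."):
                hits.append((node.lineno, name))
    return sorted(hits)


example = (
    "import setuptools\n"
    "from distutils.core import setup\n"
    "import distutils.sysconfig\n"
    "from numpy.distutils.core import setup\n"
)
print(find_distutils_imports(example))
# prints [(2, 'distutils.core'), (3, 'distutils.sysconfig')]
```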

  • Will all of distutils.command be merged into setuptools.command mostly unchanged (e.g. command.config is missing in setuptools now)?

Almost certainly.

  • Will compiler support go into setuptools submodules and will you keep all of it with the distutils names (e.g. distutils.msvc9compiler -> setuptools.msvc9compiler and get rid of setuptools/msvc.py)?

Maybe. Here we'll need to explore what interfaces the users need for these modules. Ultimately, I'd like to consolidate a lot of these behaviors, but it may be necessary to maintain some legacy interfaces. More planning and design is needed here.

  • Do you have an estimate for timeline? If it's a few months we can simply wait till the dust settles, and then adjust based on the new shape things have taken; if it's >1 year we may be adjusting while you are migrating things, which could be more complicated.

I was hoping O(weeks) to have distutils adopted, but it's proven more difficult (mostly due to system package manager patches), and it's not obvious to me how fast that blocker can be cleared. After full adoption is the norm, I expect to perform a refactoring every few weeks. I think it's possible to take more than 1 year, but more likely 6-9 months would be my guess.

rgommers commented 3 years ago

If the behaviors are generally valuable and can be implemented in a way that's not disruptive of supported use-cases, I've no objection to incorporating any number of features.

Thanks @jaraco, that helps. For now I won't bother you to think about things like Fortran compiler support or linear algebra libraries, but it's good to know you're open to new features if they can be fit in in a clean, non-disruptive way.

I think it's possible to take more than 1 year, but more likely 6-9 months would be my guess.

Given our (low) bandwidth for working on numpy.distutils and difficulty in testing N1 platforms x N2 compilers x N3 linalg libraries, I'm inclined to wait those 6-9 months and just keep an eye on how things go.

More planning and design is needed here.

If you need input from the NumPy side on particular design decisions or on a design document, please feel free to ping me any time.

mattip commented 3 years ago

xref python/cpython#22088 to solve https://bugs.python.org/issue39825.

pv commented 3 years ago

One comment about vendoring: it probably would not be sufficient for numpy.distutils to vendor only distutils, as IIRC things such as proper pip/wheel/MSVC support come from setuptools. This relies on setuptools monkeypatching the distutils command/compiler framework, which presumably is now on the table for refactoring and probably contains brittle things that refactoring can break, so mixing "frozen" distutils and new setuptools eventually stops working? If so, it seems vendoring would imply forking distutils+setuptools and keeping the forks on zombie life support. (I'm not sure how this plays together with pip import setuptools.)

Maybe such forks can be kept frozen for a long time? I'm not sure how many distutils/setuptools fixes are essential to keep things building on new Python releases. This also affects how long NumPy and other packages depending on numpy.distutils can continue pinning to the pre-50 setuptools version.

(The above point may have some relevance also for the discussion about removal of distutils from stdlib, as existing setup.py may rely both on setuptools and "old" distutils features to work together properly. But this discussion probably should be continued elsewhere.)

Adapting numpy.distutils to a public API that a refactored setuptools provides would be simpler in the long run once we get there. However, in this case keeping close to 100% backward compatibility for existing setup.py files sounds challenging, especially if the distutils refactoring is significant.

For integrating numpy.distutils features into setuptools: most projects using numpy.distutils probably mainly need the Fortran compiler support and features associated with that, and not much else. However, as with distutils, if backward compatibility is going to be broken, there are quirks in the functionality that should be ironed out, and sorting that out takes time.

jaraco commented 3 years ago

Given our (low) bandwidth for working on numpy.distutils and difficulty in testing N1 platforms x N2 compilers x N3 linalg libraries, I'm inclined to wait those 6-9 months and just keep an eye on how things go.

In that case, should Setuptools consider NumPy a non-blocker for making SETUPTOOLS_USE_DISTUTILS=local the default (requiring numpy builds to either override the value to stdlib or otherwise avoid those releases)? I'm okay with that, and it allows Setuptools to focus on the Debian/Fedora patches to arrive at a solution exclusive of NumPy and proceed with adoption.
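Overriding the value for a single build could look roughly like the sketch below, which prepares an environment for a child process instead of changing the current one. `build_env` is a hypothetical helper; the only names taken from the discussion are the variable SETUPTOOLS_USE_DISTUTILS and its values `local` and `stdlib`.

```python
import os


def build_env(base=None, distutils="stdlib"):
    """Return a copy of the environment that selects a distutils flavour."""
    if distutils not in ("local", "stdlib"):
        raise ValueError("expected 'local' or 'stdlib', got %r" % (distutils,))
    env = dict(os.environ if base is None else base)
    env["SETUPTOOLS_USE_DISTUTILS"] = distutils
    return env


# Possible usage (not executed here):
#   subprocess.run([sys.executable, "setup.py", "build"], env=build_env())
print(build_env(base={})["SETUPTOOLS_USE_DISTUTILS"])  # prints stdlib
```

Setting the variable for the whole build process, and not from inside setup.py, avoids depending on when setuptools gets imported.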

rgommers commented 3 years ago

Thanks for asking @jaraco. NumPy itself is already pinning to <49.2.0, and the latest SciPy release pins to <= 51.0.0. I think all projects should start doing this - keeping latest setuptools in CI for as long as possible, while pinning setuptools in their releases. I suspect most scientific libraries don't do that right now in their pyproject.toml, so a pip install pkgname --no-binary may break. But long-term that seems inevitable anyway, and it doesn't affect many end users given that there are wheels for all common platforms. So I'd say just go ahead.
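The two pins mentioned above behave as follows for a few setuptools versions. This is a deliberately simplified sketch: `parse_release` and `within_upper_bound` are hypothetical helpers that only understand plain numeric versions such as 49.1.3, whereas real tooling would rely on a full version-specifier implementation.

```python
def parse_release(version, width=3):
    """Turn '49.2' into (49, 2, 0); numeric release segments only."""
    numbers = [int(part) for part in version.split(".")]
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def within_upper_bound(installed, bound, inclusive=False):
    """Check an installed version against an upper pin (< or <=)."""
    left, right = parse_release(installed), parse_release(bound)
    return left <= right if inclusive else left < right


# In pyproject.toml such a pin would live in the build-system "requires"
# list, e.g. "setuptools<49.2.0".
print(within_upper_bound("49.1.3", "49.2.0"))                  # prints True
print(within_upper_bound("50.0.0", "49.2.0"))                  # prints False
print(within_upper_bound("51.0.0", "51.0.0", inclusive=True))  # prints True
```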

isuruf commented 2 years ago

@jaraco, what's your feeling on moving the Fortran compiler support from numpy.distutils to distutils?

Also, can you move this issue to pypa/distutils repo?

rgommers commented 2 years ago

I'd prefer not to move this issue - this is between the Setuptools and NumPy projects, so having cross-linked issues between those two projects seems right to me. For plain distutils this is kind of out of scope.

@jaraco, what's your feeling on moving the Fortran compiler support from numpy.distutils to distutils?

xref @jaraco's earlier answer: https://github.com/pypa/setuptools/issues/2372#issuecomment-687575343. Would be good to know if that changed in the meantime, but I'd expect that it didn't.

pradyunsg commented 1 year ago

One thing I will mention (since confusion around this has been stated): pip will always import setuptools before running setup.py. Thus, the positioning/ordering of import setuptools in a setup.py doesn't matter.

https://github.com/pypa/pip/blob/0a21080411c25acfb87fbc380631806e0477d7d3/src/pip/_internal/utils/setuptools_build.py#L5-L46
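The effect can be sketched with a much simplified launcher that imports a module and only then executes the script. This is not pip's actual code (see the link above for that); `make_shim` and `run_with_preload` are hypothetical names, and the `preload` parameter exists only so the idea can be tried with any importable module.

```python
import subprocess
import sys


def make_shim(script_path, preload="setuptools"):
    """Build a `python -c` program that imports `preload`, then runs the script."""
    return (
        "import {0}; "
        "__file__ = {1!r}; "
        "exec(compile(open(__file__).read(), __file__, 'exec'))"
    ).format(preload, script_path)


def run_with_preload(script_path, preload="setuptools"):
    """Run the script in a child interpreter with `preload` already imported."""
    return subprocess.run(
        [sys.executable, "-c", make_shim(script_path, preload)],
        capture_output=True,
        text=True,
    )
```

Because the import happens before the script body runs, whatever the script does with `import setuptools` afterwards only hits the module cache.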

rgommers commented 1 year ago

I see that the change in plans for NumPy hasn't yet been posted here, so let me do so now. numpy.distutils is deprecated, and will go away for Python releases where plain distutils goes away. Users can migrate to another build system, or help add the feature(s) they need to setuptools. See https://numpy.org/devdocs/reference/distutils_status_migration.html
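For a setup.py that has to keep working across Python versions during the migration, a version guard is one simple option. The sketch assumes the removal of distutils lands in Python 3.12, as mentioned earlier in this thread; `numpy_distutils_expected` is a hypothetical helper, not a NumPy API.

```python
import sys


def numpy_distutils_expected(version_info=sys.version_info):
    """True for Python versions that still ship distutils (assumed: < 3.12)."""
    return tuple(version_info[:2]) < (3, 12)


if numpy_distutils_expected():
    print("numpy.distutils-based build is still an option on this Python")
else:
    print("use another build system on this Python")
```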

pradyunsg commented 1 year ago

Is there any timeline for the plans to move numpy itself away from trying to use setuptools < 60 as its build system?

rgommers commented 1 year ago

The planned timeline is "by the time we need it for Python 3.12", because we kinda have to. It's still a big job though, so it depends on when we can make some dedicated time for the right person(s).

I made a start in https://github.com/rgommers/numpy/tree/meson, and the configure checks turned out to be a lot easier than with distutils. Compiler support is mostly figured out too, because that's common with SciPy. The main sticking point will be SIMD support, see numpy.distutils.ccompiler_opt and this diagram in the docs.