pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

PEP 518 build requirements cannot be overridden by user #4582

Open ghost opened 7 years ago

ghost commented 7 years ago

Apparently not, it seems to call pip install --ignore-installed .... Because the build itself is not isolated from the environment in other respects, I'm not sure if this is actually sensible behavior by pip...

If the target computer already has a satisfactory version of numpy, then the build system should use that version. Only if the version is not already installed should pip use an isolated environment.

Related: scipy/scipy#7309

pv commented 7 years ago

Some reasons why the current behavior is bad for Numpy especially:

njsmith commented 7 years ago

I responded on the scipy issue before seeing this one: https://github.com/scipy/scipy/pull/7309#issuecomment-312106093

The most correct solution to the abi issue is to build against the lowest supported numpy version. Building against the currently installed version is a hack that will fail in a number of cases; I mentioned one of them there, but another is that due to pip's wheel caching feature, if you first install scipy into an environment that has the latest numpy, and then later install scipy into an environment that has an older numpy, pip won't even invoke scipy's build system the second time, it'll just install the cached build from last time.

pv commented 7 years ago

Yes, the ABI issue indeed can be handled with specifying the earliest numpy version.

rgommers commented 7 years ago

The lowest supported version is normally Python version dependent (now numpy 1.8.2 is lowest supported, but clearly not for Python 3.6 because 1.8.2 predated Python 3.6 by a long time).

So the specification will then have to be:

numpy=='1.8.2';python_version<='3.4'
numpy=='1.9.3';python_version=='3.5'
numpy=='1.12.1';python_version=='3.6'

I have the feeling not many projects are going to get this right ....

rgommers commented 7 years ago

That still leaves a question of what to do for a not-yet-released Python version. Would you do:

numpy=='1.12.1';python_version>='3.6'

or

numpy;python_version>='3.7'

I'd suspect the first one, but either way you have to guess whether or not an existing version of numpy is going to work with a future Python version. You have to think about it though: if you don't specify anything for Python 3.7, then a build in an isolated venv will break (right?). So then you'd have to cut a new release for a new Python version.

njsmith commented 7 years ago

I guess the not-yet-released-Python issue is sort of the same as anything else about supporting not-yet-released-Python. When developing a library like (say) scipy, you have to make a guess about how future-Python will work, in terms of language semantics, C API changes, ... and if it turns out you guess wrong then you have to cut a new release? I'm not sure there is a really great solution beyond that.

Something that came up during the PEP 518 discussions, and that would be very reasonable, was the idea of having a way for users to manually override some build-requires when building. This is one situation where that might be useful.

rgommers commented 7 years ago

It's a little different in this case - here we use == rather than >= (as typically done in version specifiers in setup.py), which makes it much more critical to guess right.

E.g. if Python 3.7 breaks numpy, then I now need a new numpy release and new releases of every single package that depends on numpy and went with numpy=='1.12.1'. Normally in version specifiers, you say something like numpy>=x.y.z. Then if the same happens, you need a new numpy release but nothing else.

pradyunsg commented 7 years ago

Yes, the ABI issue indeed can be handled with specifying the earliest numpy version. [and] I have the feeling not many projects are going to get this right ....

I don't think there's any way to do "use earliest compatible version" with pip; would it be something useful in this situation?

rgommers commented 7 years ago

@pradyunsg I think in principle yes. Are you thinking about looking at the PyPI classifiers to determine what "earliest compatible" is?

pradyunsg commented 7 years ago

Are you thinking about looking at the PyPI classifiers to determine what "earliest compatible" is?

TBH, I'm not really sure how this would be done. For one, I don't think we have anything other than the PyPI Classifiers for doing something like this, and I'm skeptical of using those for determining if pip can install a package...

rgommers commented 7 years ago

Yeah that's probably not the most robust mechanism.

njsmith commented 7 years ago

There is a way to specify earliest compatible python version in package metadata. Not the trove classifiers – those are just informational. The IPython folks worked this out because they needed to be able to tell pip not to try to install new IPython on py2.

The problem with this though is that old numpy packages can't contain metadata saying that they don't work with newer python, because by definition we don't know that until after the new python is released. (Also I think the current metadata might just be "minimum python version", not "maximum python version".)

dstufft commented 7 years ago

The current metadata is not minimum or maximum, but a full version specifier which supports >=, >, ==, <, etc. I suspect the biggest blockers here are:

1) That metadata is relatively new, so hardly anything is using it currently. What do we do if nothing has it: do we just assume everything is compatible and install the oldest version available? That seems unlikely to work, but it's also confusing if we suddenly switch from installing the latest to the oldest once a project uploads a version that has that metadata.

2) A project can't know ahead of time what version of Python it's going to stop working on; pip 9 currently works on Python 3.6, will it work on Python 3.12? I have no idea!

Maximum versions that don't represent a version that already exists are basically impossible to get right except by pure chance. You pretty much always end up either under- or over-specifying things.
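
For illustration, the metadata in question is the Requires-Python field, which setuptools fills in from the python_requires argument and which accepts a full version specifier; a minimal sketch (the package name here is hypothetical):

# setup.py -- setuptools >= 24.2 writes python_requires into the
# Requires-Python metadata field, so pip can skip incompatible releases.
from setuptools import setup

setup(
    name="example-package",          # hypothetical
    version="0.1",
    python_requires=">=3.5, <3.8",   # lower and upper bounds are both allowed
)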

I had always presumed that Numpy would change the wheel requirements to reflect the version that it was built against, so that the dependency solver then (theoretically, until #988 is solved) handles things to ensure there are no version-incompatibility-related segfaults.

I think the worst case here is you end up installing something new that depends on Numpy and end up having to also install a new Numpy, because now you have something that has a numpy>=$LATEST requirement; but since all the old things have a numpy>=$OLDERVERSION requirement, they won't need to be reinstalled, just numpy and the new thing. Combine this with the wheel cache and the fact that Numpy is pretty good about providing wheels for the big 3 platforms, and it feels like this isn't going to be a big deal in practice?

Am I missing something?

njsmith commented 7 years ago

@dstufft: the specific concern here is how to handle build requires (not install requires) for downstream packages like scipy that use the numpy C API.

The basic compatibility fact that needs to be dealt with is: if you build scipy against numpy 1.x.y, then the resulting binary has a requirement for numpy >= 1.x.0 (though traditionally this has been left implicit)

In the past, this has been managed in one of two ways:

  1. If you were downloading a scipy wheel, then that was built by some expert who had manually set up their build environment in an optimal way so the wheel would work everywhere you expect.

  2. If you were building scipy yourself, you were using setup.py install, so the build environment would always be the same as the install environment. Each environment gets its own bespoke build of scipy that's adapted to the numpy in that environment. (And mostly people only upgrade numpy inside an existing environment, never downgrade, so this mostly works out.)

But with pyproject.toml and in general, we're increasingly moving towards a third, hybrid scenario, where pip is automatically building scipy wheels on end user machines for installation into multiple environments. So it's the second use case above, but technically its implementation acts more like the first, except now the expert's manual judgement has been replaced by an algorithm.

The policy that experts would traditionally use for building a scipy wheel was: install the latest point release of the oldest numpy that meets the criteria (a) scipy still supports it, and (b) it works on the python version that this wheel is being built for.

This works great when implemented as a manual policy by an expert, but it's rather beyond pip currently, and possibly forever... And @rgommers is pointing out that if we encode it manually as a set of per-python-version pins, and then bake those pins into the scipy sdist, the resulting sdists will only support being built into wheels on python versions that they have correct pins for. Whereas in the past, when a new python version came out, if you were doing route (1) then the expert would pick an appropriate numpy version at the time of build, and if you were doing route (2) then you'd implicitly only ever build against numpy versions that work on the python you're installing against.

That's why having at least an option for end users to override the pyproject.toml requirements would be useful: if you have a scipy sdist that says it wants numpy == 1.12.1; python >= "3.7", but in fact it turns out that on 3.7 you need numpy 1.13.2, you could do pip install --override="numpy == 1.13.2" scipy.tar.gz. That solves the wheels-for-redistribution case, and provides at least some option for end users building from sdist. The case it doesn't handle is when a plain pip install someproject ends up needing to install from an sdist; in the past this kinda worked seamlessly via the setup.py install route, but now it would require end users to occasionally do this manual override thing.

dstufft commented 7 years ago

@njsmith I don't understand why it's bad for SciPy to implicitly get built against a newer NumPy though. When we install that built SciPy, anything already installed will still work fine, because NumPy is a >= dependency and a newer one is >= an older one, and we'll just install a newer NumPy when we install that freshly built SciPy to satisfy the constraint that SciPy's wheel will have for a newer NumPy.

pfmoore commented 7 years ago

But with pyproject.toml and in general, we're increasingly moving towards a third, hybrid scenario, where pip is automatically building scipy wheels on end user machines for installation into multiple environments.

Sorry to butt in here, but are we? I don't see that at all as what's happening. I would still expect the vast majority of installs to be from published wheels, built by the project team by their experts (your item 1).

The move to pyproject.toml and PEP 517 allows projects to use alternative tools for those builds, which hopefully will make those experts' jobs easier as they don't have to force their build processes into the setuptools mould if there's a more appropriate backend, but that's all.

It's possible that the changes we're making will also open up the possibility of building their own installation to people who previously couldn't because the setuptools approach was too fragile for general use. But those are people who currently have no access to projects like scipy at all. And it's possible that people like that might share their built wheels (either deliberately, or via the wheel cache). At that point, maybe we have an issue because the wheel metadata can't encode enough of the build environment to distinguish such builds from the more carefully constructed "official" builds. But surely the resolution for that is simply to declare such situations as unsupported ("don't share home-built wheels of scipy with other environments unless you understand the binary compatibility restrictions of scipy").

You seem to be saying that pip install <some sdist that depends on numpy> might fail - but I don't see how. The intermediate wheel that pip builds might only be suitable for the user's machine, and the compatibility tags might not say that, but how could it not install on the machine it was built on?

dstufft commented 7 years ago

To be clear, I understand why it's bad for that to happen for a wheel you're going to publish to PyPI, because you want those wheels to maintain as broad of compatibility as possible. But the wheels that pip is producing implicitly is generally just going to get cached in the wheel cache for this specific machine.

rgommers commented 7 years ago

To be clear, I understand why it's bad for that to happen for a wheel you're going to publish to PyPI, because you want those wheels to maintain as broad of compatibility as possible. But the wheels that pip is producing implicitly is generally just going to get cached in the wheel cache for this specific machine.

That's the whole point of this issue: a wheel built on a user's system can now easily be incompatible with the numpy already installed on that same system. This is because of build isolation - pip will completely ignore the one already installed, and build a scipy wheel against a new numpy that it grabs from PyPI in its isolated build env. So if installed_numpy < built_against_numpy, it won't work.

Hence @njsmith points out that an override to say something like

pip install scipy --override-flag numpy==x.y.z

would be needed.

dstufft commented 7 years ago

@rgommers But why can't pip just upgrade the NumPy that was installed to match the newer version that the SciPy wheel was just built against? I'm trying to understand the constraints where you're able to install a new version of SciPy but not a new version of NumPy.

rgommers commented 7 years ago

@rgommers But why can't pip just upgrade the NumPy that was installed to match the newer version that the SciPy wheel was just built against?

It can, but currently it won't. The build-requires is not coupled to install-requires.

I'm trying to understand the constraints where you're able to install a new version of SciPy but not a new version of NumPy.

For the majority of users this will be fine. Exceptions are regressions in numpy, or (more likely) not wanting to upgrade at that point in time due to the extra regression testing required.

rgommers commented 7 years ago

But with pyproject.toml and in general, we're increasingly moving towards a third, hybrid scenario, where pip is automatically building scipy wheels on end user machines for installation into multiple environments.

Sorry to butt in here, but are we? I don't see that at all as what's happening. I would still expect the vast majority of installs to be from published wheels, built by the project team by their experts (your item 1).

Agreed that in general we are not moving in that direction. That third scenario is becoming more prominent though when we're moving people away from setup.py install to pip install, and the build isolation in PEP 518 currently is a regression for some use cases.

The move to pyproject.toml and PEP 517 allows projects to use alternative tools for those builds, which hopefully will make those experts' jobs easier as they don't have to force their build processes into the setuptools mould if there's a more appropriate backend, but that's all.

Totally agreed that PEP 517 and the direction things are moving in is a good one.

The only thing we’re worried about here is that regression for build isolation - it’s not a showstopper, but at least needs an override switch for things in the pyproject.toml build-requires so pip install project-depending-on-numpy can still be installed without being forced to upgrade numpy.

dstufft commented 7 years ago

It can, but currently it won't. The build-requires is not coupled to install-requires.

For SciPy and other things that link against NumPy it probably should be right? I understand that in the past it was probably painful to do this, but as we move forward it seems like that is the correct thing to happen here (independent of is decided in pip) since a SciPy that links against NumPy X.Y needs NumPy>=X.Y and X.Y-1 is not acceptable.

For the majority of users this will be fine. Exceptions are regressions in numpy, or (more likely) not wanting to upgrade at that point in time due to the extra regression testing required.

To be clear, I'm not explicitly against some sort of override flag. Mostly just trying to explore why we want it to see if there's a better solution (because in general more options adds conceptual overhead so the fewer we have the better, but obviously not to the extreme where we have no options).

One other option is for people who can't/won't upgrade their NumPy to switch to building using the build tool directly and then provide that wheel using find-links or similar.

I'm not sure which way I think is better, but I suspect that maybe this might be something we would hold off on and wait and see how common of a request it ends up being to solve this directly in pip. If only a handful of users ever need it, then maybe the less user friendly but more powerful/generic mechanism of "directly take control of the build process and provide your own wheels" ends up winning. If it ends up being a regular thing that is fairly common, then we figure out what sort of option we should add.

njsmith commented 7 years ago

Yeah, scipy and other packages using the numpy C API ought to couple their numpy install-requires to whichever version of numpy they're built against. (In fact numpy should probably export some API saying "if you build against me, then here's what you should put in your install-requires".) But that's a separate issue.
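
As a rough sketch of that coupling (numpy.__version__ is real, but a dedicated numpy helper for this is hypothetical; here the floor is simply taken from whatever numpy is present at build time):

# setup.py sketch: pin the runtime floor to the numpy used for the build,
# since a wheel built against numpy X.Y needs numpy >= X.Y at runtime.
import numpy
from setuptools import setup

setup(
    name="example-c-extension",      # hypothetical package
    version="0.1",
    install_requires=["numpy>=" + numpy.__version__],
)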

The pyproject.toml thing is probably clearer with some examples though. Let's assume we're on a platform where no scipy wheel is available (e.g. a raspberry pi).

Scenario 1

pip install scipy into a fresh empty virtualenv

Before pyproject.toml: this fails with an error, "You need to install numpy first". User has to manually install numpy, and then scipy. Not so great.

After pyproject.toml: scipy has a build-requires on numpy, so this automatically works, hooray

Scenario 2

pip install scipy into a virtualenv that has an old version of numpy installed

Before pyproject.toml: scipy is automatically built against the installed version of numpy, all is good

After pyproject.toml: scipy is automatically built against whatever version of numpy is declared in pyproject.toml. If this is just requires = ["numpy"] with no version constraint, then it's automatically built against the newest version of numpy. This gives a version of scipy that requires the latest numpy. We can/should fix scipy's build system so that at least it knows that it requires the numpy it was built against, but doing this for all projects downstream of numpy will take a little while. And even after that fix, this is still problematic if you don't want to upgrade numpy in this venv; and if the wheel goes into the wheel cache, it's problematic if you ever want to create a venv on this machine that uses an older version of numpy + this version of scipy. For example, you might want to test that the library you're writing works on an old version of numpy, or switch to an old version of numpy to reproduce some old results. (Like, imagine a tox configuration that tries to test against old-numpy + old-scipy, numpy == 1.10.1, scipy == 0.17.1, but silently ends up actually testing against numpy-latest + scipy == 0.17.1 instead.) Not so great

OTOH, you can configure pyproject.toml like requires = ["numpy == $SPECIFICOLDVERSION"]. Then scipy is automatically built against an old version of numpy, the wheel in the cache works with any supported version of numpy, all is good
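
Concretely, that pinned approach could look roughly like this in pyproject.toml, reusing the per-Python-version pins from earlier in the thread (illustrative values, not a recommendation):

[build-system]
requires = [
    "setuptools", "wheel",
    "numpy==1.8.2; python_version<='3.4'",
    "numpy==1.9.3; python_version=='3.5'",
    "numpy==1.12.1; python_version=='3.6'",
]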

Scenario 3

pip install scipy into a python 3.7 virtualenv that has numpy 1.13 installed

Before pyproject.toml: You have to manually install numpy, and you might have problems if you ever try to downgrade numpy, but at least in this simple case all is good

After pyproject.toml: If scipy uses requires = ["numpy"], then you get a forced upgrade of numpy and all the other issues described above, but it does work. Not so great

OTOH, if scipy uses requires = ["numpy == $SPECIFICVERSION"], and it turns out that they guessed wrong about whether $SPECIFICVERSION works on python 3.7, then this is totally broken and they have to roll a new release to support 3.7.

Summary

Scipy and similar projects have to pick how to do version pinning in their pyproject.toml, and all of the options cause some regression in some edge cases. My current feeling is that the numpy == $SPECIFICVERSION approach is probably the best option, and overall it's great that we're moving to a more structured/reliable/predictable way of handling all this stuff, but it does still have some downsides. And unfortunately it's a bit difficult to tell end-users "oh, right, you're using a new version of python, so what you need to do first of all is make a list of all the packages you use that link against numpy, and then write a custom build frontend..."

njsmith commented 7 years ago

Maybe we should open a separate issue specifically for the idea of a --build-requires-override flag to mitigate these problems. But some other use cases:

dstufft commented 7 years ago

I don't think we need a new issue; I think this issue is fine. I'll just update the title, because the current title isn't really meaningful, I think.

njsmith commented 7 years ago

Agreed that the original title was not meaningful, but there are two conceptually distinct issues here. The first is that pyproject.toml currently causes some regressions for projects like scipy – is there anything we can/should do about that? The second is that hey, user overrides might be a good idea for a few reasons; one of those reasons is that they could mitigate (but not fully fix) the first problem.

Maybe the solution to the first problem is just that we implement user overrides and otherwise live with it, in which case the two discussions collapse into one. But it's not like we've done an exhaustive analysis of the scipy situation and figured out that definitely user overrides are The Solution, so if someone has a better idea then I hope they'll bring it up, instead of thinking that we've already solved the problem :-)

dstufft commented 7 years ago

@njsmith It's interesting to me that you think that numpy == $SPECIFICVERSION is the best option, because from my POV just letting pip upgrade to the latest version of NumPy seems like the best option, but that's not really important here since each project gets to pick what version of their build dependencies makes sense for them.

I suspect that for a hypothetical --build-requires-override we would prevent caching any wheels generated with an overridden build requirement. Otherwise you get into what I think is a bad situation where you get a cached wheel generated from essentially a different source, and you just have to kind of remember that you used an override to know what state it's in (we don't cache wheels when you're using --build-option for similar reasons).

It also suffers from the same problem that a lot of our CLI options like this tend to hit, which is that there isn't really a user-friendly way to specify it. If --override-flag=numpy==1.0 affects everything we're installing, that is typically not what you want (for instance, not everything might depend on numpy at all, or things might require different versions of the same build tool to build their wheels). However, trying to specify things on a per-project basis quickly ends up really gross; you start having to do things like --override-flag=scipy:numpy==1.0 (and what happens if something build-requires scipy, but a version of scipy that is incompatible with that version of numpy?).

At some point the answer becomes "sorry your situation is too complex, you're going to have to start building your own wheels and passing them into --find-links" but at a basic level parameterizing options by an individual package inside the entire set of packages is still somewhat of an unsolved problem in pip (and so far each attempt to solve it has been met with user pain).

So part of my... hesitation is that properly figuring out the right UX of such a flag is non-trivial, and if we don't get the UX to be better than the baseline of building a wheel and chucking it into a wheelhouse then it's a net negative.

njsmith commented 7 years ago

It's interesting to me that you think that numpy == $SPECIFICVERSION is the best option, because from my POV just letting pip upgrade to the latest version of NumPy seems like the best option

Well, here's an even more concrete example... Suppose someone writes pip install numpy==$V1 scipy==$V2, because they're pinning their versions like everyone says to. If scipy has to be built from source, and scipy uses requires = ["numpy"], then you end up with a scipy that install-requires the latest numpy. Let's assume that numpy just released a new version, so the pin is out of date. There are two possibilities: I think with the current pip resolver, the numpy==$V1 wins, so pip will generate a broken venv (import scipy will fail). Alternatively, once pip has a proper backtracking resolver, it'll just error out, because it can't simultaneously install numpy $V1 and numpy $LATEST. Neither option gives a working venv.
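
For concreteness, with made-up version numbers:

pip install numpy==1.12.1 scipy==0.19.1
# No scipy wheel exists for this platform, so pip builds scipy in an isolated
# environment. With requires = ["numpy"] that build pulls in the newest numpy
# (say 1.13.x), and the freshly built scipy wheel then effectively needs
# numpy>=1.13 -- which conflicts with the numpy==1.12.1 pin above.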

tl;dr: if any package uses the numpy C API and declares requires = ["numpy"] they'll break version pinning for everyone. So I think anyone who tries will find lots of people yelling at them. Certainly scipy can't possibly do this.

dstufft commented 7 years ago

@njsmith That's interesting, and it almost makes me wonder if our build requirements logic should be a bit more... complex? Although this gets complicated fast so I'm not sure it's even possible. My immediate thought is:

If we need to install X to build Y:

I'm REALLY not sure how I feel about that, it feels super magical and I feel like the edge cases are going to be gnarly but in a quick 5 minute thought, it feels like it might also do the right thing more often and require some sort of override less often... but I dunno it feels kinda icky.

rgommers commented 7 years ago

tl;dr: if any package uses the numpy C API and declares requires = ["numpy"] they'll break version pinning for everyone. So I think anyone who tries will find lots of people yelling at them. Certainly scipy can't possibly do this.

Only if you specify two packages to install at once. Just don't do that, I'd say - I'd be perfectly happy for pip install numpy==$V1 scipy==$V2 to give an error right now. It's not even clear in what order things would be installed, so the whole command is ambiguous anyway. If it's equivalent to

pip install numpy==$V1
pip install scipy==$V2

then that would work fine. The other way around would give a downgrade from latest to V1, and the pip UI will tell you that.

njsmith commented 7 years ago

@rgommers: A popular best-practice for building Python software is to maintain a requirements.txt with exact version pins for all of your dependencies, so that you consistently test the same versions and then deploy the versions you tested. And then to keep up to date, there are services like pyup.io or requires.io that will notice whenever one of your dependencies makes a new release, and automatically submit a PR to your project updating your pins. This way when a new version breaks something, you find out because the CI fails on that PR, rather than it causing random unrelated CI builds to start breaking.

This kind of automation is pretty sweet, but it does mean that suddenly all the little manual tricks we used to use (like splitting one install line into two) have to become visible and automatable. Which is probably a good thing in the long run, because these kinds of tricks are fine for you and me but create roadblocks for people who aren't immersed in this stuff. (And this is also why I'm dubious about adding a bunch of heuristics... we do want to end up with something predictable and automatable.) But it's kinda painful in the mean time...

I guess one general observation is that we don't really have a solution for pinning now that works for setup-requires. A requirements.txt can list the "top-level" versions of everything, but if some of those versions will be installed from sdists, there's currently no way to pin those sdists' build-requires.

...Honestly, on further thought I think the right solution for numpy is: numpy should force C API users to explicitly state which version of the numpy API they want, and then give them that. So then if scipy says it's using the numpy 1.8 API, that's what it gets, even if you build using numpy 1.13. (Numpy already has most of the machinery it would need to do this, because it can already handle a package expecting numpy 1.8 at import time – this would be extending that mechanism to build time as well.) Then scipy can build-require numpy >= 1.8 or whatever, get the latest version of numpy, and produce a wheel that works with older numpy versions too.

njsmith commented 7 years ago

See https://github.com/numpy/numpy/issues/5888 for the numpy idea.

rgommers commented 7 years ago

A popular best-practice for building Python software

Pure Python software I'd say. It's never been right for compiled code, hence the "manual tricks" you refer to.

Honestly on further thought I think the right solution is for numpy is ...

That's a great idea and we should probably implement that soon, but not sure it's relevant for coming up with a good general design. NumPy is not the only package that people build-require nor the only one with a C API (SciPy has one too, for starters - not that it evolves much, but still).

I'm REALLY not sure how I feel about that, it feels super magical and I feel like the edge cases are going to be gnarly but in a quick 5 minute thought, it feels like it might also do the right thing more often and require some sort of override less often... but I dunno it feels kinda icky.

Too complex to work out in my head ....

rgommers commented 7 years ago

Scenario 3 from @njsmith's post above clarified for me that, for the choice I posted in https://github.com/pypa/pip/issues/4582#issuecomment-312400907, it should be

numpy=='1.13.1';python_version>='3.7'

Having to make new releases is annoying, but not as bad as possible silent breakage. Going to implement that in a few projects now.

pradyunsg commented 7 years ago

@rgommers Just curious, you mean the following should be used; right?

numpy=='1.8.2';python_version<='3.4'
numpy=='1.9.3';python_version=='3.5'
numpy=='1.12.1';python_version=='3.6'
numpy=='1.13.1';python_version>='3.7'

rgommers commented 7 years ago

@pradyunsg indeed.

ghost commented 6 years ago

After some experience using pip 10.0, I do not think that the solution proposed by @rgommers is acceptable. What has essentially happened is that I've made changes to numpy.distutils to allow building dependent projects on Windows; these changes are included in the latest release. However, pip 10.0 downloads the pinned version of NumPy that @rgommers has chosen for me, which does not have these improvements, leading to a build failure.

What this effectively means is that I will not be able to use pip 10.0 unless I manually edit pyproject.toml before installing the project. From this perspective, the logic proposed by @dstufft seems much more appealing.

rgommers commented 6 years ago

After some experience using pip 10.0, I do not think that the solution proposed by @rgommers is acceptable.

I don't think I proposed any new solution, just trying to pick the right version specifiers for projects depending on numpy given the current PEP 518 spec.

From this perspective, the logic proposed by @dstufft seems much more appealing.

This thread is large and confusing, it would be useful to be more explicit. I think this is what you'd prefer: https://github.com/pypa/pip/issues/4582#issuecomment-313261473? It looks like you need either that or an override flag. I don't have a preference between those.

ghost commented 6 years ago

I don't think I proposed any new solution, just trying to pick the right version specifiers for projects depending on numpy given the current PEP 518 spec.

The specification requires the project to specify build requirements. In other words, the project should specify the range of versions that the project could build against, rather than just a particular version. I understand that SciPy's pyproject.toml is the way it is for a reason, but the correct fix is to modify the behavior of pip.

This thread is large and confusing, it would be useful to be more explicit.

In layman's terms, pip should build against the version already (or going to be) installed on the user's computer, to mirror behavior without PEP 518.

rbtcollins commented 6 years ago

In layman's terms, pip should build against the version already (or going to be) installed on the user's computer, to mirror behavior without PEP 518.

pip already does that: it calculates the set of versions it wants installed, and in depended-on-first order installs them, building just-in-time as required. Or at least it did, I haven't tracked the detailed changes in the last year. The only caveats here are:

-Rob

ghost commented 6 years ago

@rbtcollins Don't take offense, but I suspect you are not familiar with the subject matter. There is no consideration of installed requirements when resolving build dependencies. In addition, resolution of build dependencies is currently a complete hack.

ghost commented 6 years ago

FTR, here's the entire section of code that deals with resolution of build dependencies:

https://github.com/pypa/pip/blob/e81b602f902ee15aa88d4806434f9ea0c0ccaa56/src/pip/_internal/wheel.py#L627-L630

rbtcollins commented 6 years ago

@xoviat ah yes, @njsmith tagged me in here in https://github.com/pypa/pip/issues/4582#issuecomment-313234391 - I hadn't looked at the code added to support PEP 518, and yes, that appears entirely broken to me in the case of anything that compiles non-universal wheels (e.g. ABI-dependent things such as numpy/scipy). That's related to, but not identical to, my reasoning that we should still be able to use constraints to influence build dependencies of things deep in the stack, in the same way we use constraints to influence version selection deep in the 'resolver' (such as it is today).

njsmith commented 6 years ago

@xoviat Can you explain the problem you're running into in more detail? It's hard to evaluate proposals without these details. It sounds like you're trying to build a package that has an incorrect build dependency specification?

ghost commented 6 years ago

@njsmith Yes, that package is SciPy. The build dependency specification requires that SciPy is built against the oldest supported NumPy version. However, because numpy.distutils is coupled to NumPy, any changes (for example, adding new compilers) will not take effect when using pip to build SciPy from source.

njsmith commented 6 years ago

In this particular case, it sounds like scipy may need to bump their minimum required numpy version, perhaps only on windows. That should be fairly straightforward. Note that in this case you might also need to change the metadata in the resulting wheel, because for numpy the version used in the build becomes the install_requires minimum version.

For the general case, I agree it could be useful to have some way to override the build requirements. I just don't think it should be some sort of heuristic based on what other packages are being installed. It should be something with the explicit semantics 'when building scipy, use these requirements', 'when building scikit-learn, use these requirements', etc.

jdemeyer commented 5 years ago

Related: https://discuss.python.org/t/support-for-build-and-run-time-dependencies/1513

brainwane commented 4 years ago

Regarding the part of this problem that is blocked by the lack of a proper dependency resolver for pip: the beta of the new resolver is in pip 20.2 and we aim to roll it out in pip 20.3 (October) as the default. So if the new resolver behavior helps this problem (or makes it worse) now would be a good time to know.

akaihola commented 4 years ago

I think we're hitting this issue as well. We have an in-house package whose code is compatible with NumPy version 1.11.2 and up. We need to maintain some legacy remote production environments where we can't upgrade NumPy beyond 1.11.2, but in other environments we want to stay up to date with the newest NumPy.

In our package, we migrated to using pyproject.toml:

[build-system]
requires = ["Cython", "numpy", "setuptools>=40.8.0", "wheel>=0.33.6"]

When building the package for the legacy environment, we use this constraints file:

# constraints.legacy.txt
numpy==1.11.2
scipy==0.18.1
# etc.

For modern environments we have e.g.

# constraints.new.txt
numpy==1.19.2
scipy==1.5.2
# etc.

When running tests in CI for our package, we do the equivalent of either

pip install --constraint constraints.legacy.txt --editable .
pytest

or

pip install --constraint constraints.new.txt --editable .
pytest

However, in both cases the newest available NumPy is installed and compiled against, and running our package in the old environment fails miserably:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "ourpackage/ourmodule.pyx", line 1, in init ourpackage.ourmodule
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

What we would like pip to do is respect the pinned versions from --constraint also for build dependencies.
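
One workaround along these lines is to install the pinned build dependencies yourself and then build with pip's --no-build-isolation flag (which requires the build dependencies to already be present in the environment); a sketch against the files above, assuming constraints.legacy.txt also pins the build requirements:

pip install --constraint constraints.legacy.txt Cython numpy "setuptools>=40.8.0" "wheel>=0.33.6"
pip install --no-build-isolation --constraint constraints.legacy.txt --editable .
pytest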

uranusjr commented 4 years ago

To be clear, pip never supported overriding dependencies anywhere, either build or run-time. The “trick” people used to use depends on a quirky behaviour of pip’s current (soon legacy) dependency resolver that should (eventually) go away. In that sense, it makes perfect sense that requirements specified on the command line do not override build dependencies in pyproject.toml, since that means PEP 517 successfully avoids a bug.

Stepping back from the specific request of overriding build dependencies, the problem presented in the top post can be avoided by adding additional logic to how build dependencies are chosen. When a package specifies numpy (for example) as a build dependency, pip can choose any version of numpy freely. Right now it chooses the latest simply because that’s the default logic. But we can instead condition the logic to prefer matching the run-time environment if possible, which would keep the spirit of build isolation while at the same time solving the build/run-time ABI mismatch problem. (I think I also mentioned this idea somewhere else, but can’t find it now.)
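
A rough sketch of that preference in Python (illustrative only; the function and data structures are not pip internals):

# Prefer the build-dependency version already installed in the target
# environment when it satisfies the declared specifier; otherwise fall back
# to the usual "newest matching release" choice.
from packaging.requirements import Requirement
from packaging.version import Version

def choose_build_dep_version(requirement, installed_versions, available_versions):
    current = installed_versions.get(requirement.name)
    if current is not None and current in requirement.specifier:
        return current  # keep the build ABI consistent with the run-time env
    candidates = sorted(requirement.specifier.filter(available_versions), reverse=True)
    if not candidates:
        raise RuntimeError("no available version satisfies %s" % requirement)
    return candidates[0]  # default behaviour: newest matching version

# Example: an env that already has numpy 1.11.2 keeps it for the build,
# even though 1.19.2 is available and a bare "numpy" requirement permits any version.
print(choose_build_dep_version(Requirement("numpy"),
                               {"numpy": Version("1.11.2")},
                               [Version("1.11.2"), Version("1.19.2")]))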

There is more than one way to solve the build ABI issue, and introducing dependency overriding for it feels like falling into the XY problem trap to me. Dependency overriding is a much more general problem, and whether that should be possible (probably yes at some point, since pip is progressively making the resolver stricter, and people will need an escape hatch eventually) is an entirely separate issue, covered in other discussions.