UX for constraining build-time and runtime dependencies

rgommers commented 2 years ago

There is a practical UX problem with pyproject.toml that we need to solve. A short description would be:

Any project that builds against the numpy C API has to deal with API and ABI compatibility (see the numpy docs on that for details).
as a consequence, the dependency specification for local development is different from the one needed for uploading an sdist to PyPI:
- for local development, you want loose dependencies, e.g. numpy >= 1.19.5 (both a build and runtime dep)
- in sdist released to PyPI you want: oldest-supported-numpy (build dep metapackage, contains == pins per platform) and numpy>='buildtime_dep,<1.25.0' # should beN+3whereN is current minor version.

That is obviously a complex set of dependencies to express in pyproject.toml, and we can't have two sets in there at the same time. The solution @FFY00 and I just discussed is:

On the main branch, use the loose development dependencies (numpy>=1.19.5)
In the CI job(s) that produce artifacts for release to PyPI, ensure that the lowest supported dependencies (which may be platform-dependent) are installed.
In the release branch, replace the dependencies with the build-time deps needed for uploading to PyPI.
Have support in mesonpy for the various kinds of pinnings people need for releases and API/ABI compatibility.

A rough sketch of how (4) could look:

[build-system]
requires = [
  "numpy >= 1.19.5",  # replace with (e.g.) `oldest-supported-numpy` in a release branch
]

[tool.mesonpy]
runtime_pinnings = [
  # if build version detected is 1.20.3, replace with `>=1.20.3` (while preserving the upper bound,
  # so end result would be for example `'>=1.20.3,<1.25.0')
  "numpy = 'pin_compatible'",
  "pythran = 'bugfix_only'",  # if build version detected is 0.10.0, override the build dependency with `'>=0.10.0,<0.11.0'`
  "torch = 'exact'",  # runtime version must equal build-time version (PyTorch has no ABI stability guarantees)
]

Note that this feature is needed for some other packages than just NumPy too (basically every package offering a C/C++ API has to deal with this), but NumPy alone is important enough - every package containing even a single Cython extension which uses NumPy has to deal with this. However, it's probably still a bit too specific to want native support in pyproject.toml via a standard that all tools must implement. Hence the choice to put this in mesonpy.

FFY00 commented 2 years ago

I have let this sink in a bit more and thought about it.

The only slight worry I have with this approach, is that the wheels will not be necessarily reproducible. I don't think it's a big issue, as anyone serious about reproducible wheels should be reconstructing the original build environment anyway[^1], and this won't be in issue there. Still, I feel very slightly uneasy about the built artifacts being highly dependent on the environment ended up being resolved by Pep 517 builders. Just something worth nothing IMO.

Other thing, I think we should be adding a local tag to the version (eg. 1.2.3+numpy3.2.1) to easily identify the constraints from the build dependencies.

The main design decision here would be how to specify the locked version, I would really like to gather some feedback from other people here.

A similar problem, and its solution, can be found at https://www.python.org/dev/peps/pep-0440/#compatible-release.

My initial proposal would be:

...

[tool.mesonpy.pin-build-dependencies]
numpy = '*.*.*'

And the tricky part, the patterns:

1) Simple approach: Use a placeholder character, like * (eg. *.*.*), other option would be ^ (eg. ^.^.^).

2) Full-on regex: Allow users to specify regex patterns, which will be run against the version string of the dependency (eg. ^[0-9]*.[0-9]*.[0-9]).

All the examples would pin the first three version number (eg. for 1.2.3.4.5 we would lock 1.2.3).

Anyway, none of this is trivial to implement, and there isn't much difference in difficulty either, I think regex might be a bit easier? So, I think we should go with what's easier for users, unless that proves itself problematic when implementing.

That said, it is worth to point out that if we are constraining the dependencies on the fly, we need to put dependencies in dynamic and move it to the backend config. This is not optimal, but unfortunately PEP 621 does not let us keep the dependencies setting in the project table if we mark it as dynamic.

...

[project]
...
dynamic = [
  'dependencies',
]

[tool.mesonpy]
dependencies = [
  'numpy~=3.2.1',
]

[tool.mesonpy.pin-build-dependencies]
numpy = '*.*.*'

[^1]: And there has been a push from the community towards this, with PEP 665, though it has been rejected due to it not really targeting sdists. I think the proposal might revived sooner or later, hopefully with also sdists in mind. Anyway, I think this shows us that there are bigger needs here and that it is perfectly reasonable to not concern us too much with this issue from the build system side. For reference, other packaging ecosystems/systems, like Arch Linux will still be able to achieve full package reproducibility, so this really seems like a PyPI / PEP 517 issue.

rgommers commented 2 years ago

The only slight worry I have with this approach, is that the wheels will not be necessarily reproducible. I don't think it's a big issue, as anyone serious about reproducible wheels should be reconstructing the original build environment anyway1, and this won't be in issue there. Still, I feel very slightly uneasy about the built artifacts being highly dependent on the environment ended up being resolved by Pep 517 builders. Just something worth nothing IMO.

Agreed, need to have a reproducible environment for building. I think this topic is something we're going to have to teach people a bit about anyway (I've started some write-ups). It is absolutely untrue that PEP 517 will give you reproducible builds when you have compiled extensions. The illusion that they are reproducible is more harmful than helpful imho.

Other thing, I think we should be adding a local tag to the version (eg. 1.2.3+numpy3.2.1) to easily identify the constraints from the build dependencies.

Hmm, I'm not sure I like messing with version strings, that seems unjustified (and what do you get when you have say five such dependencies?). A project like SciPy will already have a complex version string - from the latest nightly: scipy-1.9.0.dev0+1215.a0e129f-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

My initial proposal would be:

Interesting - yes there's a lot of design flexibility here. I have no clear opinion yet. Note that conda-forge implements this in a reasonable form, looking at its pin_compatible and run_exports may be useful: https://conda-forge.org/docs/maintainer/pinning_deps.html. At first sight, seems easier to understand that regexes (and with more semantic meaning, e.g. ABI compatibility).

That said, it is worth to point out that if we are constraining the dependencies on the fly, we need to put dependencies in dynamic and move it to the backend config. This is not optimal, but unfortunately PEP 621 does not let us keep the dependencies setting in the project table if we mark it as dynamic.

Ah, good point. That may be a problem - aren't other tools supposed to be reading that dependencies table? I'm not sure what the implications would be exactly.

rgommers commented 2 years ago

A similar problem, and its solution, can be found at https://www.python.org/dev/peps/pep-0440/#compatible-release.

I looked at this again, and want to point out that ~= is quite a bit simpler than what we need. Using more operators doesn't seem ideal to me, that will be more or less incomprehensible to most devs without looking up the syntax.

Your example pyproject.toml snippet is the wrong way around (it pins runtime dependency and then finds a compatible build dependency). The logic we want instead is to specify the version of the build dependency, and then adjust runtime dependency dynamically (but not with ~= usually, we need to allow a larger range at runtime). I think this will be generically true for packages where ABI matters.

pradyunsg commented 2 years ago

(this is a drive-by comment, sorry!)

Regarding 4, a couple of ideas...

To go the route of mirroring the existing style from conda-build, where you are "calling" a function.

[tool.mesonpy]
runtime-pins-from-build-dependencies = [
  "compatible(numpy, *.*)",
  "bugfix_only(pythran)",
  "exact(torch)",
]

Alternatively, lean into structured TOML data:

[tool.mesonpy.runtime-pins-from-build-dependencies]
numpy = {strategy="compatible", constraint="*.*"}
torch = "exact"

That is obviously a complex set of dependencies to express in pyproject.toml, and we can't have two sets in there at the same time.

Well, depends on how/when you think those two should be composed. Basically, I'm imagining something like...

[tool.meson]
release-dependencies = [
   "oldest-supported-numpy",
]

(a) If it's OK to have an environment variable that is always set during release... That'll let you have dynamic dependencies via a get_requires_for_build_whatever that reads a list and tacks it on for release builds. When building the sdist, write the additional build dependency in build-system.requires, if the variable is set. (pick your opt-in/opt-out semantics)

(b) It's also reasonable to skip dynamic build dependencies and instead rely upon the build "flow" difference between pip install's consumer-facing behaviour of sources -> wheel vs build's [^1] publisher-facing sources -> sdist -> wheel. When building the sdist, write the additional build dependency in thee pyproject.toml build-system.requires always.

This, of course, means that you're assuming that all sdist builds are eventually-public and I reckon that's reasonable.

IIUC, between a strategy for pinning runtime numpy based on the build environment and, a way to ensure oldest-supported-numpy be added as a build dependency for release builds will be sufficient to get the behaviours you want?

FWIW, you could even squish all of this into a single table...

[tool.mesonpy.runtime-pins-from-build-dependencies]
numpy = {strategy="compatible", constraint="*.*", release-requires=["oldest-supported-numpy"]}
torch = "exact"

Other thing, I think we should be adding a local tag to the version (eg. 1.2.3+numpy3.2.1) to easily identify the constraints from the build dependencies.

What you're describing here feels to me like a poor man's substitute for an SBOM, for describing the build environment. There's no need to try and squish that into the local version label IMO.

It might reasonable to just dump the relevant build environment information into a file in the sdist/wheel, and use that... instead of trying to squish things into the local version label. And, if you want the protections that a local version label provides against a PyPI upload, put a sentinel value in there when you generate the non-release variants.

[^1]: A future pip build's semantics will match build's.

rgommers commented 2 years ago

Thanks for the input @pradyunsg!

Alternatively, lean into structured TOML data:

TIL there's such a thing. I don't have a real preference here - both syntax choices work, and are about equally readable. I'm not quite sure what constraint="*.*" means though - why do we need this? Or in other words, what can that be aside from *.*?

If it's OK to have an environment variable that is always set during release...

I'd prefer to avoid that, because it's going to be one of those things people forget whenever they build outside of the release CI jobs.

(b) It's also reasonable to skip dynamic build dependencies and instead rely upon the build "flow" difference between pip install's consumer-facing behaviour of sources -> wheel vs build's 1 publisher-facing sources -> sdist -> wheel. When building the sdist, write the additional build dependency in thee pyproject.toml build-system.requires always.

I like this option better. A little heretical for a PyPA/Pip dev to suggest it perhaps, but hey I'll take it:)

IIUC, between a strategy for pinning runtime numpy based on the build environment and, a way to ensure oldest-supported-numpy be added as a build dependency for release builds will be sufficient to get the behaviours you want?

Yes, I believe it is. That said, it's an additional improvement over having to make manual changes in release branches, and it's currently of secondary importance compared to pin_compatible. So I'd probably start with implementing just numpy = {strategy="compatible", constraint="*.*"}, and then leave release-requires= for after that.

It might reasonable to just dump the relevant build environment information into a file in the sdist/wheel,

We partly do that anyway, for example the detected BLAS/LAPACK version gets written to a generated .py file and exposed in numpy and scipy through a show_config() function. Extending that to dump the full configure output and environment information is quite useful - I was already planning to do something like that.

pradyunsg commented 2 years ago

I guess I used constraint instead of min_pin out of being uninformed. The latter is a better name anyway.

https://docs.conda.io/projects/conda-build/en/latest/resources/variants.html?highlight=pin_compatible#extra-jinja2-functions

rgommers commented 2 years ago

I guess I used constraint instead of min_pin out of being uninformed. The latter is a better name anyway.

Ah, this makes sense now, thanks. And it's another choice to make. We don't just need min_pin but also upper_bound (e.g., numpy should be pinned to < 1.most_recent_minor_version + 3). This could all be done in a single expression:

numpy = {strategy="compatible", min_pin=x.x.x, upper_bound=1.27.0}  # expressing it as `max_pin=x.x+3` would be great, conda-build doesn't allow this

or it could be done separately like I originally suggested:

[tool.mesonpy.runtime-pins-from-build-dependencies]
# note that we don't need `min_pin`, because >=1.19.5 is already specified in the build deps
numpy>='pin_compatible,<1.27.0'

I think in the end all of these are roughly equivalent, it's mostly a matter of preferred syntax (what's easiest to read & parse). I suspect structured TOML is best from a parsing perspective.

FFY00 commented 1 year ago

Okay, after spending quite some time with it, I think a viable option would be to follow the normal requirement specification format, but using the @ as a placeholder for elements we want to match. Additionally, we would also allow a TOML table with alternative matchers, which would provide a shortcut for common annoying matches, and allow use to implement an escape hatch for users if that is requested.

[tool.mesonpy.binary-runtime-constrains]
numpy = '>=@.@.@'  # for 1.20.3, it would add >=1.20.3 to the requirements
pythran = '~=@.@.@'  # also equivalent to ">=@.@.@,==@.@.*", for 0.10.0 it would add >=0.10.0,==0.10.*
torch = { strategy='exact' }  # this would match the exact same version

Can y'all try to break this approach?

dnicolodi commented 1 year ago

I've tried to understand what exactly is the problem we are trying to solve, I've read all the comments, but I'm not completely sure of the problem formulation. A way to derive the runtime dependencies from the version of the dependencies at the time of compilation is required. However, I'm not sure I understand all the details.

Let's assume we have a package A that exports some C API. Package A follows some rules for API and ABI compatibility. Let's say that minor versions (x.y.*) are ABI compatible and minor version (x.*) are API compatible. Let's assume that a package B links against the C API of package A. Package B needs to target a specific API of package A, which is defined by the major the major version numbers X.*. This can be expressed in the pyproject.toml like this:

[build-system]
requires = [
   'A ~= X'
]

If at the moment of compilation package A is available in version X.Y.Z, the resulting binaries of package B are compatible only with versions X.Y.* thus we would like to have the equivalent of

[project]
dependencies = [
   'A ~= X.Y'
]

encoded in the wheels. This seems reasonable. However, what if we want to release a version of B compatible with A version X.Y+1 ? We need to recompile it using this version of A. However, because the dependencies versions are not recorded in the wheel tags, the wheels for B compatible with A versions X.Y and X.Y+1 are identical.

The only way I see to solve this is to have specific version of B compatible with specific ABI compatible versions of A. This requires having the same version specification in build-system.requires and in project.dependencies. What am I missing?

FFY00 commented 1 year ago

This can be expressed in the pyproject.toml like this:
[build-system]
requires = [
   'A ~= X'
]

No, it'd have to be A~=X.Y, which would turn into A>=X.Y,==X.*. In this example, you probably just want A==X.*.

If at the moment of compilation package A is available in version X.Y.Z, the resulting binaries of package B are compatible only with versions X.Y.* thus we would like to have the equivalent of
[project]
dependencies = [
   'A ~= X.Y'
]

Again, it'd need to be A~=X.Y.Z, which would translate to A>=X.Y.Z,==X.Y.*, or you can just do A==X.Y.* directly, depending on what you actually want.

This seems reasonable. However, what if we want to release a version of B compatible with A version X.Y+1 ?

Yes, you'd need to make sure you build in an environment with A==X.Y+1.

We need to recompile it using this version of A. However, because the dependencies versions are not recorded in the wheel tags, the wheels for B compatible with A versions X.Y and X.Y+1 are identical.

The wheel names are identical if we don't add an extra identifier, but the wheels themselves would be different, and contain different metadata. Installers will refuse to install a A==X.Y.* in an environment with A=X.Y+1.

dnicolodi commented 1 year ago

No, it'd have to be A~=X.Y, which would turn into A>=X.Y,==X.*. In this example, you probably just want A==X.*.

Yes, thank you. The syntax of the ~= operator always confuses me.

The wheel names are identical if we don't add an extra identifier, but the wheels themselves would be different, and contain different metadata. Installers will refuse to install a A==X.Y.* in an environment with A=X.Y+1.

Sure, but what do you do with a wheel with the same file name bud different metadata? You cannot upload it to PyPI and I think any other wheel archive would reasonably do not allow to replace an existing version with a different binary. If you cannot redistribute the wheel, is this relevant only for local installations? Namely we need to make sure that when A version X.Y+1 is installed a compatible version of B is also installed.

FFY00 commented 1 year ago

You can add a local version identifier to differentiate it, that will give you a different wheel name, allowing you to upload it. Supporting uploading different wheels with the same name is also something PyPI could potentially support in the future.

PyPI does not support it yet, but PEP 658 provides an API for installers to look directly at the metadata before downloading the wheel. Right now, installers will just download wheels until they find a compatible one.

rgommers commented 1 year ago

Can y'all try to break this approach?

Quick first question: what happens with version specifiers like .dev0 and rc1 when you do something like '>=@.@.@'?

FFY00 commented 1 year ago

Quick first question: what happens with version specifiers like .dev0 and rc1 when you do something like '>=@.@.@'?

Nothing. >=@.@.@ for 1.2.3.4.5 translates to >=1.2.3, if you want to match those, you can, eg. >=@.@.@.dev@. I don't think you'd use this often, but it's there if you need.

I haven't decided what to do when you try to match one of these modifiers and one is not present, but my initial instinct would be to error out.

dnicolodi commented 1 year ago

Distributing wheel with the same version but different content is IMHO a very bad idea. You can add an identifier to the file name, but not the version stored in the wheel metadata. It seems an even worst idea to me. However, supporting locally build wheels is necessary.

What kind of operations on the version identifiers do we need to support? Which formats of version identifiers do we need to support?

I think we need to support all version identifier formats allowed by PEP 440 and we need to be able to replace the build version into an arbitrary string, with the possibility to drop version identifiers components at the end of the identifiers and replace them with *. These operations should be enough to construct arbitrary version requirements. Finding a convenient syntax to express these operations although seems complicated.

dnicolodi commented 1 year ago

Nothing. >=@.@.@ for 1.2.3.4.5 translates to >=1.2.3, if you want to match those, you can, eg. >=@.@.@.dev@. I don't think you'd use this often, but it's there if you need.

The problem wit this is that if you have version 1.2.3.dev0 installed and this translates into a runtime requirement >=1.2.3 you cannot install the obtained wheel, because 1.2.3.dev0 < 1.2.3. Isn't it?

FFY00 commented 1 year ago

Distributing wheel with the same version but different content is IMHO a very bad idea.

I disagree, there's a point you simply cannot provide all the compatibility details in the name. This is already true btw with pyhton-requires for eg.

You can add an identifier to the file name, but not the version stored in the wheel metadata.

You can add an identifier to the version, the local identifier field (eg. something in 1.2.3+something) exists for this exact purpose :sweat_smile:

What kind of operations on the version identifiers do we need to support? Which formats of version identifiers do we need to support?

I don't know. Something that's good enough for most people, and provide an escape hatch for the ones it isn't.

The problem wit this is that if you have version 1.2.3.dev0 installed and this translates into a runtime requirement >=1.2.3 you cannot install the obtained wheel, because 1.2.3.dev0 < 1.2.3. Isn't it?

Yes, and I think that's the main issue with this approach, but I think we'll have issues with these kinds of matching semantics with other approaches too.

There are a couple things we can do to fix this, I am struggling to decide what I think would be better, but perhaps we should just backfill the pre, post, and dev release fields based on the operator?

dnicolodi commented 1 year ago

You can add an identifier to the version, the local identifier field (eg. something in 1.2.3+something) exists for this exact purpose 😅

Sure, but if you are touching it up outside the build tool, then you can in the same way adjust the dependencies manually and we don't need to be providing infrastructure for this. And if you are updating the pyproject.toml to add the local identifier in the version field, you can again just update the dependencies manually.

Yes, and I think that's the main issue with this approach, but I think we'll have issues with these kinds of matching semantics with other approaches too.

The solution is to provide a bit more complex language. A first attempt could be something based on the Python string formatting language using the fields of the packaging.version.Version object:

[tool.meson-python.dynamic]
dependencies = [
  'numpy >= {v}',
  'foobar ~= {v.major}.{v.minor}',
  'insane == {v.major}.*.{v.micro}',
]

This is also easier to implement.

rgommers commented 1 year ago

Additionally, we would also allow a TOML table with alternative matchers, which would provide a shortcut for common annoying matches, and allow use to implement an escape hatch for users if that is requested.

You only used 'exact' as an example of a named option, so to make sure - did you think about including pin-compatible and bugfix-only? With your idea of it being split between @-style version specifiers and a table, what exactly would you do for my example from higher up:

numpy = {strategy="compatible", min_pin=x.x.x, upper_bound=1.27.0}

dnicolodi commented 1 year ago

what exactly would you do for my example from higher up:
numpy = {strategy="compatible", min_pin=x.x.x, upper_bound=1.27.0} 

I don't like using a dedicated DSL for this when we already have PEP 440. It is something more than the users need to learn and that we need to document. It is also ambiguous: what does x.x.x mean if the version of the package in the build environment has more or less than three components?

Maybe I'm missing something. Can't you example be expressed as

[tool.meson-python.dynamic]
dependencies = [
  'numpy >= {v}, < 1.27.0',
]

where we replace {v} with the version of the package in the build environment?

If we get fancy, we could even make this work:

[tool.meson-python.dynamic]
dependencies = [
  'numpy >= {v}, < {v.major}.{v.minor + 3}.0',
]

However, without knowing too much of the NumPy versioning scheme, this does not seem very useful to me.

pradyunsg commented 1 year ago

where we replace {v} with the version of the package

I think that is tractable, with v being a packaging.version.Version. It is more/as expressive than any alternative that we could have. It does mean that things are a little more exposed, in that we're giving the user direct templating rather than doing it for them.

The strategies approach has the benefit of being more constrained and thus easier to associate with specific styles of compatibility behaviours. It solves the "what exactly are we expecting expected compatibility to look like, for this project?"

pradyunsg commented 1 year ago

min_pin=x.x.x, upper_bound=1.27.0

FWIW, in TOML, all string values need to be quoted. So, both these values would be quoted.

rgommers commented 1 year ago

I don't like using a dedicated DSL for this when we already have PEP 440.

I wasn't arguing for that syntax (it's what conda-forge uses, and I really don't care about the exact syntax here). @FFY00 asked "Can y'all try to break this approach?" for the @.@.@ syntax and separate named things. I don't see how that gets combined.

Re PEP 440: I think it's good to keep in mind that we're designing something that is tool-specific at first (or, hopefully, combined with scikit-build-core) - but it's an extension to what PEP 440 offers, and ideally it could be standardized in the future.

'numpy >= {v}, < 1.27.0',

This should work I think. Two thoughts:

How to deal with pre-release versions and local version identifiers (+12345-git-hash and the like) is still important. Let's make sure that >= {v} (which to me is identical in what it expresses to a string 'pin-compatible') always works. I think that is indeed the case, because the +... part gets dropped in comparisons AFAIK.
I'm not sure {v.major}.{v.minor} reads or acts better than @.@. But that's only an aesthetic thing. Being able to say {v.minor + 3} is a useful extra ability.

dnicolodi commented 1 year ago

Re PEP 440: I think it's good to keep in mind that we're designing something that is tool-specific at first (or, hopefully, combined with scikit-build-core) - but it's an extension to what PEP 440 offers, and ideally it could be standardized in the future.

Sure. I was arguing that developers are already familiar with PEP 440. Deviating from that introduces more cognitive burden. As you can see from my comments above, I already get the syntax for the PEP 440 ~= operator wrong. Introducing another DSL is not going to make things simpler. Furthermore, the "simple templating" solution seems more flexible to me, allowing to easily code things like

[tool.meson-python.dynamic]
dependencies = [
  'numpy >= {v}, < 1.27.0, != 1.26.99',
]

How to deal with pre-release versions and local version identifiers (+12345-git-hash and the like) is still important. Let's make sure that >= {v} (which to me is identical in what it expresses to a string 'pin-compatible') always works. I think that is indeed the case, because the +... part gets dropped in comparisons AFAIK.

This needs to be though through better, probably, but I imagine that {v} is the exact version of the package installed at build time, complete with local version identifiers and pre- and pos- release version identifiers. Users would need to be aware that local version strings are handled in a surprising way: they sort in lexicographical order:

>>> packaging.version.Version('1.2.3+b') > packaging.version.Version('1.2.3+a')
True

If you want to drop local version identifiers, you can use {v.public}, or {v.base_version} to also drop the pre- and post- release version identifiers.

I'm not sure {v.major}.{v.minor} reads or acts better than @.@. But that's only an aesthetic thing. Being able to say {v.minor + 3} is a useful extra ability.

The {v.major}.{v.minor} syntax has the advantage of being explicit and of being extremely easy to test on the Python REPL simply importing packaging.version and is IMO much more expressive, see the examples above.

dnicolodi commented 1 year ago

Being able to say {v.minor + 3} is a useful extra ability.

This would require to write a parser for the expressions in meson-python (not a huge deal if we restrict the possible operations to + and -, but maybe still some code that we don't want to maintain) or would require to use Python's eval(). I would maybe leave this feature out at the beginning.

dnicolodi commented 1 year ago

I started an implementation of this idea, see #319, and I realized that there is an important aspect that I was not thinking about: which dependency requirements should be included in the sdist metadata? I think the sdist metadata should use the same version requirement that are specified in the build-system.requires field in pyproject.toml. Is this correct?

FFY00 commented 1 year ago

The idea would be for this to provide extra dependency constrains, not to replace project.dependencies. build-system.requires is separate.

dnicolodi commented 1 year ago

The idea would be for this to provide extra dependency constrains, not to replace project.dependencies

I don't understand how that would work. Doesn't project.dependencies need to do not be specified if it is declared to be dynamic? At least this is what PEP 621 says https://peps.python.org/pep-0621/#dynamic:

Build back-ends MUST raise an error if the metadata specifies a field statically as well as being listed in dynamic.

If the metadata does not list a field in dynamic, then a build back-end CANNOT fill in the requisite metadata on behalf of the user (i.e. dynamic is the only way to allow a tool to fill in metadata and the user must opt into the filling in).

FFY00 commented 1 year ago

Yes, we'd have to give it a different key, but keeping the same semantics. sdists are exactly the reason why. Build and sdist dependencies are different, while we could try to be smart and fill sdist dependencies from the build requirements, IMO it'd be best to keep them separate.

rgommers commented 1 year ago

From higher up:

That said, it is worth to point out that if we are constraining the dependencies on the fly, we need to put dependencies in dynamic and move it to the backend config. This is not optimal, but unfortunately PEP 621 does not let us keep the dependencies setting in the project table if we mark it as dynamic.

Ah, good point. That may be a problem - aren't other tools supposed to be reading that dependencies table? I'm not sure what the implications would be exactly.

dnicolodi commented 1 year ago

I think I understand. We need something like:

[build-system]
build-backend = 'mesonpy'
requires = [
   'meson',
   'meson-python',
   'numpy >= 100.0.0',
]
[project]
name = 'name'
version = '1.0.0'
dynamic = [
  'dependencies',
]
[tool.meson-python.dynamic]
dependencies = [  # stuff that goes into the sdist dependencies
   'numpy >= 100.0.0',
   'foobar',
]
wheel-dependencies-pins = [ # additional constraints for wheels
   'numpy >= {v}, < 101.0.0',
]

To generate the wheel dependencies we can just concatenate the requirements specifications in tool.meson-python.dynamic.wheel-dependencies-pins to the tool.meson-python.dynamic.dependencies requirements.

FFY00 commented 1 year ago

Yes.

dnicolodi commented 1 year ago

Nice. I think the hardest part is giving the pyproject.toml fields used by this names that make clear enough what is going on. I'm not that happy with what has been proposed till now, and what I came up with in the example above is not better. Maybe something like:

[tool.meson-python]
sdist-dependencies = [  # stuff that goes into the sdist dependencies
   'numpy >= 100.0.0',
   'foobar',
]
wheel-dependencies-pins = [ # additional constraints for wheels
   'numpy >= {v}, < 101.0.0',
]

?

Concatenating the specifications can really be just string concatenation as long as the user does not specify something that results in contradicting requirements. I don't think that this would be an easy thing to check for. For example, the specification above would become the equivalent of

dependencies = [
   'numpy >= 100.0.0, >= 100.1.2, < 101.0.0',
   'foobar',
]

assuming the numpy version found at wheel build time i 100.1.2. This should work just fine. packaging.requirements parser has all the pieces required to implement this quite easily.

pradyunsg commented 1 year ago

FWIW, I still like the name runtime-pins-from-build-dependencies for the additional constraints. :)

packaging.requirements parser has all the pieces required to implement this quite easily.

Well, there's on caveat with requirment merging: merging markers. As long as you forbid that, it should be OK.

dnicolodi commented 1 year ago

FWIW, I still like the name runtime-pins-from-build-dependencies for the additional constraints. :)

I read that as "take the build dependencies (which I interpret as the content of build-system.requires) and enforce them a run time". However, we are taking the version of the packages installed at the time the wheel is built and add them to the wheel dependencies requirements.

Maybe wheel-requiremets-from-build-versions?

Well, there's on caveat with requirment merging: merging markers. As long as you forbid that, it should be OK.

Indeed. I didn't think about this.

dnicolodi commented 1 year ago

We don't just need min_pin but also upper_bound (e.g., numpy should be pinned to < 1.most_recent_minor_version + 3).

I don't know the API and ABI stability guarantees of the NumPy releases, but I'm not sure I understand how automatically bounding the compatible releases to >= x.N, < x.N+3 for all values of N could be useful. Doing so would mean that releases x.N, x.N+1, x.N+2 are compatible (same ABI, only additions to the API). However, once the current version of NumPy is N+1, and we apply the same bounds, we obtain than releases x.N+1, x.N+2, x.N+3 are compatible. By induction we can demonstrate that all x.y releases need to be compatible to each other for any value of y (subsequent releases can only add to the API). What am I missing?

rgommers commented 1 year ago

I don't know the API and ABI stability guarantees of the NumPy releases, but I'm not sure I understand how automatically bounding the compatible releases to >= x.N, < x.N+3 for all values of N could be useful.

Note N == most_recent_minor_version. This is time-based. If the latest version now is 1.24.1, then you put the upper bound at <1.27.0. Basic rationale: deprecation cycle is 2 releases, so things that get deprecated in 1.25 can be ripped out in 1.27, so avoid that breakage.

dnicolodi commented 1 year ago

Thus this is part of the dependency constrains for the sdist, not for the wheels, and it is set at release time, not a compile time. Right? Therefore the ability to use expressions to define the build time pins is not necessary.

rgommers commented 1 year ago

The upper bound is set at release time, the lower bound is set at build time. The latter only depends on the numpy version installed in the build environment.

dnicolodi commented 1 year ago

Understood. I think there are only two types of build time pins that will be used in practice: exact version (==) and lower version (>=) bound. It is probably good to design the facility with more flexibility, but I don't think adding complexity to support more elaborate schemes is required at this stage.

rgommers commented 1 year ago

Agreed, those two are the most obvious ones. Maybe also the bugfix-only one, that's reasonably common. I can't think of other ones that I've encountered recently in the wild.

FFY00 commented 1 year ago

Okay, so here's my take on this.

First, I think having some kind of strategy syntax (see exact, pin_compatible, bugfix_only, etc. from above) makes sense. This is pretty future-proof, as it allows us to add new "strategies" in the future to meet future needs or fix issues we may identify with the approach we choose right now. It's also pretty extensible, as it'd allow us to let users define custom strategies -- acting as an escape hatch for situations where the "strategies" we provide do not meet the needs of the user.

That said, I think only having very involved strategies, like pin_compatible and bugfix_only, could be a bit of a pain, as I think we'd need to add lots of different strategies to make sure we meet the needs of most users. Other issue I have with this is that it'd have to lean a fair bit on the semantics, by that I mean that users would need to know exactly how the strategy they are using works, which I think can cause some issues with users making bad assumptions. Additionally, I think coming up with an ergonomic syntax for this would be a bit difficult.

For that reason, I think we should have a more broad "strategy", match. This "strategy" would allow users to specify a standard requirement string with some placeholders, that would be filled by the package version present during the build.

Now, the matching syntax (@.@.@, {v.major}.{v.minor}.{v.patch}, etc.) is not really very problematic, defining the scope and behavior for the matching is. dev and pre versions make this very tricky. I spent quite a bit of time thinking about which constrains you'd actually generally want on dev and pre version, and then trying to come up with a way to translate a requirement pattern provided by the user to a requirement string that would work well for both normal and dev/pre versions. Essentially, my takeaway was that we should not let the user match the dev and pre sections of versions, and instead, whenever a constraint is specified for a package and the version if that package is either dev or pre, we should just pin the version (eg. if a user specifies the ~= @.@.@ pattern for Numpy and Numpy is at version 1.27.0.dev0, we'd add numpy == 1.27.0.dev0 as a requirement). I think this is the sanest default behavior, but if a user wants a different behavior, they can use the escape hatch.

That said, for the matching syntax, I think simply using @ as a placeholder is quick, intuitive, and simple enough to implement. The {v.major}.{v.minor}.{v.patch} proposal as-is would allow to users to not use placeholders for parts version (eg. 1.2.@). That, of course, can be prevented, but I think the code to do that would be more complex, or at least as complex, as the @ placeholder one. IMO, considering that and the other aspects of both proposals, @ seems nice, but I am naturally biased.

So, here's my proposal...

[tool.mesonpy.binary-runtime-constrains]
packageA = 'match(~= @.@.@)'
packageB = 'exact'
packageC = 'custom:identifier(args)'

Strategies:

exact
- Pins the version
match
- Performs the version matching, as described above
custom:{identifier}
- Runs a user-provided matcher
- We could load it from a .meson-python.py file for eg.
- If we standardize (write a PEP) for this format, we could also allow matches to be registered via entry-points, allowing complex projects like Numpy to provide their custom matcher (eg. custom:numpy could be provided to match the ABI)
  - Also, we could allow projects to export a default matching rule in their metadata, making unnecessary for to specify that themselves

Examples:

1.2.3 on == @.@.@ > == 1.2.3
1.2.3.4.5 on == @.@.@ > == 1.2.3
1.2.3.post0 on >= @.@.@ > >= 1.2.3
1.2.3.dev0 on == @.@.@ > == 1.2.3.dev0
1.2.3.dev0 on == @.@.* > == 1.2.3.dev0
1.2.3.dev0 on >= @.@.@ > == 1.2.3.dev0
1.2.3.dev0 on >= @ > == 1.2.3.dev0
1.2 on == @.@.@ > matching error

It's a bit rough, but here's the document I wrote when researching this issue. It contains a quick summary of the version and requirement strings syntax, and then several of the approaches I considered with some notes.

https://gist.github.com/FFY00/dd56cde19e68a9138219ecf44bcb879d

rgommers commented 1 year ago

Thanks for the extensive write-up @FFY00 - very useful.

Now, the matching syntax (@.@.@, {v.major}.{v.minor}.{v.patch}, etc.) is not really very problematic, defining the scope and behavior for the matching is. dev and pre versions make this very tricky.

Agreed, I think this is turning out to be a key issue.

Essentially, my takeaway was that we should not let the user match the dev and pre sections of versions,

+1 to that, it's too granular and complex. These types of non-release versions can appear as installed build dependencies or as runtime dependencies, but it should not be needed (or very rare at a minimum) to explicitly include them in a dependency specification.

and instead, whenever a constraint is specified for a package and the version if that package is either dev or pre, we should just pin the version (eg. if a user specifies the ~= @.@.@ pattern for Numpy and Numpy is at version 1.27.0.dev0, we'd add numpy == 1.27.0.dev0 as a requirement). I think this is the sanest default behavior, but if a user wants a different behavior, they can use the escape hatch.

This is almost, but not quite, right imho. It's common to want to build with some version, and then test against another version (many testing scenarios, but a concrete one is bisecting a regression for example).

It's just as easy to say >= than == here, and that seems strictly more useful without introducing any additional complexity.

So, here's my proposal...

Overall looks good, but two questions on the strategies:

In terms of supported features, why leave out bugfix-only?
Regarding implementation: can the user-provided matcher please be done later if it's not nontrivial? I have been needing the strategy for numpy for quite a while now, and the metadata in scipy, scikit-image etc. wheels is wrong right now because we don't have it.

And just to make sure I understood it well, here is what I think I'd write for scipy depending on numpy:

numpy = 'match(>= @.@.@), <1.27.0'

right?

rgommers commented 1 year ago

It's just as easy to say >= than == here, and that seems strictly more useful without introducing any additional complexity.

@FFY00 and I had a chat about this just now, tl;dr:

@FFY00's main objection is correctness (a wheel compiled against a .dev version may be using non-stable things that get changed later in the release cycle). Hence the preference for ==; users who want to test against different versions may or must use something like pip's --ignore-installed.
my counter-argument is that (a) this correctness thing isn't a real concern in practice in my experience, and (b) this is already the state of things today, because there's nothing that does the == pinning for dev versions today. And --ignore-installed will be very cumbersome, it seems like gratuitous workflow breakage
Middle ground: @FFY00 suggested to just skip the constraint completely then, in the case of dev/pre-release versions. To me this is certainly better than a hard pin, so I can live with that.

dnicolodi commented 1 year ago

I think the direction of the discussion is going toward a solution that is at the same time over-engineered and extremely constrained. There are an handful of packages that will make this functionality useful for packages depending on them. Because the build time pin strategy to employ depends on the compatibility guarantees of the packages depended upon, I expect these packages to provide the recipe to put into pyproject.toml in their documentation. Thus we are looking at designing something that will need to be understood in detail only by a selected set of developers. The syntax therefore does not need to be foolproof, but needs to be expressive. If it is trivial to implement, even better. We already have a standard for version specifiers: PEP 440, and we have an implementation of the standard: the packaging.requrement module, which is already a dependency of meson-python. I think using the PEP 440 syntax and the packaging.requirement and packaging.version modules is the most straightforward, the most flexible, and easiest to understand thing to do.

Is there any project for which what is allowed in my proposal above (now implemented in #319) would not work?

I think that build-time pins in practice can have only two forms: pin to the same version (for some definition of "same version") used for building, or pin to the same version any later version (within the limits prescribed by the sdist dependencies and the build-requires dependencies). Therefore, we just need two operators: == and >=.

What "same version" means depends on the ABI stability promise of individual projects. I can think of only two projects for which this mechanism is going to be used: NumPy and SciPy. What are the policies for these? Can we have concrete and precise examples of the pins these projects would like to put in place?

dnicolodi commented 1 year ago

2. Regarding implementation: can the user-provided matcher please be done later if it's not nontrivial? I have been needing the strategy for `numpy` for quite a while now, and the metadata in `scipy`, `scikit-image` etc. wheels is wrong right now because we don't have it.

I've read the last comments a couple of times, but I haven't understood what the "user-provided matcher" is supposed to do nor what is exactly the use case. Can you provide more details and an example?

FFY00 commented 1 year ago

@FFY00's main objection is correctness

Maybe I didn't present this very well, but my main objection is that dev and pre releases are tricky. My proposal to solve this was to just pin them in those cases, which you were unhappy about for the reasons mentioned in your reply, so the middle ground was to just not generate constrains for those packages.

I've read the last comments a couple of times, but I haven't understood what the "user-provided matcher" is supposed to do nor what is exactly the use case. Can you provide more details and an example?

This would be an escape hatch mechanism, to allow users to provide custom matching logic, for cases where our syntax is not enough. The idea would be that they can write a Python function that implements their custom matching logic, but that's not very important right now, so let's keep it simple and

I think the direction of the discussion is going toward a solution that is at the same time over-engineered and extremely constrained.

Perhaps, I mean, I think it's at least an okay of compromise, but naturally I may be missing something.

Considering only the design, I think there are two main key points regarding your proposal, 1) what happens on dev and pre-release versions, and 2) the syntax. I think 2) is dependent on 1), so let's just keep things simple and just look at 1).

Personally, I suspect you just might not have an enough understanding of all the quirks of PEP 440 versions and requirement strings, which I don't mean as a stab at you, but rather as a statement of how complex they are. So, let's go through the paces and look at some examples.

Is there any project for which what is allowed in my proposal above (now implemented in #319) would not work?

Yes, let's take numpy in SciPy for example. Releases seemingly use a three part release section, and we need to match the first ~~two~~ three. For example, given the 1.24.2 version, we want to generate a >=1.24.2 requirement. Let's give that a try.

First, I'll just create a simple helper that uses packaging.version.Version to render the version pin template, and does a sanity check to make sure the version we give it is actually compatible with the requirement we generated.

>>> from packaging.version import Version
>>> from packaging.specifiers import Specifier
>>> def match(version, template):
...     v = Version(version)
...     s = Specifier(template.format(v=v))
...     print(f'Does `{v}` match `{s}` (generated from `{template}`)? {v in s}')
...

Now, let's give it a try.

>>> match('1.24.2', '>={v.major}.{v.minor}.{v.micro}')
Does `1.24.2` match `>=1.24.2` (generated from `>={v.major}.{v.minor}.{v.micro}`)? True

Great! It's what we wanted.

Now let's say that the first pre-release for 1.25.0 is released, and we want to try it out to see if our project breaks.

>>> match('1.25.0rc1', '>={v.major}.{v.minor}.{v.micro}')
Does `1.25.0rc1` match `>=1.25.0` (generated from `>={v.major}.{v.minor}.{v.micro}`)? False

Oh, right, if we actually want to match the pre-release, we need to include it.

>>> match('1.25.0rc1', '>={v.major}.{v.minor}.{v.micro}{v.pre[0]}{v.pre[1]}')
Does `1.25.0rc1` match `>=1.25.0rc1` (generated from `>={v.major}.{v.minor}.{v.micro}{v.pre[0]}{v.pre[1]}`)? True

Great, but now matching against a normal version doesn't work :frowning_face:

>>> match('1.25.0', '>={v.major}.{v.minor}.{v.micro}{v.pre[0]}{v.pre[1]}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in match
TypeError: 'NoneType' object is not subscriptable

How do we fix this?

dnicolodi commented 1 year ago

I don't understand why you need to use the version specifiers parts. The issue goes away if you just use the simplest form: {v}

>>> match('1.25.0rc1', '>={v}')
Does `1.25.0rc1` match `>=1.25.0rc1` (generated from `>={v}`)? True
>>> match('1.24.2', '>={v}')
Does `1.25.0` match `>=1.25.0` (generated from `>={v}`)? True

What am I missing? If the goal is to strip local version identifiers, you can use {v.public}:

>>> match('1.25.0rc1', '>={v.public}')
Does `1.25.0rc1` match `>=1.25.0rc1` (generated from `>={v.public}`)? True
>>> match('1.24.2', '>={v.public}')
Does `1.25.0` match `>=1.25.0` (generated from `>={v.public}`)? True
>>> match('1.25.0+foo', '>={v.public}')
Does `1.25.0+foo` match `>=1.25.0` (generated from `>={v.public}`)? True

If there are some version identifier transformations that are not easily accomplished combining packaging.version.Version attributes, the easiest is to subclass it and add what is needed in form of properties.

rgommers commented 1 year ago

I don't understand why you need to use the version specifiers parts. The issue goes away if you just use the simplest form: {v}

I think the issue is that as soon as you use more than plain {v}, you get a problem with dev/pre versions. How would you express the bugfix-only constraint (~=@.@.@) in a way that works for dev/pre too?

FFY00 commented 1 year ago

Yes, and realistically in our numpy example, it does work, but that syntax is not really what we wanted. We wanted to match the three first release parts, but using {v}, we match the full version. This, like I said, is not really an issue for numpy because they have never released a version with 4 release parts (1.24.2.1), and it's unlikely that they will.

Let's look now at pythran, where versions seemingly have a three part release section, but we only want to match the first two. The same problem from the previous example arises, but using {v} here is not an option.

>>> match('0.10.0', '>={v.major}.{v.minor}')
Does `0.10.0` match `>=0.10` (generated from `>={v.major}.{v.minor}`)? True
>>> match('0.10.0rc1', '>={v.major}.{v.minor}')
Does `0.10.0rc1` match `>=0.10` (generated from `>={v.major}.{v.minor}`)? False

If we want to fix this, we need to significantly extend, perhaps even redesign, the packaging.version.Version API, and this is only one of multiple issues that you'd need to consider when doing so.

On top of all this, we introduce more complexity that the user needs to be aware (eg. in your example, the user must know that >={v.major}.{v.minor}.{v.micro} will generate a constraint that will not work as they could have reasonably expected, and that they must use >={v}).

mesonbuild / meson-python

UX for constraining build-time and runtime dependencies #29