pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

No Constraint on Version Names Can Cause Issues #12483

Open RobertRosca opened 1 year ago

RobertRosca commented 1 year ago

Describe the bug

There's no limit (or only a very high one) on the length of the version string provided for a package. For example, https://pypi.org/project/uselesscapitalquiz/ has a version which is 218 characters long.

Depending on the OS and file system you can hit file name length limits, causing issues with mirroring PyPI or with installation. See https://github.com/pypa/bandersnatch/issues/1200, https://github.com/pypa/bandersnatch/issues/1228

Expected behavior

There should be a limit on the length of the version string to prevent this from happening, whether by accident, as seems to be the case with uselesscapitalquiz, or on purpose to cause issues on users' systems.

To Reproduce

N/A

My Platform

N/A

Additional context

I'm happy to work on a PR limiting the length of the version name, if that's an approved solution.

ewdurbin commented 1 year ago

PEP 440 does not specify a length constraint for version identifiers, which I think would need to happen before PyPI enforced such a limit. A practical limit is a good idea though.

RobertRosca commented 1 year ago

Yeah, I thought that might be the case. The section "Updating the version specification" says:

The versioning specification may be updated with clarifications without requiring a new PEP or a change to the metadata version.

Any technical changes that impact the version identification and comparison syntax and semantics would require an updated versioning scheme to be defined in a new PEP.

IMO this kind of limit wouldn't have any practical impact on existing projects, and there's no reasonable use case for having a version number in the hundreds of characters which would fit into the existing PEP 440 specification. So it's tempting to try and say that a character limit is a clarification rather than a change worthy of a whole new PEP.

I'll wait until next week to see if anybody else chimes in on the issue here and, if there are no objections, make an issue to 'clarify' PEP 440 and add in a length limit to the version number.

Out of curiosity I dug into this a bit with Google BigQuery. For all packages in the-psf.pypi.distribution_metadata, the summary of version string lengths is:

count    7.902727e+06
mean     6.598360e+00
std      3.552846e+00
min      1.000000e+00
25%      5.000000e+00
50%      5.000000e+00
75%      7.000000e+00
99%      2.200000e+01
max      2.350000e+02

Out of 7,902,727 published package versions there are:

It's kind of surprising to me that hundreds of releases have such long versions. Either way, overall 99.991% of versions are 32 characters or shorter.
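To make the idea concrete, here is a minimal sketch of what a length check could look like. The 32-character limit is just the figure from the statistics above, not an agreed value, and the function names are hypothetical; none of this is Warehouse code.

```python
# Hypothetical limit, taken from the 32-character figure above; not an agreed value.
MAX_VERSION_LENGTH = 32


def version_length_ok(version: str, limit: int = MAX_VERSION_LENGTH) -> bool:
    """Return True if the version string is at most `limit` characters long."""
    return len(version) <= limit


def share_within_limit(versions, limit: int = MAX_VERSION_LENGTH) -> float:
    """Return the fraction of version strings that are at or under the limit."""
    versions = list(versions)
    if not versions:
        raise ValueError("no versions given")
    return sum(version_length_ok(v, limit) for v in versions) / len(versions)
```

Running `share_within_limit` over the version column of the dataset would give the kind of percentage quoted above.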

There's actually a discussion about this for semver at https://github.com/semver/semver/issues/304, but I don't think any limit was set in the specification. In practice there is a limit in Node.js, because major/minor/patch get parsed as integers and JavaScript's max safe integer is 9007199254740991 (16 digits), so three such numbers plus two dots give a max string length of 50 characters.
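The arithmetic behind that 50-character figure, as a quick sketch. It only covers the major.minor.patch core and assumes each part is capped at the max safe integer; pre-release and build metadata are not considered.

```python
# JavaScript's Number.MAX_SAFE_INTEGER is 2**53 - 1.
MAX_SAFE_INTEGER = 2**53 - 1

# Number of decimal digits in the largest allowed major/minor/patch value.
digits_per_part = len(str(MAX_SAFE_INTEGER))

# Longest possible "major.minor.patch" string under that cap.
longest = ".".join([str(MAX_SAFE_INTEGER)] * 3)

print(digits_per_part, len(longest))
```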

dstufft commented 1 year ago

I think it would be reasonable for us to provide limits outside of what PEP 440 has, and in theory we technically already do since I believe you couldn't use a version number that expanded to be 100G worth of characters. It would probably be worth at least a discussion on discuss.python.org though, and digging into the releases that have longer version numbers and seeing what exactly their version numbers are and whether there is some use case we're missing.

I would also note that the same thing can happen with the project name, and also with compressed tags in a wheel filename that lead to very long filenames.

andyhasit commented 11 months ago

Long paths cause problems with other parts of the ecosystem, such as poetry, which caches wheels in directories with already long names like /home/andrew/.cache/pypoetry/artifacts/07/ef/d7/f4e72ab224633e85fd96dd6c096d8c35b025ecaa3c6d7728b6d271f83b/

Resulting in errors like this:

 [Errno 36] File name too long: '/home/andrew/.cache/pypoetry/artifacts/07/ef/d7/f4e72ab224633e85fd96dd6c096d8c35b025ecaa3c6d7728b6d271f83b/SQLAlchemy-1.4.49-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl'

Poetry's maintainers feel this is not exactly up to them to fix. Of course they could shorten part of the cache path, but there are also wheels with names over 200 characters long, like:

rgf_python-3.6.0-py2.py3-none-macosx_10_6_x86_64.macosx_10_7_x86_64.macosx_10_8_x86_64.macosx_10_9_x86_64.macosx_10_10_x86_64.macosx_10_11_x86_64.macosx_10_12_x86_64.macosx_10_13_x86_64.macosx_10_14_x86_64.whl

Which will break things regardless.

See https://github.com/python-poetry/poetry/issues/8529
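For anyone who wants to check their own paths, here is a rough sketch that flags path components exceeding a per-name limit. The 255-byte default is an assumption (a common limit on Linux file systems such as ext4); some setups have lower limits, so the value is a parameter. The function name is made up for illustration.

```python
from pathlib import PurePosixPath

# Assumption: 255 bytes per path component, a common limit on Linux file systems.
NAME_MAX = 255


def overlong_components(path: str, name_max: int = NAME_MAX) -> list[str]:
    """Return the components of `path` whose UTF-8 encoding exceeds `name_max` bytes."""
    return [
        part
        for part in PurePosixPath(path).parts
        if len(part.encode("utf-8")) > name_max
    ]
```

Calling it with a lower `name_max` shows how quickly long wheel file names become a problem on more restrictive file systems.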