Open bastienqb opened 1 year ago
As much as I dislike unnecessary name mangling, this report does appear to be correct. I'm unsure if Setuptools is responsible for the naming or if the wheel package is. Regardless, it probably should be fixed.
@dholth is there any chance the PEP could be updated to allow `.` characters in the wheel name? What was the motivation for mangling them? They're an intrinsic, important character in the Python package name.
@jaraco I believe that the PEP originally allowed for `.`, but the living spec was changed as a result of a discussion:
https://github.com/pypa/packaging.python.org/pull/844
(Although after a quick look at the Discourse thread, it looks to me like the `.` character, specifically, was not really debated and ended up being changed accidentally.)
The current spec seems to say that full name normalization should happen (i.e. lower case + runs of special chars to underscore), and from my quick test "newer" backends all follow that.
From a distribution viewpoint, we'd also prefer that setuptools follow the spec, as otherwise we end up with unpredictable filenames (with some backends producing normalized names and others not).
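A minimal sketch of the filename normalization described above (lower case, plus runs of `-`, `_` and `.` collapsed to a single underscore). The function name `escape_wheel_name` is made up for illustration and is not the setuptools or wheel implementation:

```python
import re


def escape_wheel_name(name: str) -> str:
    """Normalize a project name for use as the name component of a wheel filename.

    Hypothetical helper: collapses every run of '-', '_' and '.' into one
    underscore and lowercases the result.
    """
    return re.sub(r"[-_.]+", "_", name).lower()


print(escape_wheel_name("zope.interface"))
print(escape_wheel_name("Namespace.Package-Name"))
```

With this rule, `zope.interface`, `zope-interface` and `zope_interface` all produce the same filename component.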
> I'm unsure if Setuptools is responsible for the naming or if the wheel package is.

Apparently wheel is, with the `wheel_dist_name()` function in `bdist_wheel.py`. There's https://github.com/pypa/wheel/issues/440 which seems to tackle this, though the bug title talks of `.dist-info` naming.
Hi @jaraco, there was a discussion recently on the Python discourse about the normalisation of the distribution file names https://discuss.python.org/t/change-in-pypi-upload-behavior-intentional-accidental-pebkac/27707. I will try to summarise the key takeaways I found on why most of the community seems to be in favour of the normalisation. Hopefully this answers the question "What was the motivation for mangling them?":
- `pip`, whose primary use case is to download from PyPI, prefers to rule out the possibility of treating distributions named after namespace packages and “normal packages” as two different distributions. This is compatible with PyPI and also helps users to fix unintentional typing errors and avoid downloading wrong/malicious distributions.
- [...] `.dist-info`/`.egg-info` directory (faster lookup), and if I understood correctly this would also help to optimise the checks for conflicting distributions already installed (since `.dist-info` serves as a database).
- [...] `a.b` and `a_b` to coexist in the same private index.
- [...] `Name` in the `PKG-INFO`/`METADATA` files should not be normalised and reflect the user's input.

So it seems that the name change unlocks optimisations and simplifications.
@bastienqb, if you would like, you can chip in to the discussion on https://discuss.python.org/t/change-in-pypi-upload-behavior-intentional-accidental-pebkac/27707 to explain why keeping the names in the format `{namespace}.{package-name}` is important. Otherwise there seems to be a push in the community for a strict standard that normalises the file name (as a means to unlock the optimisations and simplifications I mentioned before).
Unfortunately, I don't think that answers my question - "why is `.` normalized to `_`?". They're very clearly different separators and have very different semantic meaning in Python. That is, if a Python user can't tell the difference between those characters, they're already headed for disaster.
Moreover, if the goal is to collapse any characters that a user might find confusing, it suggests that other normalization should occur. By this logic, PyPI should probably also normalize "I" and "l", maybe "j" and "i", "3" and "e", and probably others.
Since there's a strong push toward PyPI names being valid Python identifiers and since "jaraco.collections" and "jaraco_collections" are very much different Python identifiers, I feel strongly that either or both names should be allowed and should be different packages.
I'm very much in support of normalizing for security and to limit the diversity of the namespace and to do that in a way that's largely transparent to the user. What I'd really like to avoid is users seeing "downloading zope_interface" when the package they're downloading is "zope.interface" and the Python package that's installed is `zope.interface`.
The most important factor here is not to give namespace packages a second-class experience, and that's exactly what they'll get if they follow the convention of naming the package by mapping the Python package to the Distribution package name and the `.` gets replaced by `_` in user-visible locations.
There's some confusion happening here.
Regardless of what happens, PyPI (and everyone else) is going to treat `.`, `-`, and `_` as equal characters. This behavior has existed since basically the dawn of time in PyPI, setuptools, pip, etc. This isn't any different than the fact we treat `F` as equal to `f`. This is the status quo for ~20 years, and isn't likely going to change.
There's some confusion that came out of some of the specs, where some people interpreted the text about the `Name` field inside of the `METADATA` file as saying that the `Name` field should be normalized. I don't believe that there is widespread support for that, and PyPI does not require it. I think the people who think that have essentially just misread the specs, and I'm preparing a PEP that will clarify the `Name` field (and thus ultimately the "canonical" name), which should be used in any user-visible locations. So when someone looks at the project on PyPI, or whatever, it should use the name as it exists in the `Name` field.
On PyPI we normalize the name in the Simple API URLs only. So for `zope.interface` the simple API URL is `/simple/zope-interface/`. We do not consider this a user visible location, it's part of the API contract between an installer and PyPI. From a practical standpoint pip has to be able to take a user entered name and get the URL, and if we didn't do this normalization in the URL, then `pip install django` would fail (because it's Django not django), etc.
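As a rough illustration of how an installer could go from a user-entered name to the Simple API URL path, here is a sketch that assumes the normalization rule from PEP 503 (lower case, runs of `-`, `_` and `.` replaced by a single `-`). The helper names are hypothetical and this is not pip's or PyPI's actual code:

```python
import re


def canonicalize_name(name: str) -> str:
    """Normalize a project name for index lookups (assumed PEP 503 rule)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def simple_api_path(user_input: str) -> str:
    """Build the Simple API path for whatever spelling the user typed."""
    return f"/simple/{canonicalize_name(user_input)}/"


print(simple_api_path("zope.interface"))
print(simple_api_path("django") == simple_api_path("Django"))
```

Because the lookup key is derived from the input, `django`, `Django`, `zope.interface` and `zope_interface` all resolve without the user needing to know the exact spelling in the `Name` field.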
The question is largely around filenames. Does `zope.interface` need to produce a wheel named `zope.interface-1.0.whatever.whl`, or can it produce a wheel named `zope_interface-1.0.whatever.whl`? Noting of course that no matter what we choose, a package named `foo-bar` is never going to have its name represented exactly in the wheel.
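The `foo-bar` point follows from `-` being the field separator in wheel filenames (`{name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl`). A small sketch of a splitter, with a hypothetical function name and no validation of the individual fields, shows what happens to an unescaped hyphen:

```python
def split_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its hyphen-separated fields.

    Illustrative only: real tools also validate each field.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


print(split_wheel_filename("zope_interface-6.0-cp311-cp311-win_amd64.whl"))
# An unescaped hyphen in the name shifts every field by one position:
print(split_wheel_filename("foo-bar-1.0-py3-none-any.whl"))
```

In the second call the splitter reports `foo` as the name and `bar` as the version, which is why a hyphen in the project name always has to be escaped in the filename, whichever way the `.` question is settled.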
The specs as they're currently written decide that the filename is not a user-facing value, and treat filenames much like the URLs in the Simple API: an interchange format between computer systems. Of course filenames are also a little more visible than Simple API URLs; they do appear (as filenames) in the PyPI UI, etc.
So ultimately the question is:
- `zope.interface` and `zope-interface` and `zope_interface` are all the same name as far as packaging is concerned.
- [...] `zope.interface`.
- [...] `/simple/zope-interface/`.

Is it OK for the filenames to be:
zope_interface-6.0.tar.gz
zope_interface-6.0-cp311-cp311-win_amd64.whl
zope_interface-6.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
or MUST it be:
zope.interface-6.0.tar.gz
zope.interface-6.0-cp311-cp311-win_amd64.whl
zope.interface-6.0-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
> Unfortunately, I don't think that answers my question - "why is `.` normalized to `_`?".

@abravalheri shared your question here on the linked thread, and ended up being convinced by the chorus of responses, so I'll try to summarize the main reasons other PyPA maintainers cited:
- [...] `_`)
- [...] (`flit-core`, `hatchling`, `meson-python`, `pdm-backend`, `poetry-core`) already do, rather than inventing a whole new scheme, changing the standard yet again and expecting all the other backends to switch

However, there was equally strong support for only applying normalization to the identifiers that are not primarily user-facing, i.e. the artifact filenames and the `.dist-info`, and mandating that the `METADATA` `Name` field not be normalized, and that tools should always use that value whenever presenting the project name in a user-facing context (or if they do happen to rely on the distinction). This seems to address your main overriding concern: that the project name be presented to the user as the author intended.
Therefore, it seems a PEP formally declaring that `Name` MUST NOT be normalized and SHOULD always be what is presented to the user, while also stating that it MUST be normalized in new archive filenames and `.dist-info`, would come closest to giving everyone most of what they want here without regressing on the de-facto status quo for either, which as Dustin summarizes on the thread is a mess for everyone involved, especially maintainers with `.` in their project names, which was actually what kick-started that discussion in the first place.
I concede. It doesn't matter what the motivation was to consider `.` and `_` equivalent, but they are now by consensus.
Just to be clear, this was only the case for sdist and wheel filenames, dist-info directories and when requesting a package by name from an index. There was also strong consensus that they should not be considered equivalent in the canonical project name, the `Name` field of `pyproject.toml`, `PKG-INFO` and `METADATA`, and the display name for user consumption, and that should be kept exactly as originally written by the project author.
The current behavior of keeping dots in wheel filenames is causing problems for uv, which assumes spec-normalized filenames (https://github.com/astral-sh/uv/issues/8030).
@konstin, if you would like to open a PR that would be welcome.
setuptools version
setuptools==66.0.0
Python version
python 3.8
OS
macOS
Additional environment information
No response
Description
I am building a wheel for a python package using setuptools and it seems that the naming of my wheel file is not respecting the PEP 491 convention.
For external reasons, I need to name my package with the structure `{namespace}.{package-name}`. If I follow the convention, I would expect that my wheel file is named: `namespace_package_name-0.1.0-py2.py3-none-any.whl`.
However, I get this name for my wheel: `namespace.package_name-0.1.0-py2.py3-none-any.whl`, which is not respecting the convention.
Expected behavior
I expect the "." in the package name to be replaced with a "_" in the wheel name.
How to Reproduce
1. in `setup.cfg`, write: [...]
2. [...] `project.toml`, write: [...]
3. [...] `pipx run build` at the root of your hello_world package
4. [...] `dist` directory which was created
Output
in the dist folder, you find:
namespace.my_package-0.1.0-py3-none-any.whl
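To check a build output directory for this problem, something like the following could be used. It is only a sketch: the function name is hypothetical, and it assumes the expected form is the spec-normalized one discussed in the comments (lower case, runs of `-`, `_`, `.` collapsed to `_`):

```python
import re
from pathlib import Path


def find_unnormalized_wheels(dist_dir) -> list:
    """Return the wheel filenames in dist_dir whose name component is not normalized."""
    offenders = []
    for wheel in sorted(Path(dist_dir).glob("*.whl")):
        # The name component is everything before the first hyphen.
        name_part = wheel.name.split("-", 1)[0]
        expected = re.sub(r"[-_.]+", "_", name_part).lower()
        if name_part != expected:
            offenders.append(wheel.name)
    return offenders


if __name__ == "__main__":
    for filename in find_unnormalized_wheels("dist"):
        print(f"not normalized: {filename}")
```

Run from the project root after the build step, this would list `namespace.my_package-0.1.0-py3-none-any.whl` for the output shown above.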