Open osamuaoki opened 9 months ago
~Hi @osamuaoki , have you tried setting the readme.content-type
to text/plain
?~
Sorry, wrong hint: this does not seem to solve the problem...
It is probably a "folding" issue with the syntax of the METADATA file. The effect of license.file
is to embedded all the text from the file into METADATA
. Omitting it relies on Setuptools using the standard rules to add the license file into the wheel as a separated file.
Yah, I was thinking along the same line. Despite the upload ERROR message initially gave me an impression, the real issue seems to be inclusion of LICENCE and its data conversion to fit it into METADATA.
setuptool
seems to accept only ReST compatible file. (Just impression. No serious test is done yet.)
flint
seems to ignore this license
entry in pyproject.toml
.
According to my vague memory, I tried something along:
license = { file="LICENSE.txt", content-type="text/plane" }
It didn't fix this issue.
I don't know if this is bug on build tool side or not. Important thing is to inform user for practical workaround for the moment and someone who understand situation to ask other packages to work with this situation etc.
I couldn't reproduce. After replacing the contents of LICENSE.txt
with the GPL text, I still get
$ twine check *
Checking sampleproject-3.0.0-py3-none-any.whl: PASSED
Checking sampleproject-3.0.0.tar.gz: PASSED
I cannot reproduce it either... This is what I did in detail:
# docker run --rm -it python:3.12-bookworm /bin/bash
cd /tmp
git clone https://github.com/pypa/sampleproject
cd sampleproject
wget https://www.gnu.org/licenses/gpl-3.0.txt -O LICENSE.txt
sed -i 's/name = "sampleproject"/name = "sampleproject-abc"/g' pyproject.toml
python -m pip install -U pip build twine
python -m build
twine check dist/*
# Checking dist/sampleproject-3.0.0-py3-none-any.whl: PASSED
# Checking dist/sampleproject-3.0.0.tar.gz: PASSED
twine upload --verbose --repository testpypi dist/*
# ...
# INFO Response from https://test.pypi.org/legacy/:
# 200 OK
# ...
# View at:
# https://test.pypi.org/project/sampleproject-abc/3.0.0/
unzip -p dist/sampleproject_abc-3.0.0-py3-none-any.whl sampleproject_abc-3.0.0.dist-info/METADATA > /tmp/METADATA.orig
head -n 20 /tmp/METADATA.orig
# Metadata-Version: 2.1
# Name: sampleproject-abc
# Version: 3.0.0
# Summary: A sample Python project
# Author-email: "A. Random Developer" <author@example.com>
# Maintainer-email: "A. Great Maintainer" <maintainer@example.com>
# License: GNU GENERAL PUBLIC LICENSE
# Version 3, 29 June 2007
#
# Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
# Everyone is permitted to copy and distribute verbatim copies
# of this license document, but changing it is not allowed.
#
# Preamble
#
# The GNU General Public License is a free, copyleft license for
# software and other kinds of works.
#
# The licenses for most software and other practical works are designed
# to take away your freedom to share and change the works. By contrast,
wget https://test-files.pythonhosted.org/packages/c0/c0/7b07efedcdcfab3c2700ce7c23d7a737b4b2419f5917f43993c8635f320e/sampleproject_abc-3.0.0-py3-none-any.whl -O /tmp/sampleproject_abc-3.0.0-py3-none-any.whl
unzip -p /tmp/sampleproject_abc-3.0.0-py3-none-any.whl sampleproject_abc-3.0.0.dist-info/METADATA > /tmp/METADATA.download
diff /tmp/METADATA.*
# => no difference
# Further investigate if the escaping works well
# for the "email message" format used in METADATA:
cat <<EOF > /tmp/test-metadata.py
import sys
import email
with open(sys.argv[1], "rb") as fp:
message = email.message_from_binary_file(fp)
for field, text in (
("license", message["License"]),
("readme", message.get_payload()),
):
print(f"------- {field} --------")
lines = text.splitlines(keepends=True)
print("".join(lines[:3]))
print("...")
print("".join(lines[-3:]))
EOF
python /tmp/test-metadata.py /tmp/METADATA.orig
# ------- license --------
# GNU GENERAL PUBLIC LICENSE
# Version 3, 29 June 2007
#
#
# ...
# Public License instead of this License. But first, please read
# <https://www.gnu.org/licenses/why-not-lgpl.html>.
#
# ------- readme --------
# # A sample Python project
#
# ![Python Logo](https://www.python.org/static/community_logos/python-logo.png "Sample inline image")
#
# ...
# [rst]: http://docutils.sourceforge.net/rst.html
# [md]: https://tools.ietf.org/html/rfc7764#section-3.5 "CommonMark variant"
# [md use]: https://packaging.python.org/specifications/core-metadata/#description-content-type-optional
# ---- Eventhing seems to work well in the METADATA file ----
Anyway, I think this repository is not the best place to be discussing this issue. Once the author has a complete minimal reproducer, the following possibilities might be investigated:
pypa/setuptools
project (later, after the investigation is complete, it might also involve pypa/distutils
and/or pypa/wheel
).pypi/warehouse
project.twine check
and the checks in pypi
, to be discussed in the pypa/twine
and/or pypi/warehouse
projects.However it is important to have a clear idea what item(s) in the list above is actually happening, before opening any new issue; and verify them with solid reproducers.
Please feel free to use the quick script in the example above, to check if the METADATA
file is malformed... It is by no means an exhaustive check though but maybe when added to twine check
is enough?
I should have been more explicit which LICENSE text I was using. I am using good old GPL2 text as https://github.com/osamuaoki/imediff/blob/main/LICENSE and renamed it to LICENSE.txt . This file is causing problem.
With your test case with GPL3 text, I also get no problem on Debian/stable:
$ wget https://www.gnu.org/licenses/gpl-3.0.txt -O LICENSE.txt
$ python3 -m build
$ twine check dist/*
Checking dist/sampleproject-3.0.0-py3-none-any.whl: PASSED
Checking dist/sampleproject-3.0.0.tar.gz: PASSED
So the problem is specific to the GPL2 text. Quotation styles are different.
My GPL2 file uses `foo'
Your GPL3 file uses "foo"
FYI I am using Debian 12/ stable as the base. This means Python3.11. (I also used the latest PIP packages under venv to be current)
Thank you for the reproducer.
This seems to be the same as https://github.com/pypa/setuptools/issues/4033.
I would say the root of the problem is the \x0c
characters in the license text. Isn't it?
Quoting Paul Moore in the thread I linked above:
This is because neither the metadata spec nor the email standards really say what to do with control characters in header values, and the stdlib email package appears to handle it inconsistently.
The escaping goes wrong when pypa/wheel
tries to regenerated the metadata file by ingesting the file that pypa/setuptools
generates. Per se, the file generated in pypa/setuptools
parses correctly, but after being re-generated by pypa/wheel
, it does not.
Recently I submitted a PR to pypa/distutils
that will make the life of pypa/wheel
easier, and prevent this round-trip problem from happening all togheter. However pypa/distutils
has not been merged into pypa/setuptools
yet.
If the problem is indeed \x0c
, I recommend closing this issue since the problem is already being tracked in https://github.com/pypa/setuptools/issues/4033.
As a workaround you can replace all the \x0c
characters in your license file with a \n
, or simply remove out the licence = ...
in your pyproject.toml
as you have previously suggested (the license file will still be included in the wheel as a standalone file, so you are still distributing the license).
I'm confused. The reported error is:
ERROR `long_description` has syntax errors in markup and would not be rendered on PyPI.
line 2: Warning: Block quote ends without a blank line; unexpected unindent.
WARNING `long_description_content_type` missing. defaulting to `text/x-rst`.
That's about the README, not the license, and it's saying that the README isn't valid Restructured text. It looks like whatever changes the OP made to the readme (or to the filename of the readme, because if it were still a .md
file it would be detected as Markdown) are the problem here.
I can confirm the problematic content is in the LICENSE text not in README.
Good old FormFeed aka. FF (0x0c) in LICENSE was the real root cause. By removing them, I get clean result with no ERROR nor WARNING. So the problem is in setuptools.
As for the unhelpful ERROR message which seems to imply the problem reside in README.md or content-type setting... I don't know how exactly it happens to say this but this error report message has some room to be improved to lead people to the root cause.
I'm confused. The reported error is:
Hi @pfmoore, I am trying to summarise my thought process below. Hopefully it will improve the understanding 🤞.
When the stdlib's email
functions are used to read the METADATA
files, they don't care if there is a \xc0
character... It seems to deal with it very well[^1]. So that is why the PKG-INFO
inside tar.gz
sdist parses correctly.
However, when pypa/wheel
regenerates the METADATA
file from PKG-INFO
it replaces the \xc0
character with a \n
. If you happen to have in your license file a \n\xc0
or \xc0\n
sequence, this means that pypa/wheel
will generate a METADATA
file with \n\n
.
Now, in the email message format, the \n\n
sequence is used to mark the end of the headers section and the beginning of the body/payload. In the packaging standard, we use the body/payload of the email message for the README/long_description
.
So the error message is complaining about the README/long_description
, because part of the LICENSE
file is incorrectly being considered as the email message body/payload.
This is the same error discussed in pypa/setuptools#4033, for which pypa/distutils#213 should provide a good workaround (we have to wait until it lands on setuptools).
[^1]: And/or at least the functions used for parsing in twine
/PyPI.
Ah, I see. Thanks - that's a lot more subtle than I realised.
Hi, looking at https://github.com/pypa/setuptools/issues/4033 , problematic control codes are:
'0x0b': ('\x0b', ['', '']),
'0x0c': ('\x0c', ['', '']),
'0x1c': ('\x1c', ['', '']),
'0x1d': ('\x1d', ['', '']),
'0x1e': ('\x1e', ['', '']),
In other words
Oct Dec Hex Char
─────────────────────────────
013 11 0B VT '\v' (vertical tab) Ctrl-K
014 12 0C FF '\f' (form feed) Ctrl-L
034 28 1C FS (file separator) Ctrl-\
035 29 1D GS (group separator) Ctrl-]
036 30 1E RS (record separator) Ctrl-^
These needs to be removed from the original LICENSE file to make source code robust and portable to older platforms.
As for the following text in pyproject.toml:
# This is either text indicating the license for the distribution, or a file
# that contains the license
# https://packaging.python.org/en/latest/specifications/core-metadata/#license
license = {file = "LICENSE.txt"}
I suggest updating this section with:
# This is either text indicating the license for the distribution, or a file
# that contains the license
# For the better compatibility with the older tool chain, please remove control
# characters (0x0b=^K, 0x0c=^L, 0x1c=^\, 0x1d=^], 0x1e=^^) in the` LICENSE.txt`
# file if they exist. See https://github.com/pypa/setuptools/issues/4033 .
# https://packaging.python.org/en/latest/specifications/core-metadata/#license
license = {file = "LICENSE.txt"}
IMHO, we should just fix that bug and not document the workaround, because pip and build use build isolation by default and most people never change that setting, which should fix the problem without people needing to upgrade their toolchain as soon as a new release of wheel is out.
I see your point and agree.
Maybe leave this issue open until the new release of tool chain.
Hi,
Thank you for providing very useful resource.
NOTE: I am not raising issue of the license terms of this
sampleproject
like #17 since this is MIT licensed.ISSUE
Issue is supportive technical information to adopt this example to the packaging of existing GPL programs.
If I simply replace LICENSE.txt content with the LICENSE file content of GPL license, the resulting wheel package failed to upload to testpypi.org.
The local test result of such package goes:
WORKAROUND
In order to avoid this ERROR to successfully transition to the new
pyproject.toml
, I need to do the followings on LICENSE.txt fileand live with WARNING or use the last option of commenting out
license = { file = "LICENSE.txt"}
inpyproject.toml
to stop including LICENSE.txt into METADATA file in the wheel package.I am not sure what is the best solution here.
Desired situation
At least some pointer to the best solution for resolving this situation needs to be easily available to the prospectus users.
I think the build tool should be smart enough to wrap/escape problematic character sequence in the LICENSE.txt to avoid causing issues. But that is not the case now. Requiring to edit LICENSE.txt manually to avoid ERROR seems to be bad idea.
Dropping
license = "LICENSE.txt"
inpyproject.toml
seems to be most practical solution for now. You still get to include unmodifiedLICENSE
into wheel package.Maybe, it may be a good idea to add link to the root cause bug until it is fixed.
Additional note
As I checked https://github.com/pypa/build , its
pyproject.toml
haslicense.file = "LICENSE"
but the resulting METADATA doesn't contain it. Thisbuild
is built byflint-core
while thesampleproject
usesrequires = ["setuptools>=43.0.0", "wheel"]
as the build backend.FYI: I post my packaging experience around this at: https://osamuaoki.github.io/en/2024/02/03/python-pypi/