Open mxmlnkn opened 3 days ago
Therefore, my first instinct would be to make pypa/build work around this issue somehow by producing tarballs that do not return true for zipfile.is_zipfile, but it seems that they do not want to do this.
To clarify, build does not produce tarballs at all. You can think of build as something of an orchestrator: it calls out to the build system in your pyproject.toml using a well-defined interface. The build system (setuptools in this case) generates the sdist.
Thanks for the clarification. Unfortunately, I was not fully aware of this because Python build systems feel fractured and confusing to me. I guess I could also try to ask at pypa/setuptools about this then...
Sdists have been standardised as tarballs and no modern builder supports producing zip sdists. I don't know if PyPI still allows the upload of zip sdists or if Twine would like to drop support for them in check. Perhaps if is_tarfile is more reliable, the order of operations could be reversed in pkginfo.
I'm not sure how easy it'd be for setuptools to guarantee that tarballs aren't identifiable as zipfiles, and unless setuptools is doing something super exotic, all builders are probably susceptible to producing misidentifiable tarballs.
Potentially not any longer, but when Twine started it allowed a source distribution (sdist) and a source archive, which were two very different kinds of artifacts. They were effectively mutually exclusive at some point. The zip archive handling is a result of that behavior. I haven't checked to see if the source archive is still supported, but it's okay to drop that support if PyPI no longer allows it.
:wave: @di, would you know if PyPI still allows zip archives?
PyPI is only one of the repositories that Twine attempts to support. People using other repositories that attempt to behave like PyPI may still require this. We have no way of knowing if those folks are also using twine check or expecting it to check the archive metadata.
My recollection is that even 6 or so years ago (I don't remember when I started working on twine) source archives were very rare but still used.
I would hope some PyPI metrics could help close the door here, but again, it's likely a change that needs a major version to let others know that support is being dropped.
Is there an existing issue for this?
What keywords did you use to search existing issues?
.tar.gz zip tarball is_zipfile
What operating system are you using?
Linux
If you selected 'Other', describe your Operating System here
No response
What version of Python are you running?
How did you install twine? Did you use your operating system's package manager or pip or something else?
What version of twine do you have installed (include the complete output)
Which package repository are you using?
pypi.org
Please describe the issue that you are experiencing
twine check fails for a tarball generated with build. This happens randomly.
Please list the steps required to reproduce this behaviour
Run twine check on the attached broken-tarball.gz.
broken-tarball.tar.gz
Neither the Python version nor the platform seems to matter. I have observed these errors in my CI on macOS and Ubuntu, with Python 3.8, 3.9, 3.12, and 3.13.
If you really want it fully reproducible, see my CI setup, e.g., in this failing commit.
Anything else you'd like to mention?
I have found out that the underlying problem is CPython's zipfile.is_zipfile being too lax, combined with my cosmic bad luck of suddenly ending up with more or less reproducible tarballs that make zipfile.is_zipfile return True even though they are not valid ZIP files.
However, I doubt that this gets "fixed" in CPython, so it would be nice to see this being worked around in twine.
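For context, ZIP readers locate an archive by scanning the end of the file for the 4-byte end-of-central-directory signature PK\x05\x06, so compressed tarball bytes can contain that sequence purely by chance. The following is a minimal sketch for checking whether a given file has such a candidate signature in its tail. The name find_eocd_candidate is hypothetical, and this is not CPython's implementation: zipfile.is_zipfile performs additional checks that differ between Python versions, so a hit here only means the file might be misidentified.

```python
EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory magic from the ZIP format
EOCD_MIN_SIZE = 22              # size of the record without a trailing comment
MAX_COMMENT = 0xFFFF            # the comment length field is 16 bits wide


def find_eocd_candidate(path):
    """Return the offset of the last EOCD signature in the file tail, or -1."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        size = f.tell()
        # The record can only start within the last 22 + 65535 bytes.
        tail_start = max(0, size - (EOCD_MIN_SIZE + MAX_COMMENT))
        f.seek(tail_start)
        tail = f.read()
    pos = tail.rfind(EOCD_SIGNATURE)
    return -1 if pos < 0 else tail_start + pos
```

Running this on a tarball that twine check rejects should show whether the signature is present at all, which helps distinguish this problem from a genuinely corrupted archive.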
Currently, I have this ugly hack to work around it in my CI:
One possible solution would be to make twine smarter about ZIP detection, e.g., it could use try-except while trying to read one member from the ZIP, which should trigger the errors I am seeing in my CI. Or, it could use the file suffix for arbitration in case both "is tarball" and "is zip file" return true.
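Both ideas can be sketched in a few lines using only the standard library. This is an illustration under assumptions, not how twine or pkginfo is implemented: the helper names zip_member_is_readable and archive_kind are hypothetical, and the suffix rule is deliberately simplistic.

```python
import tarfile
import zipfile


def zip_member_is_readable(path):
    """Return True only if the file opens as a ZIP and one member can be read."""
    try:
        with zipfile.ZipFile(path) as archive:
            names = archive.namelist()
            if names:
                # Reading a single byte forces the local header to be parsed.
                with archive.open(names[0]) as member:
                    member.read(1)
        return True
    except Exception:
        # For a detection probe, any failure means "not a usable ZIP".
        return False


def archive_kind(path):
    """Classify a file as 'tar', 'zip', or 'unknown'."""
    looks_tar = tarfile.is_tarfile(path)
    looks_zip = zipfile.is_zipfile(path) and zip_member_is_readable(path)
    if looks_tar and looks_zip:
        # Both detectors agree the file is theirs: let the suffix arbitrate.
        return "zip" if str(path).lower().endswith(".zip") else "tar"
    if looks_tar:
        return "tar"
    if looks_zip:
        return "zip"
    return "unknown"
```

With this ordering, a tarball that merely happens to satisfy zipfile.is_zipfile is still reported as a tarball, because either the member read fails or the suffix decides in favour of tar.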
Personally, I still see this as a structural problem in CPython or even in the ZIP format itself, which probably does not get fixed any time soon:
This makes me fear that other tooling might fail with similar errors, e.g., during the pip install process, and therefore it might not be a good idea to upload such tarballs to PyPI (if it even accepts such a tarball; maybe it gets rejected because it also runs twine check on it). Therefore, my first instinct would be to make pypa/build work around this issue somehow by producing tarballs that do not return true for zipfile.is_zipfile, but it seems that they do not want to do this.
It might also be a satisfactory solution to simply print a better error message with suggestions on how to work around this issue. That would have saved me 3+ hours of debugging my CI.