bhrutledge opened this issue 2 years ago
@sigmavirus24 What do you think of this initial writeup? Any edits before I publicize it (e.g. to folks who submitted `check` feature requests)?
@sigmavirus24 Just noticed the 👍 (can't decide if the absence of notifications for those are good or bad). Is there anything you'd like to add, e.g. lessons learned from flake8?
Looks reasonable. Although, I don't know if the checkers would need custom args (if they did, we'd also be talking about a config file, I suppose).
@webknjaz Can you give an example of a checker that would need custom args? And/or can you propose how some of `twine check`'s features would be implemented in a checker?
Not really, I was just commenting on the original mentions of having to implement arg parsing which doesn't seem necessary right now.
cc @di @ssbarnea @gaborbernat @henryiii
Y'all have shown some interest in improving `twine check`. Do you have any thoughts on this proposal?
I am not very keen on having plugins for these checks, as it would mean we would need to modify each project running `twine check --strict` in order to enable the newly added checks. I do not want to have to add them to benefit from newer checks; if we can avoid that downside, the better. If we add new checks only to strict mode, we should be fine.
Regarding where these checks are implemented, I do not care much, but if they are in another project, it should be a direct dependency of twine, not an optional one.
I think the proposal is functionally sound, but I'm not sure I see the demand for third-party plugins for `twine check`. I think the main goal of the command is determining "will PyPI accept/reject this distribution & why" and to that end, we probably want to be authoritative about the results and not leave that for others to implement.
Assuming pypa/packaging#147 eventually happens (I see no objections there, just a lack of people to work on it), I'm having a hard time seeing what else users would want `twine check` to do, and thus a justification for this feature.
I think I'll likely be a little repetitive of @di and @ssbarnea if I write a full response. I don't think there are an unlimited number of checks for metadata that need to be done. There are a set of things that are needed (either for PyPI, or for consistency), but there isn't a need for an arbitrary number of things, like flake8. And, honestly, I think pre-commit has been a better solution than flake8 - instead of providing an extension system for running checks, it's an infrastructure for running anything, so there are lots of great checks (including ones that modify files) that are not in flake8. I would see the same thing here - if you really want a custom check, you can write a custom checker and then run it on your wheel files, adding it to pre-commit or CI or wherever. Extending something based on what is installed in the current environment has caused issues for pytest (flaky is registered by multiple extensions, which causes issues for pypa/build).
I'm very much in favor of twine check becoming more powerful, but I don't think it needs to become externally extensible. A good design for internal use is nice, but it doesn't seem that helpful to be able to add `twine-<stuff>` extensions to add checks. If there's really a need for a package-metadata version of flake8, it doesn't have to be part of twine, IMO.
For completeness, here are a few things I can think of that would be nice to check:
For "extendable" checks, the only thing I can think of would be validating md or rst markup, but that can usually be done on the original file with a normal checker.
So I proposed #843 and tried to introduce the plugin system because I felt like it could be useful. To be honest, I'm fine with dropping the "plugin" architecture and having all checkers live in twine.
What are some examples of custom checks that people would like to write? Can we/they share example implementations to drive the API design?
How should/must the new API be documented before it's implemented?
I think it should be kept simple:
```python
from typing import List, Tuple

def checker(filename: str) -> Tuple[List[str], List[str]]: ...
```
The checker would take the filename of a tarball or wheel and return a list of warnings/errors.
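To make that shape concrete, here's a minimal sketch of a checker matching the proposed signature. The `size_checker` name, the file-size rule, and the threshold are all hypothetical, purely to illustrate the `(warnings, errors)` contract:

```python
import os
from typing import List, Tuple

# Hypothetical threshold, for illustration only; PyPI's actual limits
# are enforced server-side.
MAX_SIZE = 100 * 1024 * 1024

def size_checker(filename: str) -> Tuple[List[str], List[str]]:
    """Return (warnings, errors) for a built distribution file."""
    warnings: List[str] = []
    errors: List[str] = []
    if not filename.endswith((".whl", ".tar.gz", ".zip")):
        errors.append(f"{filename}: not a recognized distribution format")
        return warnings, errors
    size = os.path.getsize(filename)
    if size > MAX_SIZE:
        errors.append(f"{filename}: {size} bytes exceeds the size limit")
    elif size > MAX_SIZE // 2:
        warnings.append(f"{filename}: approaching the size limit")
    return warnings, errors
```

A runner inside twine could then aggregate the two lists from each registered checker and decide the exit status based on `--strict`.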
Let's put all checks in twine and look into moving them out only if that becomes a problem, avoiding over-engineering. At some point we may want to add a noqa-style mode for deliberately bypassing some of them, but again, later. The more checks we add to `twine check --strict`, the better.
I disagree that these should all just get thrown into twine. Specifically, everything @CyrilRoelandteNovance listed is non-essential for uploading to PyPI. Those are things other organizations want, which is exactly the reason for making `twine check` extensible. Many companies publish things and want an easy way to lint them. Other languages provide this; Python makes it manual unless a company writes its own specific tooling.
It makes no sense to build checkers for every optional thing some mega-corp might want. I also don't see how we'd build a `# noqa` mechanism, given that twine looks at the generated metadata, so this concept of "we can figure out a way to ignore things when the tool turns to crap" is moot.
Finally, @CyrilRoelandteNovance's idea of making every plugin figure out how to unpack a distribution, etc., seems incredibly naive.
If there's really a need for a package-metadata version of flake8, it doesn't have to be part of twine, IMO.
This makes total sense to me
Misaligned Requires-Python and wheel tags (e.g. py2 tags alongside a Python 3+ Requires-Python) (`twine check` only)
There are other tools that do this already, namely https://github.com/asottile/setup-cfg-fmt
@bhrutledge I think we can close this
I think we are talking about `twine check`, not `twine upload`. I think it's okay if check does a little more than just what's needed for upload, as long as it's clearly things that all packages should do. But I'd understand if not.
It would be nice if wheel had an API; it would make implementing this elsewhere easier.
Setup-cfg-fmt only works for one (of soon to be three) ways to configure setuptools, not for any PEP 517 build system, and it absolutely does not check this.
it absolutely does not check this.
It auto-fixes `requires_python` and the classifiers. So it's not going to warn you, but why would you want to manually fix things when the computer will do that for you?
I think I missed this distinction:
non-essential for uploading to PyPI.
It seems like there's some disagreement about whether `twine check` should only check essentials, vs. check "best practices" that are subject to opinion. I tend to agree with @sigmavirus24 that "best practice" checks should not be implemented in this repo. However, it seems like `twine check` could be a convenient hook for opinionated folks to integrate such checks into their CI, and that implementing the architecture to support that would make it easier to add essential checks to Twine.
That said, it's not clear to me what is essential for uploading a package to PyPI, and does (or should) that include being able to `pip install` the package? Could someone (maybe @di) share a list? Similarly, of all the outstanding requests to improve `twine check`, which of those would be considered essential?
IMO, `twine check` should check whether an upload to PyPI will succeed or fail -- that's about it.
That said, it's not clear to me what is essential for uploading a package to PyPI, and does (or should) that include being able to pip install the package? Could someone (maybe @di) share a list?
Essentially it needs to pass a lot of metadata validation and validation of things like the filename/extension, etc, which lives at https://github.com/pypa/warehouse/blob/bf389e4acd129697b28900c962dc9d363da3b00f/warehouse/forklift/legacy.py
Similarly, of all the outstanding requests to improve twine check, what of those would be considered essential?
I think that's probably https://github.com/pypa/packaging/issues/147 and #430 (which might itself be fixed by https://github.com/pypa/packaging/issues/147). Everything else seems to be a 'best practice' that PyPI doesn't actually enforce, something that hasn't been fully agreed upon as a standard, or some kind of meta-improvement.
@di Your statement is in contradiction with `--strict` mode. Basically, if twine would only check whether an upload would succeed, it would not need a strict mode. The idea of strict mode was that it would find problems that PyPI is not able to catch or prevent.
So that we're on the same page, here's the current behavior of `twine check`.

Without `--strict`:

- Warn if `long_description_content_type` is missing
- Warn if `long_description` is missing
- Error if `long_description` has syntax errors in markup and would not be rendered on PyPI

With `--strict`: warnings are treated as errors.
It sounds like most people in this discussion (all of whom aren't Twine maintainers 😉) are suggesting that "best practice" checks be added as warnings, similar to a missing `long_description`. I'm open to that in theory, although I'm guessing that not all projects that use `--strict` in CI would agree on what the set of warnings should be, which would result in folks opening Twine issues asking to revert a "best practice" check, or make it optional. Such checks could be made optional through configuration, but I'm a little reluctant to add that complexity.
I agree with @bhrutledge that adding extra configuration complexity should be avoided, if possible. I wonder if we could make use of Python's `warnings` module to collect and display these warnings, effectively acting on them. I mention this because if we use real warnings, power users could also make use of `PYTHONWARNINGS` to override the default behavior, basically giving them a bypass configuration mechanism without us having to implement support for it.
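As a sketch of that idea (the `TwineCheckWarning` category and `warn_missing_field` helper are hypothetical, not existing twine APIs), defining a custom warning category would let the standard warning filters apply to check findings:

```python
import warnings

class TwineCheckWarning(UserWarning):
    """Hypothetical category for non-fatal check findings."""

def warn_missing_field(field: str) -> None:
    # Emitting through the warnings machinery means the standard
    # filters (simplefilter, -W, PYTHONWARNINGS) apply to it.
    warnings.warn(f"`{field}` is missing", TwineCheckWarning, stacklevel=2)

# Collect the warnings instead of letting them go to stderr,
# then display them however the command's output layer prefers.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_missing_field("long_description")

for w in caught:
    print(f"warning: {w.message}")
```

A power user could then run, e.g., `PYTHONWARNINGS="ignore::twine.TwineCheckWarning"` (assuming the class lived at such an import path) to silence these without twine growing its own configuration.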
The current behavior is to write warnings to `sys.stdout`. There's related discussion in https://github.com/pypa/twine/issues/818 about using `logger.warning`, instead of `warnings.warn`, to normalize Twine's output re: verbosity.
@di Your statement is in contradiction with `--strict` mode. Basically if twine would only check if an upload would succeed it would not need a strict mode. The idea of strict mode was that it would find problems that pypi is not able to catch or prevent.
That may have been the initial idea, but the initial ideas behind many things aren't always informed by enough experience to be correct. Look at the discussion of universal wheels. They were originally intended to be Python-version independent and have now transmogrified into meaning Python 2 and Python 3 compatible because of a choice made by a popular implementation.
So that we're on the same page, here's the current behavior of `twine check`. [...] It sounds like most people in this discussion (all of whom aren't Twine maintainers 😉) are suggesting that "best practice" checks be added as warnings, similar to a missing `long_description`. [...] Such checks could be made optional through configuration, but I'm a little reluctant to add that complexity.
I am forcefully against "best practice" checks, especially when there's no source of truth, just disparate groups of the opinion that the things they want are best practices. The only sane options in this case are a huge number of contributed checks that we few would have to maintain and eventually probably make configurable, or a pluggable architecture to let people install their best-practices plugins as they see fit. The latter is seen as wholly unappealing at this point, and the former is appealing to those who won't be doing the work and repulsive to me.
I think it's best to close this as "twine will ensure the minimum necessary metadata has been generated and included in an artifact for use with PyPI and only PyPI". Existing feature requests that don't fit into that should be closed.
Breaking down @sigmavirus24's observations.
The only sane options:
- A huge amount of contributed checks we few would have to maintain and eventually probably make configurable. Appealing to those who won't be doing the work and repulsive to me.
- A pluggable architecture to let people install their best practices plugins as they see fit. Seen as wholly unappealing at this point.
@sigmavirus24 Re: option 2, do you still see that as unappealing? I still find it appealing, for its potential utility, for a cleaner factoring of `twine.commands.check`, and because it sounds like a fun project.
I don't find option 1 repulsive, but I am a -1.
I think it's best to close this as "twine will ensure the minimum necessary metadata has been generated and included in an artifact for use with PyPI and only PyPI". Existing feature requests that don't fit into that should be closed.
Before doing that, I'd like to dig into @di's thoughts in https://github.com/pypa/twine/issues/848#issuecomment-1013962096, and get a sense of what it would take to fulfill the promise of ensuring the minimum necessary metadata.
Before doing that, I'd like to dig into @di's thoughts on #848 (comment), and get a sense of what it would take to fulfill the promise of ensuring the minimum necessary metadata.
I think that should be a separate discussion. This one has too many competing threads of discussion.
Sorry I took a while, intended to post this earlier, and I think it's been indirectly mentioned:
As I see it, there are three levels of checks:

1. Things PyPI will reject an upload for -- errors in `twine check`. (Error in markup on PyPI that would be accepted is also bundled here currently.)
2. Clearly inconsistent or invalid metadata -- warnings, or errors with `--strict`.
3. "Best practice" / stylistic checks.

Category 1 was mentioned by @di - and that's currently still lacking, and should be the highest priority. That's clearly a job for `twine check`.
Category 2 already exists, but only a few checks on the long description are implemented, and of those, missing fields is actually one I'd probably put in category 3 if I were starting from scratch. This has limited enough scope that I think it could be "warnings" in `twine check`.
I think we all agree category 1 should be part of twine, and category 3 can be implemented elsewhere. I think the best solution for allowing others to write checks is not a plugin system for twine, but some sort of API that others can use - maybe starting with wheel eventually, or even an `import twine` API that could be used to write checks. Someone else could set up plugins, or make their own checks easily. Plugin systems are fragile - look at pytest's `flaky` for example, or the recent IPython 8 usage of "black" as a plugin. I'm not saying they are bad, but they would take work and make installing twine as an "app" harder (I always use `pipx run twine`, the GitHub Action builds it in, etc.).
I think clearly invalid data should be a warning - this is very much category 2. For example, if `requires-python=">=3.7"` is specified and py2 is in the tags, this is incorrect and confusing; a warning should be produced. Also, I hadn't thought about broken classifiers, but that could be a warning too - if a `python :: 3.6` classifier was given, this would also be a mismatch. I do not think missing classifiers would count in category 2 at all - that's category 3, and a stylistic choice; classifiers are optional. But any clear inconsistency == bad metadata should be a warning, IMO.
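To illustrate, here's a rough sketch of the tag-vs-Requires-Python consistency check described above. `check_python_tags` is a hypothetical helper, and the naive `>=N` parsing stands in for a real specifier parser such as `packaging.specifiers.SpecifierSet`:

```python
import re
from typing import List

def check_python_tags(wheel_filename: str, requires_python: str) -> List[str]:
    """Warn when a wheel's Python tags contradict Requires-Python.

    Hypothetical helper: it only handles a simple lower bound like
    ">=3.7"; real metadata needs a proper specifier parser.
    """
    warnings: List[str] = []
    # Wheel filenames look like: name-version(-build)?-pytag-abitag-platform.whl,
    # so the Python tags are always the third field from the end.
    parts = wheel_filename[: -len(".whl")].split("-")
    py_tags = parts[-3].split(".")  # e.g. "py2.py3" -> ["py2", "py3"]
    m = re.match(r">=\s*(\d+)", requires_python)
    if m and int(m.group(1)) >= 3:
        for tag in py_tags:
            if tag.startswith(("py2", "cp2")):
                warnings.append(
                    f"{wheel_filename}: tag {tag!r} contradicts "
                    f"Requires-Python {requires_python!r}"
                )
    return warnings
```

A real implementation would likely use `packaging.utils.parse_wheel_filename` rather than splitting on `-` by hand.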
If we call missing metadata a "warning", I think we could have a minimum set of metadata (long-description, requires-python, etc.) that "should" be included, and those could be warnings (errors with `--strict`). This is based on the existing warnings. As I said, I'd really consider this category 3 if it weren't for the fact that it's already there for long-description.
I'd like to see some sort of checker that does category 3 - I'd like something that makes sure I don't forget various classifiers, some types of URLs, etc. But I think just providing a bit of API to help others provide this is likely enough. A tool for this would need to support configuration to tell it what to allow / skip, etc. - similar to `check-manifest`, for example.
Look at the discussion of universal wheels. They were always intended originally to be python version independent and now have transmogrified to being python 2 and Python 3 compatible because of a choice made by a popular implementation.
I need to briefly reply to this. Python-version-independent wheels predate `universal`. The original method to support Python 2 and 3 was to pre-process the Python 3 code with 2to3 - so this was the default assumption by bdist_wheel. `universal` was added as an option passed directly to `bdist_wheel` to tell it to produce Python 2 and Python 3 tags for a wheel, in wheel 0.10 (2012). At the time, there was no way to tell a Python-version-independent package what range of Pythons it supported; the assumption was it could not span 2-3, and always matched the major version. `universal` has no standardized form in any PEP and has no meaning outside of setuptools + wheel. There literally is only one implementation because it's not a standard - it is the implementation. Here's the original implementation note: "`universal=1` will apply the `py2.py3-none-any` tag for pure python wheels." PEP 427 does not even mention "universal".
What was developed as a standard was `Requires-Python`, which doesn't require custom options to `bdist_wheel`. Wheel has never been updated to take this field into account, and since there's no need to do this except to support Python 2, I don't see it happening - pure Python and extensions are handled correctly in most cases, and the cases that are not are also not fixed by a single setting (for example, if a pure Python library collects unicode escapes per Python version as data files).
The proposal in a different issue was to simply check the Python tags, and produce a warning if a Python-3-only package (via Requires-Python) has a `py2` tag; this is clearly wrong and confusing. It doesn't matter how it was produced or how it is fixed; it's simply inconsistent metadata, which is a good thing to check, IMO.
@di Your statement is in contradiction with --strict mode. Basically if twine would only check if an upload would succeed it would not need a strict mode. The idea of strict mode was that it would find problems that pypi is not able to catch or prevent.
Well, that's probably because I can't say I really agree with the addition of the `--strict` flag in #715. The initial idea in adding these warnings in #412 was to warn the user that running `twine check` without these fields was meaningless, not to imply that adding them was a "best practice" (even if some folks might consider it to be).
Before doing that, I'd like to dig into @di's thoughts in #848 (comment), and get a sense of what it would take to fulfill the promise of ensuring the minimum necessary metadata.
This is covered in detail in https://github.com/pypa/packaging/issues/147. The TL;DR is: move PyPI's metadata validation out of PyPI and into `packaging`, where it can be reused by both PyPI and `twine`.
Well, that's probably because I can't say I really agree with the addition of the `--strict` flag.

Yeah, I think a better name would have been akin to clang or sphinx, where you have `--error-on-warning` instead.
Caveat: I have very little experience with proposing and implementing substantial new features/APIs in open-source projects.
There have been more than a few requests to add functionality to `twine check`, but those have been mostly flagged as duplicates of #430, which is currently blocked on https://github.com/pypa/packaging/issues/147, which hasn't seen much movement. #843 contains a proposal to add custom "checkers" via setuptools `entry_points`, which would allow functionality to be added by independently-maintained packages. Twine already allows for plugin sub-commands via `entry_points`, but those would have to implement argument parsing and reading packages from disk, and be run as a separate command from `twine check`.

The purpose of this issue is to gather requirements and propose an API for custom checkers. For an initial (but insufficient) overview of the current API, see the autodoc for `twine.commands.check`.

Some initial ideas/questions:

- The only existing check covers `long_description`, implemented in a private method. That should eventually be refactored to match the new API, but it could be refactored earlier to help establish the new API.
- Checkers could receive a `PackageFile`, along with the original filename.
- `check()` and `main()` in `twine.commands.check`.