pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Allow blocking uploads based on uploader version or what generated the artifact #8285

Open dstufft opened 4 years ago

dstufft commented 4 years ago

It's not a lot of people, but there seems to be a fairly regular stream of people who try to upload something to PyPI that takes advantage of some new metadata feature, have that upload succeed, and then find that PyPI appears to be completely ignoring their use of this new feature.

Typically these folks tend to believe there is a bug in PyPI, and then file issues reporting said bug.

Ideally there would be some mechanism that would allow these users to determine the cause of this problem themselves, rather than requiring a PyPI administrator to debug each and every case.

The biggest hammer we have for this is to maintain a list of uploader versions and wheel/sdist generators that we would reject uploads from. Then, whenever we find that some particular version is becoming problematic, we can simply block any future uploads from it and present an informative error message to the user.
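A minimal sketch of what such a blocklist check could look like. Everything here is an assumption for illustration: the tool names (`example-uploader`, `example-builder`), the version thresholds, the rule structure, and the idea that the uploader is identified by a `name/version` User-Agent string while the generator is identified by a `name (version)` string. It is not how Warehouse implements anything.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, ...]


@dataclass(frozen=True)
class BlockRule:
    tool: str        # hypothetical tool name, lower-case
    below: Version   # reject versions strictly older than this
    message: str     # informative error shown to the uploader


# Hypothetical rules; real names and thresholds would be maintained by admins.
BLOCK_RULES = [
    BlockRule("example-uploader", (2, 0),
              "example-uploader older than 2.0 drops newer metadata fields; please upgrade."),
    BlockRule("example-builder", (0, 31),
              "example-builder older than 0.31 drops newer metadata fields; please rebuild."),
]


def parse_tool(value: str) -> Optional[Tuple[str, Version]]:
    """Parse 'name/1.2.3 ...' or 'name (1.2.3)' into (name, version tuple)."""
    match = re.match(r"\s*([A-Za-z0-9_.\-]+)\s*(?:/|\()\s*(\d+(?:\.\d+)*)", value)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group(2).split("."))
    return match.group(1).lower(), version


def _older_than(version: Version, bound: Version) -> bool:
    """Compare numeric versions after zero-padding to equal length."""
    width = max(len(version), len(bound))
    pad = lambda v: v + (0,) * (width - len(v))
    return pad(version) < pad(bound)


def rejection_reason(user_agent: Optional[str], generator: Optional[str]) -> Optional[str]:
    """Return an error message if either tool is blocked, else None."""
    for raw in (user_agent, generator):
        parsed = parse_tool(raw or "")
        if parsed is None:
            continue  # unknown or missing tool: no rule can match it
        name, version = parsed
        for rule in BLOCK_RULES:
            if rule.tool == name and _older_than(version, rule.below):
                return rule.message
    return None


print(rejection_reason("example-uploader/1.4.0", "example-builder (0.33.0)"))
```

The version comparison is purely numeric; a real implementation would presumably need a proper version parser to handle pre-releases and similar cases.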

Another option would be to create some sort of warning system: maybe allow PyPI administrators to specify a warning that should appear when a file was uploaded by a certain version of an uploader, or generated by a certain version of a tool. This warning could then be displayed on the project's management pages (or even on the public page, but only for logged-in users who are maintainers of that package, using our client-side includes). This would at least prod users to help themselves when they're investigating why something isn't working as they expect, rather than opening issues on the Warehouse tracker.

If we went down the path of a warning system, we'd presumably want to make that warning date-aware, so that we don't suddenly start showing a warning on every file ever uploaded to PyPI.
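To illustrate the date-aware idea, here is a rough sketch in which each warning rule carries an effective date, so files uploaded before that date never match. The `WarningRule` fields, the prefix-based version match, and the tool name are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class WarningRule:
    tool: str                 # hypothetical uploader or generator name
    version_prefix: str       # e.g. "1." matches every 1.x release
    effective_from: datetime  # only uploads at or after this time are flagged
    message: str


def warnings_for(rules: List[WarningRule], tool: str, version: str,
                 upload_time: datetime) -> List[str]:
    """Return the warning messages that apply to a single uploaded file."""
    return [
        rule.message
        for rule in rules
        if rule.tool == tool
        and version.startswith(rule.version_prefix)
        and upload_time >= rule.effective_from
    ]


# Hypothetical rule: flag 1.x uploads made on or after 1 August 2020.
RULES = [
    WarningRule(
        tool="example-uploader",
        version_prefix="1.",
        effective_from=datetime(2020, 8, 1, tzinfo=timezone.utc),
        message="This file was uploaded with a tool that may drop newer metadata.",
    )
]

old_upload = datetime(2019, 1, 1, tzinfo=timezone.utc)
new_upload = datetime(2020, 9, 1, tzinfo=timezone.utc)
print(warnings_for(RULES, "example-uploader", "1.4.0", old_upload))  # no warning
print(warnings_for(RULES, "example-uploader", "1.4.0", new_upload))  # one warning
```

Keeping the cutoff on the rule rather than on the file means a new rule can be added at any time without retroactively flagging older uploads.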

Unfortunately our ability to go granular is limited here, because the problem is typically that some piece of metadata isn't making it into the final metadata because some piece in the pipeline before us didn't understand it. Without a way to introspect that, we're left with pretty broad solutions that affect all uses of a particular version of the tooling, rather than just the cases where someone tried to use that new feature.

di commented 4 years ago

Another thing we could do is just block on old metadata versions. Then we wouldn't need to keep track of what uploader supports what metadata version, or handle the case where we don't actually know the uploader.

One of the problems with this, though, is that IIRC some tools have a tendency to pick the minimum metadata version necessary given the requested metadata, instead of the latest version. I'm not sure if this behavior still exists.
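As a rough illustration of blocking on old metadata versions, the check could be as small as the sketch below. The minimum version of 2.1 and the error wording are assumptions chosen for the example, not a proposed policy. Note that, per the caveat above, a tool that emits the lowest metadata version sufficient for the fields in use would be rejected by a check like this even if it fully understands newer versions.

```python
from typing import Optional

# Hypothetical threshold; the actual cutoff would be a policy decision.
MINIMUM_METADATA_VERSION = (2, 1)


def metadata_version_error(metadata_version: Optional[str]) -> Optional[str]:
    """Return an error message if the Metadata-Version is missing, invalid, or too old."""
    if metadata_version is None:
        return "Missing Metadata-Version; please upgrade your build and upload tools."
    try:
        parsed = tuple(int(part) for part in metadata_version.strip().split("."))
    except ValueError:
        return f"Invalid Metadata-Version {metadata_version!r}."
    if parsed < MINIMUM_METADATA_VERSION:
        minimum = ".".join(str(part) for part in MINIMUM_METADATA_VERSION)
        return (
            f"Metadata-Version {metadata_version} is older than the minimum "
            f"supported version {minimum}; please upgrade your build and upload tools."
        )
    return None


for candidate in ("2.1", "1.2", None):
    print(candidate, "->", metadata_version_error(candidate))
```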

For posterity, using the new metadata dataset, here are counts by metadata version over the last month:

SELECT
  metadata_version,
  COUNT(*) AS count
FROM
  `the-psf.pypi.distribution_metadata`
WHERE
  DATE(upload_time) >= DATE_ADD(CURRENT_DATE(), INTERVAL -30 DAY)
GROUP BY
  metadata_version
ORDER BY
  count DESC;

metadata_version  count
2.1               116903
1.0               10435
1.1               7138
null              5848
1.2               5094
2.0               1041