pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Run auditwheel on new manylinux uploads, reject if it fails #5420

Open dstufft opened 5 years ago

dstufft commented 5 years ago

What's the problem this feature will solve?

Projects are purposely uploading invalid manylinux1 wheels to PyPI, which cause crashes and other breakage when end users unwittingly install them.

Describe the solution you'd like

PyPI should be as strict as possible about the wheels it allows to be uploaded, and thus should do something like run auditwheel when the upload is a manylinux wheel, and reject it if it's not compliant.
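One possible shape for such a check, sketched below. This is illustrative only: the `wheel_is_compliant` helper is made up, and treating `auditwheel show`'s exit code plus its "consistent with the following platform tag" verdict line as the pass/fail signal is an assumption, not PyPI's actual policy. The audit command is parameterized so the checker could be pointed at a sandboxed wrapper.

```python
import subprocess


def wheel_is_compliant(wheel_path, audit_cmd=("auditwheel", "show"), timeout=60):
    """Run an audit command on an uploaded wheel and decide pass/fail.

    Assumption (hypothetical policy): a nonzero exit, a hung check, or
    output lacking auditwheel's "consistent with the following platform
    tag" verdict means the wheel should be rejected.  A real policy
    would need more care than this.
    """
    try:
        proc = subprocess.run(
            [*audit_cmd, wheel_path],
            capture_output=True, text=True, timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # A missing tool or a stuck check counts as a failed audit.
        return False
    return (proc.returncode == 0
            and "consistent with the following platform tag" in proc.stdout)
```

The `audit_cmd` parameter also makes the helper easy to exercise with a stub command in tests, without auditwheel installed.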

Additional context

See https://github.com/tensorflow/tensorflow/issues/8802 as an example. We will likely need to figure out if auditwheel is safe to run or if we need to do something like farm this out to a worker process that does it in isolation (and if so, we might need to adjust the upload API to allow it to be async).

njsmith commented 5 years ago

CC @ehashman

ehashman commented 5 years ago

also cc @di given the tensorflow issue.

njsmith commented 5 years ago

Relevant information in case anyone's worried this is too harsh or something: In response to a twitter thread calling out PyTorch for breaking the manylinux spec, Soumith Chintala (PyTorch's creator) just tweeted:

I'd be 100% supportive of you rejecting non-compliant wheels as well. It applies a uniform standard to everyone, and we'd move the needle much faster with vendors than we are right now.

The motivation here isn't to punish particular companies for being naughty, but to (a) improve the experience for PyPI's users and (b) give the developers at those companies the leverage they need to get things fixed.

di commented 5 years ago

Yeah, probably overdue for an update, but I'm currently working with the TensorFlow team and other interested parties to try and determine what can be done here to improve the situation.

I think it's probably worth saying here (for anyone following along) that while I do think PyPI should eventually start auditing uploaded wheels, this is not going to happen overnight or without fair warning.

I'm interested in ensuring that PyPI is a source of artifacts that are reliable for users, but I'm also interested in putting in the work necessary to ensure that we have specifications to support the increasingly common use cases that currently lead to this issue.

I'll post a discussion thread soon to add more detail and discuss in-depth -- let's keep this thread on-topic.

njsmith commented 5 years ago

We will likely need to figure out if auditwheel is safe to run or if we need to do something like farm this out to a worker process that does it in isolation (and if so, we might need to adjust the upload API to allow it to be async).

I believe the situation is: when auditwheel checks a wheel, that involves unpacking the zip, reading the wheel metadata, and, for any binaries inside, using pyelftools to read metadata out of the binaries. It doesn't execute any code from the binaries; it just reads metadata from the ELF headers. So in theory this isn't any more dangerous than e.g. trying to decompress a potentially-malicious zip file. And pyelftools is pure Python (or at least its README says it is :-)), so at least in theory we should be safe from really egregious issues like buffer overflows.

That said, I doubt anyone ever thought much about using it on untrusted binaries, so I'd still be wary of less-dangerous issues like carefully crafted binaries causing it to allocate a bunch of memory or spin on the CPU. It might make sense to run it in a subprocess with some rlimits set. And it'd be good for someone to double-check everything I just wrote :-). But at least in principle, there's nothing especially dangerous about auditing manylinux wheels.
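The subprocess-with-rlimits idea could be sketched roughly as follows. Everything here is illustrative: the helper name, the specific limits, and the use of `preexec_fn` are assumptions, and the `resource` module is Unix-only.

```python
import resource
import subprocess

# Illustrative caps (hypothetical values): 256 MiB of address space,
# 10 seconds of CPU time for the child doing the untrusted parse.
MEM_LIMIT = 256 * 1024 * 1024
CPU_LIMIT = 10


def _apply_limits():
    # Runs in the child just before exec, so a crafted binary can't make
    # the parser allocate unbounded memory or spin on the CPU forever.
    resource.setrlimit(resource.RLIMIT_AS, (MEM_LIMIT, MEM_LIMIT))
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_LIMIT, CPU_LIMIT))


def run_check_isolated(cmd, timeout=30):
    """Run an audit command in a resource-limited child; True on success.

    Note: preexec_fn is Unix-only and documented as unsafe in
    multithreaded parents; a real deployment would more likely hand the
    work to a dedicated, already-sandboxed worker process.
    """
    try:
        proc = subprocess.run(cmd, preexec_fn=_apply_limits,
                              capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0
```

The wall-clock `timeout` complements `RLIMIT_CPU`: a child that sleeps rather than spins never consumes CPU time, so the CPU limit alone would not stop it.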

auditwheel repair is a different matter – it uses patchelf, which is a complicated C++ program that I'm certain has never had any security audits. But PyPI won't be trying to repair wheels, just check them.

dstufft commented 5 years ago

I'm personally willing to take a fairly hard line stance on this. The presumption when manylinux1 was added was that projects would be good citizens and upload properly produced wheels or would not upload manylinux1 wheels at all. Of course mistakes happen and people can ship broken wheels, but that should then be treated as a bug and either fixed or faulty wheels should stop being produced until they can be produced correctly.

Obviously this won't happen overnight; for one, the feature has to be implemented at all, and I wouldn't see us treating it any differently than we did the HIBP rollout: gather metrics to gauge impact, set a date, possibly send a warning email, and start rejecting on that date. We can use metrics and outside information to inform that date, but to be clear, I'm perfectly OK with leaving projects who don't fix themselves out in the cold.

IOW, yes we should do this, no we shouldn't make it a surprise, yes we should use data to inform how aggressive we are, no we shouldn't let any one project dictate our strategy here.

di commented 5 years ago

FYI, I've started a discussion on the next manylinux spec here: https://discuss.python.org/t/the-next-manylinux-specification/

di commented 5 years ago

At the PyCon 2019 packaging summit, we determined that this is blocked on #726.

brainwane commented 5 years ago

Once we have the ability to do this, should we also consider running auditwheel retrospectively on existing release artifacts? Thinking about https://github.com/pypa/manylinux/issues/305 (see https://github.com/pypa/manylinux/issues/305#issuecomment-489332594 ).

takluyver commented 5 years ago

I don't think it makes sense to retrospectively invalidate wheels that are incompatible with new distributions, as in that case. We presumably want to maintain the rule that you can't replace an already uploaded package, so the only option would be to remove them or somehow mark them as bad so installers don't consider them. But those wheels are still useful for users on other popular distros.

It may be worth identifying affected wheels and alerting the maintainers of those packages to think about uploading a new version. But that's probably quite a separate feature from gating uploads.

jamadden commented 5 years ago

We've also seen wheels being uploaded with generic platform tags like py3-none-any but containing and requiring (non-manylinuxX compatible) shared libraries (e.g., #6400, #6403). Would it be worth extending the check to such wheels? Or simply denying uploading generic wheels that in fact contain platform-specific binaries?
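A crude version of that check could simply look for ELF headers inside wheels tagged `-none-any`. This is a sketch, not a proposal for Warehouse's implementation: the function name is made up, and magic-byte scanning only catches Linux binaries (Mach-O and PE files would need their own magic numbers).

```python
import zipfile

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF binary


def elf_members(wheel_path):
    """Return names of files inside the wheel (a zip archive) that start
    with the ELF magic bytes, i.e. Linux shared objects or executables
    that arguably have no business in a py3-none-any wheel."""
    hits = []
    with zipfile.ZipFile(wheel_path) as zf:
        for name in zf.namelist():
            with zf.open(name) as member:
                if member.read(4) == ELF_MAGIC:
                    hits.append(name)
    return hits
```

An upload pipeline could then reject, or flag for manual review, any `-none-any` wheel for which this returns a non-empty list; given the counterexamples raised below, flagging is probably the safer default.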

takluyver commented 5 years ago

It's possible to imagine packages containing platform-specific binaries which are nonetheless cross-platform - e.g. the binaries are only used on the relevant platform, and it works without them on other platforms, or it's a library for introspecting such binary files, and includes them as test cases.

jamadden commented 5 years ago

That's true. So I suppose the interesting case is packages (distributions) that ship binary Python modules with the intent of importing them, or binary shared libraries with the intent to use ctypes or similar on them. Without a pure-Python fallback, the package doesn't work, or only works partly.

I only bring it up because it's a potential annoyance for end users, and a small burden on PyPI infrastructure and admins (every case of this I've run into came to my attention because the distribution requested an upload limit increase, meaning extra storage, bandwidth, and admin time).

I don't know if it's worth trying to do anything about that or not (that's why I asked 😄). The false-positive rate might be too high?

Most of the time so far the mislabeling has been accidental, and the packagers have worked to update their build systems to produce manylinux-compliant wheels, so I'm guessing they would have welcomed an error telling them that their distribution didn't work the way they wanted it to. I haven't yet heard from a packager who was deliberately shipping one-platform modules to all platforms to sidestep the manylinux requirements and "sneak" binaries onto PyPI.

takluyver commented 5 years ago

I'd be OK with putting in restrictions like that if there was a way for projects to get them lifted (like the size restriction). I don't think either of the cases I mentioned are very common.

But do we have evidence of how common it is for people to upload platform-specific wheels with -none-any tags? The two issues you linked before are from the same author, so it could be a one-off.

jamadden commented 5 years ago

I've only seen a few issues, those that resulted in limit requests. I don't have evidence, just anecdotes.

One other notable one was https://github.com/pypa/warehouse/issues/6197, which involved distributing an actual executable, the running of which was the main point of the Python code. It linked against a very specific set of system libraries and was uploaded as py3-none-any. The executable was also included in the source .tar.gz. We worked through those issues and ended up with a buildable source archive (theoretically cross-platform), though we hadn't yet gotten it to produce a manylinux wheel; it was at least producing linux-x86_64 wheels.

I'm not pointing these concerns out to the packagers to be pedantic or malicious, but to try to help them do what it looks like they're trying to do, and to increase the overall quality of the ecosystem. That would be the reason behind the warning or error too: a clear message about the problem, with some info on where to go for help, instead of the packager being contacted by end users and having no idea what's wrong. I don't know whether automating that is truly helpful, or indeed whether I should stop checking for problems like that myself.

brainwane commented 4 years ago

(I think this ties in to https://github.com/pypa/packaging-problems/issues/25 .)

@di said today in IRC that, in his opinion, this does not need to block on #726.

What actual hardware or other services would we need in order to implement this just on new uploads? Just the auditwheel check?

dstufft commented 4 years ago

I think this is just a SMOP (a simple matter of programming); AFAIK auditwheel just does static analysis, so we just need to actually wire things up.

brainwane commented 4 years ago

Implementing this would -- per @pradyunsg's opinion in IRC just now -- help mitigate the compatibility issues that he sees users face. And in my assessment, it would help reduce how often users get into bad situations, and thus reduce the general support load on the people who support Python packaging users. Since rolling out the new pip resolver will probably lead to a burst of support requests, I'd love more progress on this feature.

Per discussion in IRC today, we may need to implement #7730 first. More discussion is welcome.