pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Handle security implications of PEP 561 type hinting packages #4164

Open ncoghlan opened 6 years ago

ncoghlan commented 6 years ago

The mypy devs are working on a feature that allows automatic discovery and downloading of project interface stubs published under a suffixed form of an existing project's name (<name>-stubs): https://www.python.org/dev/peps/pep-0561/#stub-only-packages

(see https://lwn.net/SubscriberLink/757218/a7b754a41ad74a49/ for more background)

A version of this is already implemented in mypy, which means that if a PyPI project doesn't have a stubs package, and doesn't advertise itself as providing type hints, mypy may go and download the related stubs package.

This suggests that there may need to be some special rules put in place for stub interface packages on the PyPI side of things, such as not allowing stub packages to be published without the approval of the base package maintainers (or at least without alerting them to the fact that the stub package exists).

ethanhs commented 6 years ago

I should clarify that none of this happens automatically. It is my view that it should be up to the user to manage these packages. Mypy simply searches for existing installed packages, and in the near future it will suggest that the user look for <name>-stubs.

However, I do not think this reduces the importance of reserving the <name>-stubs names. One suggestion by @ilevkivskyi is to simply register the stub package name for those who register or have registered the package name. This way the owner of e.g. numpy can own numpy-stubs by default, and if they so choose, give it to someone else.
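The suggestion to give the owner of a project its <name>-stubs name by default can be sketched as a registration-time rule. This is a hypothetical sketch, not Warehouse code. `owners` maps PEP 503-normalized project names to sets of usernames, and every user and project name below is invented for illustration:

```python
import re
from typing import Dict, Set

def normalize(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_", "." to "-" and lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()

def can_register(name: str, user: str, owners: Dict[str, Set[str]]) -> bool:
    """Hypothetical rule: '<base>-stubs' is reserved for the owners of '<base>'."""
    canonical = normalize(name)
    if canonical in owners:
        # Already registered: only its own owners may upload to it.
        return user in owners[canonical]
    if canonical.endswith("-stubs"):
        base = canonical[: -len("-stubs")]
        if base in owners:
            # The base project exists, so its owners get the stubs name by default.
            return user in owners[base]
    # Any other unclaimed name is first come, first served.
    return True
```

Once `numpy-stubs` is actually registered, its own owner list takes over, which is how the numpy owners could hand the name to a separate stubs team.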

dstufft commented 6 years ago

I'm not a big fan of the <name>-stubs convention, for two reasons:

  1. Generally plugins of this nature have the "type" first, and the unique name second. So something like stubs-<name>, or even better, stubs.<name> I think fits in more sanely.
  2. I think stubs is way too generic a name to reserve automatically, because "stub" as a concept isn't even remotely unique to typing or mypy, so a lot of collisions already exist on PyPI.

With regards to who would have permission, why would, say, numpy use numpy-stubs instead of just doing them inline?

ethanhs commented 6 years ago

For 1, I think it is more apt to say the distribution they are an extension for goes first. In many ways stub packages are sub distributions, so having the name as a suffix makes a lot more sense to me. For example, we have pytest-{xdist, flake8, mypy, etc}. Furthermore the name is closer to the way one talks about them. "Where can I find numpy stubs?" "oh, numpy-stubs". The closeness to the way people talk about it is important I think.

As for 2, I actually went through all the package names on PyPI and found none ending with "[_|-]stubs" at the time. I'd be interested in concrete examples of conflicting uses in package names.
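The survey described here amounts to a suffix match over the full list of project names. A minimal sketch, assuming the names are already in memory (for example, collected from the PEP 503 simple index; fetching is left out). The sample names in any usage are made up:

```python
import re
from typing import Iterable, List

def find_stub_like_names(names: Iterable[str], include_singular: bool = False) -> List[str]:
    """Return names ending in "-stubs"/"_stubs", optionally also "-stub"/"_stub"."""
    suffix = r"[-_]stubs?$" if include_singular else r"[-_]stubs$"
    pattern = re.compile(suffix, re.IGNORECASE)
    return sorted(name for name in names if pattern.search(name))
```

`[-_]` is the character-class spelling of "[_|-]"; inside a class, `|` would only match a literal pipe. The `include_singular` flag covers the singular "-stub" variant, which is easy to confuse with the plural.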

For your third point, it takes time to develop stubs for large packages, so their authors will likely want a faster release cadence to share the latest additions to the stubs. Furthermore, some distributions will likely want to keep stubs separate because they are not yet keen on typing.

gvanrossum commented 6 years ago

why would say numpy use numpy-stubs instead of just doing them inline

The reason for not just using inline annotations is that (a) for C code you need stubs anyways, and (b) even for Python code, stubs are (1) more efficient to parse for mypy, and (2) don't require all the code to type check perfectly. (These are all the same reasons why we have typeshed.)
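To make point (a) concrete, a stub is just a .pyi file that holds signatures with `...` bodies. Here is a minimal sketch for a hypothetical C extension module `fastmath`. The module, its functions, and its class are invented for illustration:

```python
# fastmath.pyi: hypothetical stub for a C extension module named "fastmath".
# Only signatures live here; every body is "...", so a type checker can read
# the interface without importing or type checking any implementation.
from typing import overload

def add(a: int, b: int) -> int: ...

@overload
def scale(x: int, factor: int) -> int: ...
@overload
def scale(x: float, factor: float) -> float: ...

class Vector:
    x: float
    y: float
    def __init__(self, x: float, y: float) -> None: ...
    def norm(self) -> float: ...
```

Because .pyi syntax is ordinary Python, the file parses like any module. A type checker, however, only reads the declarations, which is why stubs are cheaper to process than full source.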

The reason for separate distribution packages is that a different team might be working on the numpy stubs and (at least while they're striving for completeness) they may want to release on a separate schedule than the main numpy releases, and use a separate sequence of version numbers. E.g. for numpy 1.14 there may be numpy-stubs version 1, 2, 3, and there may have to be separate versions of numpy-stubs for numpy 1.13. It's up to the team creating numpy-stubs how to combine these version numbers, e.g. v1.14-1, v1.15-2, etc.
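One way a stubs team might pair its releases with numpy releases is sketched below. The `v<base>-<revision>` tag format follows the "v1.14-1, v1.15-2" example above, but the helper names and the rule for picking the newest revision are assumptions made for illustration:

```python
from typing import List, Optional, Tuple

def parse_stub_tag(tag: str) -> Tuple[Tuple[int, ...], int]:
    """Split a 'v<base>-<revision>' tag, e.g. 'v1.14-2' -> ((1, 14), 2).

    Assumes every tag carries a numeric revision after the dash.
    """
    base, _, revision = tag.lstrip("v").partition("-")
    return tuple(int(part) for part in base.split(".")), int(revision)

def latest_stubs_for(base_version: str, stub_tags: List[str]) -> Optional[str]:
    """Return the newest stubs tag built for base_version, or None if there is none."""
    target = tuple(int(part) for part in base_version.split("."))
    matching = [tag for tag in stub_tags if parse_stub_tag(tag)[0] == target]
    # Among stubs for the same base release, the highest revision is the newest.
    return max(matching, key=lambda tag: parse_stub_tag(tag)[1], default=None)
```

As a design note, if such tags were used directly as PyPI versions, PEP 440 reads `1.14-1` as an implicit post-release and normalizes it to `1.14.post1`. The revision ordering would therefore carry over to the index.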

gvanrossum commented 6 years ago

(Looks like Ethan and I responded in parallel -- we didn't read each other's responses.)

dstufft commented 6 years ago

Sorry, I've been a bit busy and am a little behind in catching up on these issues.

As far as a list of projects goes, searching for "stubs" I've found:

I've included cases where -stub is in the name as well, because I don't think human beings are generally great at remembering the difference between plural vs not.

I'm not super comfortable with just universally assigning a generic term like "stubs" to MyPy or Typing on PyPI and I would greatly prefer it if it used something more specific to that problem domain. More to the point though, while maybe that specific pattern isn't in wide use, I think the larger concern about saying that some generic term "belongs" to one problem domain is going to lead to confusion for people, and folks coming to the PyPI developers about it.

In a related set of features, I've been working on the ability to reserve a namespace in PyPI, similar to NuGet's ID Prefix Reservation and the plan there has always been to limit that particular feature to reserving particular prefixes, not suffixes, so I'm also concerned how a feature like this, that wants to reserve suffixes, would interplay with that.
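The prefix-versus-suffix tension can be made concrete with a small sketch of prefix reservation. The reservation table, owners, and longest-match rule below are hypothetical and do not describe the actual Warehouse design:

```python
import re
from typing import Dict, Optional

def normalize(name: str) -> str:
    # PEP 503 normalization: collapse runs of "-", "_", "." to "-" and lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()

def reservation_owner(name: str, reservations: Dict[str, str]) -> Optional[str]:
    """Return the owner of the longest reserved prefix that name starts with."""
    canonical = normalize(name)
    matches = [prefix for prefix in reservations if canonical.startswith(prefix)]
    if not matches:
        return None
    return reservations[max(matches, key=len)]
```

With reservations such as `{"numpy-": "numpy-team", "stubs-": "typing-team"}`, `numpy-stubs` already falls under numpy's own prefix, and a `stubs-<name>` convention needs only one reservation. A "*-stubs" suffix rule cannot be expressed in this scheme at all, and would overlap with every project's prefix.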

As an aside:

A version of this is already implemented in mypy, which means that if a PyPI project doesn't have a stubs package, and doesn't advertise itself as providing type hints, mypy may go and download the related stubs package.

This sounds like a really bad idea unless there's a hard limit in PyPI that the owner of said stubs package is also the owner of the base package (and we'd probably want to amend PEP 503 or something to suggest that all repositories do that). Otherwise this feature is a pretty big security issue: it gives anyone who registers the stub package the ability to execute arbitrary Python on people's computers for packages that aren't theirs. If automatic installation is the overall intent here, then this feature needs to be designed really carefully.

gvanrossum commented 6 years ago

Thanks for the research! It looks like uwsgi-stub has almost the right idea -- all it needs is PEP 484 style type annotations (IDEs also read those these days -- at least PyCharm does). A few others seem to refer to RPC stubs, which is a reasonable use.

mypy may go and download the related stubs package

I don't know where Nick got that idea, but that's not how PEP 561 works. I recommend that you read it before getting too upset about it. Ethan also already explained this above.

There's still a risk of domain squatting of course, and users often download things without verifying them, but there's no code in mypy that ever goes and downloads anything, and I don't see any reason why we would ever do so.

Another twist is that mypy doesn't care about the distribution name (the name listed on PyPI). It cares about the name of the installed package (the directory that goes into e.g. site-packages), and that is what should be named foobar-stubs. (And it should only contain .pyi files.)
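Because the lookup is keyed on installed directory names rather than PyPI distribution names, it can be sketched using only the filesystem. The function below is a simplified, hypothetical approximation of the PEP 561 order for installed packages: a `<module>-stubs` directory first, then an inline package marked with `py.typed`. It is not mypy's actual code, and it ignores MYPYPATH, partial stub packages, and typeshed:

```python
import os
from typing import Iterable, Optional

def find_stub_or_typed_package(module: str, search_dirs: Iterable[str]) -> Optional[str]:
    """Return the directory a checker would read for module, or None."""
    dirs = list(search_dirs)
    # Stub-only packages take precedence: look for '<module>-stubs' everywhere first.
    for directory in dirs:
        candidate = os.path.join(directory, f"{module}-stubs")
        if os.path.isdir(candidate):
            return candidate
    # Otherwise fall back to an inline-typed package carrying a py.typed marker.
    for directory in dirs:
        candidate = os.path.join(directory, module)
        if os.path.isfile(os.path.join(candidate, "py.typed")):
            return candidate
    return None
```

In practice `search_dirs` might be `site.getsitepackages()`. Note that the distribution that installed `foobar-stubs` could have any name on PyPI.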

FWIW if a better naming convention is standardized we'd be happy to amend PEP 561 by creating a new PEP.

PS. For @ethanhs only: perhaps it would be nice to add another example to the PEP of a complete package containing stubs using the -stubs naming convention?

ethanhs commented 6 years ago

Another twist is that mypy doesn't care about the distribution name

This is very true. That being said, people will likely search for e.g. django-stubs for a distribution that provides a django-stubs package (many people don't know there is a difference between distribution and package).

Otherwise, I agree with everything Guido says above.

dstufft commented 6 years ago

I don't know where Nick got that idea, but that's not how PEP 561 works. I recommend that you read it before getting too upset about it. Ethan also already explained this above.

There's still a risk of domain squatting of course, and users often download things without verifying them, but there's no code in mypy that ever goes and downloads anything, and I don't see any reason why we would ever do so.

Ok, sorry then. I haven't had time to read the PEP, if there's no automatic discovery of them, then there's no real security consideration to be had here and it's ultimately a usability/discovery problem and not a security one.

That almost makes me feel like the naming scheme at the distribution level might not be the right way to approach it, but rather some sort of metadata, because you may have multiple, competing type hints for a single package (and that's OK!).

So there are two sorts of problems that need solving here:

  1. With a convention that allows typing stubs to be placed into a third-party package, projects may want to reserve a semi-standard {something}-stubs name on PyPI for their own use, or to control who is able to distribute "official"-ish stubs for their project.
  2. A user of a particular library wants to discover what, if any, stubs are available for that library.
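For (2), the idea of metadata over naming can be illustrated with a sketch in which each distribution declares which projects it provides stubs for. The `stubs_for` field is invented for illustration and is not part of the core metadata specification. The sketch allows several competing stub distributions for the same project:

```python
import re
from dataclasses import dataclass, field
from typing import List

def normalize(name: str) -> str:
    # PEP 503 normalization so "NumPy" and "numpy" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()

@dataclass
class Distribution:
    name: str
    # Hypothetical metadata field: projects this distribution ships stubs for.
    stubs_for: List[str] = field(default_factory=list)

def stubs_available_for(project: str, index: List[Distribution]) -> List[str]:
    """Return every distribution in index that declares stubs for project."""
    target = normalize(project)
    return sorted(
        dist.name
        for dist in index
        if target in {normalize(p) for p in dist.stubs_for}
    )
```

Discovery then no longer depends on any naming convention: `numpy-stubs` and a differently named competitor would both be found for `numpy`.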

For (1), I think the current answer of "The project can pre-emptively register a {project}-stubs project" is an OK way of handling that, and future improvements like the Namespace reservation feature could mean that they would be able to request something like {project}-* without having to deal with registering a fake package. Unless there's a strong desire that the {project}-stubs package MUST be owned by {project} I'm inclined to say the status quo here is sufficient.

For (2), I kind of feel like some metadata is the best way to handle it. I see two real possible options, one that is easier but maybe less optimal, and one that is harder but possibly gives better UX in the long run.

Thoughts?

ethanhs commented 6 years ago

For (1) I agree, reserving stubs should end up being compatible with project namespacing.

For (2) I do think changing the packaging tools makes a lot of sense. I expect people won't want to type out every *.pyi file they need to include in a package, so beyond including metadata, there are definitely some ergonomic improvements that can be made to setuptools for packaging PEP 561 distributions.
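Until the tools handle this directly, the tedium can be scripted. Below is a minimal sketch of a hypothetical helper that lists every .pyi file under a stub package directory, relative to that directory:

```python
from pathlib import Path
from typing import List

def collect_pyi_files(package_dir: str) -> List[str]:
    """List every .pyi file under package_dir, relative to it, with '/' separators."""
    root = Path(package_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.pyi"))
```

One could then pass the result to setuptools as `package_data={"foobar-stubs": collect_pyi_files("foobar-stubs")}`, since `package_data` paths are relative to the package directory. The `foobar-stubs` name is illustrative.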

di commented 3 years ago

Update: the typeshed maintainers seem to be close to releasing their stub projects for third party packages: https://github.com/python/typeshed/issues/2491#issuecomment-765534751

It seems like from that thread they've landed on a types-<project_name> naming pattern, so this will work with our namespace plans.