typeshed-internal / stub_uploader

Scripts and actions to auto-upload typeshed stubs to PyPI
Apache License 2.0
21 stars 16 forks source link

Allowing more external dependencies for stubs #90

Open hauntsaninja opened 1 year ago

hauntsaninja commented 1 year ago

Currently, we're quite strict about what external dependencies stubs can use. We have a small allowlist here: https://github.com/typeshed-internal/stub_uploader/blob/b81ba3c5214667f1dfb1decb0c97b83214553833/stub_uploader/metadata.py#L166

Briefly, the reason for this is a) Python packages can execute arbitrary code at install time, b) type checkers sometimes install stub packages automatically, c) stub packages are quite popular, d) users likely expect stub packages to be inert

So what would it take to remove the allowlist? We already have an important additional check in stub uploader: we ensure that external dependencies of a stub package must be a dependency of the upstream package. However, there's still a hole, in that stub dependencies of stub packages are not currently checked.

To spell things out, this is the scenario that we're concerned about:

Once we plug this hole, we could maybe get rid of the allowlist or have much more relaxed criteria.

This has been discussed in a couple places, mainly https://github.com/typeshed-internal/stub_uploader/pull/61#discussion_r979316520. I'm writing this up here as a way to easily communicate the current status quo to typeshed contributors.

Plugging the hole is a pretty easy change to make to stub_uploader, see diff here:

``` diff --git a/stub_uploader/metadata.py b/stub_uploader/metadata.py index 85d0643..56c5448 100644 --- a/stub_uploader/metadata.py +++ b/stub_uploader/metadata.py @@ -61,7 +61,7 @@ class Metadata: def requires_typeshed(self) -> list[Requirement]: reqs = self._unvalidated_requires_typeshed for req in reqs: - verify_typeshed_req(req) + verify_typeshed_req(req, self.upstream_distribution) return reqs @property @@ -145,7 +145,7 @@ def strip_types_prefix(dependency: str) -> str: return dependency.removeprefix(TYPES_PREFIX) -def verify_typeshed_req(req: Requirement) -> None: +def verify_typeshed_req(req: Requirement, upstream_distribution: Optional[str]) -> None: if not req.name.startswith(TYPES_PREFIX): raise InvalidRequires(f"Expected dependency {req} to start with {TYPES_PREFIX}") @@ -154,9 +154,26 @@ def verify_typeshed_req(req: Requirement) -> None: f"Expected dependency {req} to be uploaded from stub_uploader" ) - # TODO: make sure that if a typeshed distribution depends on other typeshed stubs, - # the upstream depends on the upstreams corresponding to those stubs. - # See https://github.com/typeshed-internal/stub_uploader/pull/61#discussion_r979327370 + if upstream_distribution is None: + raise InvalidRequires( + f"There is no upstream distribution on PyPI, so cannot verify {req}" + ) + + resp = requests.get(f"https://pypi.org/pypi/{upstream_distribution}/json") + if resp.status_code != 200: + raise InvalidRequires( + f"Expected dependency {req} to be accessible on PyPI, but got {resp.status_code}" + ) + + data = resp.json() + + if strip_types_prefix(req.name) not in [ + Requirement(r).name for r in (data["info"].get("requires_dist") or []) + ]: + raise InvalidRequires( + f"Expected dependency {strip_types_prefix(req.name)} to be listed in " + f"{upstream_distribution}'s requires_dist" + ) ```

However, typeshed currently has twelve stubs that fail this test:

``` test_recursive_verify[Pygments] - stub_uploader.metadata.InvalidRequires: Expected dependency docutils to be listed in Pygments's requires_dist test_recursive_verify[redis] - stub_uploader.metadata.InvalidRequires: Expected dependency pyOpenSSL to be listed in redis's requires_dist test_recursive_verify[PyScreeze] - stub_uploader.metadata.InvalidRequires: Expected dependency Pillow to be listed in PyScreeze's requires_dist test_recursive_verify[pyinstaller] - stub_uploader.metadata.InvalidRequires: Expected dependency docutils to be listed in setuptools's requires_dist test_recursive_verify[pysftp] - stub_uploader.metadata.InvalidRequires: Expected dependency paramiko to be listed in pysftp's requires_dist test_recursive_verify[slumber] - stub_uploader.metadata.InvalidRequires: Expected dependency requests to be listed in slumber's requires_dist test_recursive_verify[python-xlib] - stub_uploader.metadata.InvalidRequires: Expected dependency Pillow to be listed in python-xlib's requires_dist test_recursive_verify[JACK-Client] - stub_uploader.metadata.InvalidRequires: Expected dependency cffi to be listed in JACK-Client's requires_dist test_recursive_verify[setuptools] - stub_uploader.metadata.InvalidRequires: Expected dependency docutils to be listed in setuptools's requires_dist test_recursive_verify[PyAutoGUI] - stub_uploader.metadata.InvalidRequires: Expected dependency PyScreeze to be listed in PyAutoGUI's requires_dist test_recursive_verify[tzlocal] - stub_uploader.metadata.InvalidRequires: Expected dependency pytz to be listed in tzlocal's requires_dist test_recursive_verify[D3DShot] - stub_uploader.metadata.InvalidRequires: Expected dependency Pillow to be listed in D3DShot's requires_dist ```

So it may be that there's some reasonable improvement that could be made to the check that works for these twelve packages. Or maybe we can just require these specific stubs to have individualised exceptions committed to stub_uploader. Or maybe it's not super viable to plug this hole and we need to keep the allowlist forever.

(Also note that the implementation of the check in that diff^ isn't perfect, since it only works for projects that are on PyPI and have wheels, and only checks the dependencies of the latest version)

Avasam commented 1 year ago

I feel like this is related. In https://github.com/python/typeshed/pull/9511 I tried to use lxml-stubs, but can't because it's not explicitly mentioned as a dependency of openpyxl. Only et-xmlfile is (which itself is based on the is based upon the xmlfile module from lxml, but that's besides the point).

Seeing https://github.com/python/mypy/pull/14737 makes me wanna request allowing those for stub_uploader as well, but that's only growing the list. (unless you wanna directly depend on, and trust, mypy's non_bundled_packages)


On the note of more relaxed criteria and catching more non-wheel declared dependencies without running arbitrary code, maybe https://github.com/typeshed-internal/stub_uploader/pull/88 would help ? It still won't catch all cases (including mine mentioned above), but I already know of at least two cases in typeshed were it would be beneficial.

srittau commented 1 month ago

Currently, EXTERNAL_REQ_ALLOWLIST is a free-for-all. I.e. once a package is on the allowlist, every other package can depend on it. It would be safer to specify exactly which stub packages can depend on which other packages. This would also allow us to be a bit looser in some cases. For example, the implications of types-obscure-pkg depending on other-obscure-pkg are lower than types-requests being able to depend on obscure-pkg.

We should also consider moved the allowlist out of the metadata.py file into a data file to separate code from data.

srittau commented 1 month ago

For example, the implications of types-obsure-pkg depending on other-obscure-pkg are lower than types-requests being able to depend on obscure-pkg.

Although there is the danger of a hacked maintainer account adding a dependency on types-obscure-pkg to types-requests. This case needs to be considered.

srittau commented 1 month ago

Okay, here is a suggestion. We allow:

Edit 2: Also, on upload of a package with outside dependencies, we reverse check whether there are other typeshed packages that depend on this that don't have the required outside dependencies in their allowlist.

AlexWaygood commented 1 month ago

A dependency on other typeshed packages if those only have (recursively) no outside dependencies.

To illustrate the implications of this proposal: currently 6 typeshed packages depend on types-requests:

All of these currently pull in urllib3 due to types-requests depending on urllib3, but only types-docker explicitly lists urllib3 as an external dependency in its METADATA.toml file.

To clarify the proposal here: would we still allow these typeshed packages to depend on types-requests if (and only if) they explicitly also listed urllib3 in their requirements, and we had urllib3 in their per-package allowlists at stub-uploader?

srittau commented 1 month ago

To clarify the proposal here: would we still allow these typeshed packages to depend on types-requests if (and only if) they explicitly also listed urllib3 in their requirements, and we had urllib3 in their per-package allowlists at stub-uploader?

At least the latter, yes. They don't necessarily need to list it in their METADATA requirements, as there's no necessity to make it into their dependency field in the package.

AlexWaygood commented 1 month ago

I can see the attraction of making allowlists per-package. The way that urllib3 has become an indirect dependency of so many stubs packages shows how easy it is for packages to become significant attack vectors (without us realising) once they've been added to the allowlist. For the specific case of urllib3, the ecosystem is probably going to have much bigger problems than the security of typeshed packages if that package is compromised in some way -- but it's a useful example. And as you say, if allowlists are per-package, then adding an entry to the allowlist doesn't have nearly the same implications in terms of security.

It will probably mean more busywork for us to maintain per-package allowlists in a separate repo, but I think it's probably worth it.

srittau commented 1 month ago

We could also add back a general allowlist for very popular packages in addition to the per-package allowlist if this proves to be too much busywork at a later date. If we put the allowlist into an external file, we should keep that possibility in mind when deciding on a format.