Open kkirsche opened 1 year ago
Just to clarify, the desired end result is that when I make a typeshed PR that touches the requests stubs, the mypy-primer output will say something like "15 packages checked by mypy-primer use requests". Is that right? If so, that seems like a useful enhancement.
The standard API for retrieving the dependencies of a package is importlib.metadata: https://docs.python.org/3.10/library/importlib.metadata.html#distribution-requirements
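As a minimal sketch of what that API offers: importlib.metadata.requires() returns the requirement strings declared by an installed distribution. The helper names below (requirement_name, declared_dependencies) are made up for illustration and are not part of mypy_primer, and the approach only works for distributions that are actually installed in the current environment.

```python
from __future__ import annotations

import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Return the normalized distribution name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if match is None:
        raise ValueError(f"not a requirement string: {requirement!r}")
    # PEP 503 style normalization: lowercase, runs of "-", "_", "." become "-".
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def declared_dependencies(distribution: str) -> set[str]:
    """Names of the requirements declared by an installed distribution."""
    try:
        requirements = metadata.requires(distribution)
    except metadata.PackageNotFoundError:
        # Hypothetical policy for this sketch: treat "not installed" as "no data".
        return set()
    # requires() returns None when the distribution declares no requirements.
    return {requirement_name(r) for r in requirements or []}
```

With something like this, checking whether a project depends on requests would amount to `"requests" in declared_dependencies("some-project")`, where some-project is a placeholder name.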
Just to clarify, the desired end result is that when I make a typeshed PR that touches the requests stubs, the mypy-primer output will say something like "15 packages checked by mypy-primer use requests". Is that right? If so, that seems like a useful enhancement.
Correct. The intent is to ensure that reviewers have the additional context about whether the output of mypy_primer is applicable, a small signal, or a strong test case for a change.
The standard API for retrieving the dependencies of a package is importlib.metadata: https://docs.python.org/3.10/library/importlib.metadata.html#distribution-requirements
Thanks for the correction / additional detail(s). I've had some PRs rejected for other tools for using importlib so I must have just skipped over it due to compatibility or other historic reasons that don't apply here.
Might be a little fiddly... mypy_primer avoids installing most of the projects it checks. This keeps things relatively fast and mostly avoids all the various build-related issues that would otherwise arise. This means options 1 and 2 wouldn't work. I'd probably go with 4. You may also want to use project.source_paths to get exactly the set of paths that a mypy invocation would look at.
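One way to detect usage without installing anything is to parse the source files statically and collect their imports. The sketch below is only an illustration of that idea: the function names are hypothetical, and `paths` stands in for whatever project.source_paths yields (its exact type is an assumption here). Note that it matches import names, which do not always equal distribution names (for example, the yaml module ships in PyYAML).

```python
from __future__ import annotations

import ast
from pathlib import Path


def imported_top_level_modules(source: str) -> set[str]:
    """Top-level module names imported anywhere in a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x".
            names.add(node.module.split(".")[0])
    return names


def project_imports(paths: list[Path], module: str) -> bool:
    """True if any .py file under the given paths imports `module`."""
    for root in paths:
        files = [root] if root.is_file() else sorted(root.rglob("*.py"))
        for file in files:
            try:
                source = file.read_text(encoding="utf-8")
                if module in imported_top_level_modules(source):
                    return True
            except (SyntaxError, ValueError, OSError):
                # Unparseable or unreadable files are skipped in this sketch.
                continue
    return False
```

Because nothing is executed or installed, this keeps the property described above of avoiding build-related issues, at the cost of missing dynamic imports.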
I'd also recommend not doing this as part of the core mypy_primer logic. Instead we could make this its own command, like mypy_primer --coverage, mypy_primer --measure-project-runtimes, etc. This could simply spit out: X projects use types-xyz or whatever. The advantage of a separate command is it would interact better with the sharding we do in CI and the use of mypy_primer for mypy CI (as opposed to typeshed).
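To make the "X projects use types-xyz" output concrete, here is a rough sketch of how such a summary might be formatted once a per-project result is known. The function name, the input mapping, and the verbose flag are all assumptions for illustration; none of them exist in mypy_primer.

```python
from __future__ import annotations


def usage_summary(stub_package: str, uses: dict[str, bool], verbose: bool = False) -> str:
    """Format how many checked projects use the given stub package.

    `uses` maps a project name to whether that project was found to use the
    package; how that is determined is out of scope for this sketch.
    """
    users = [name for name, used in uses.items() if used]
    total = len(uses)
    percent = 100 * len(users) / total if total else 0.0
    lines = [f"{len(users)} of {total} projects ({percent:.0f}%) use {stub_package}"]
    if verbose:
        # Verbose mode lists every project with its individual status.
        for name in sorted(uses):
            status = "uses" if uses[name] else "does not use"
            lines.append(f"  {name}: {status} {stub_package}")
    return "\n".join(lines)
```

For example, with four placeholder projects of which two are marked True, `usage_summary("types-requests", ...)` produces the line "2 of 4 projects (50%) use types-requests".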
Out of curiosity, are there PRs where you feel like having this would have resulted in some different outcome? Also one more thing in this space... It could be useful to run something mypy_primer-like on the tests for the untyped project. This is probably best done by using mypy_primer like a library.
Good morning,
This issue is to add support for detecting dependencies of the project(s) being scanned by MyPy.
Use Case
The use case of this feature is to understand the impact of a scan better when evaluating the results in typeshed pull requests.
Behavior
The proposed behavior is for mypy_primer to support an optional argument, either positional or flag-based, which accepts one or more package names. These package names represent the package being evaluated, such as types-requests. As typeshed packages are published under the pattern types-{package}, this would be used to determine which package was modified in this change.
With this change implemented and a package provided, while mypy_primer is scanning individual packages, it will evaluate whether or not the package being scanned uses that dependency, providing the end user with a percentage of projects scanned that use this dependency. If mypy_primer supports a verbose run mode, this will instead provide a list of scanned packages with each package's individual status.
Enhancements
This behavior can be enhanced, at the cost of additional complexity, by evaluating the package using a coverage-focused approach, determining if the changed APIs in a pull request are used within the package rather than simply looking for the dependencies.
Approaches
There seem to be a few different approaches we could take for this, depending on the longer-term intent of a feature like this. I've listed the three that immediately come to mind.
modulefinder (not recommended): modulefinder can analyze individual scripts, locating the modules they import. This can be used to scan individual package files, evaluating which dependencies each one uses. modulefinder achieves this behavior using an import_hook.
pip: search_packages_info can be used to retrieve the requires field of the project's metadata.
As mypy_primer is working with the projects, it may instead make sense to read metadata from the project's configuration files (such as pyproject.toml, setup.py, setup.cfg, etc.)
There certainly may be more approaches; I'd be interested in any feedback you may have about what approach you feel makes the most sense.
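For the configuration-file idea, a minimal sketch using only the standard library could read install_requires from setup.cfg and compare it against the package behind a types-{package} name. All function names here are hypothetical. The sketch assumes a newline-separated install_requires list and ignores setup.py, pyproject.toml, extras, and semicolon-separated lists, all of which a real implementation would need to consider.

```python
from __future__ import annotations

import configparser
import re


def _normalize(name: str) -> str:
    # PEP 503 style normalization so "PyYAML" and "pyyaml" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def setup_cfg_requirements(text: str) -> set[str]:
    """Requirement names listed under [options] install_requires in setup.cfg text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    raw = parser.get("options", "install_requires", fallback="")
    names: set[str] = set()
    for line in raw.splitlines():
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line.strip())
        if match:
            names.add(_normalize(match.group(0)))
    return names


def stub_target(stub_package: str) -> str:
    """Map a typeshed stub distribution name (types-{package}) to {package}."""
    prefix = "types-"
    name = stub_package[len(prefix):] if stub_package.startswith(prefix) else stub_package
    return _normalize(name)


def uses_stubbed_package(setup_cfg_text: str, stub_package: str) -> bool:
    """True if the setup.cfg text declares the package that the stubs describe."""
    return stub_target(stub_package) in setup_cfg_requirements(setup_cfg_text)
```

Reading declared metadata this way needs no installation, but it only sees direct dependencies, so a project that uses requests transitively would not be counted.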
Who Will Do This?
I'm happy to attempt to provide this, though there will be some delays as I am currently assisting my family with something offline. This is why I haven't been able to be as involved in typeshed as I would like following my discussion with @AlexWaygood.
Thank you for your time.