tweag / FawltyDeps

Python dependency checker
Other
201 stars 14 forks source link

boto3-stubs not detected as (naively) expected #437

Open tungol opened 5 months ago

tungol commented 5 months ago

Describe the bug boto3-stubs is the type stubs package for boto3, and it's a bit of an oddball because it and several hundred mypy-boto3-* packages are programmatically generated from the same code base.

The issue at hand is that the documented way to use boto3-stubs relies on the extras mechanism to specify which AWS services you intend to use, rather then including all 300+ of them by default. In poetry's pyproject.toml format an example might be:

boto3-stubs = { extras = ["ec2"], version = "^1.22.2" }

Which means you'll have the base stubs for boto3, and boto3-stubs will pull in mypy-boto3-ec2 as a dependency. In my project, there's a couple places where I import some types from mypy_boto3_ec2 for annotation purposes, gating it behind a if TYPE_CHECKING statement.

Fawltydeps (correrctly) detects this as an import without a matching dependency. However, it's conceptually a little redundant to list mypy_boto3_ec2 on it's own, because the requested 'ec2' extra for boto3-stubs is pretty explicit already.

Expected behavior It'd be nice if fawltydeps understood how boto3-stubs uses package extras. That said, following package extras is probably the wrong thing to do in a lot of other cases, and I don't know if there's a way to handle that short of just special-casing it.

I think this issue is slightly similar to the case in https://github.com/tweag/FawltyDeps/issues/176 , but only in that boto3-stubs and mypy-boto3-* packages are conceptually a single unit while being separate packages in implementation. The specific namespace issues in that ticket have no parallel here.

I waffled on opening this ticket at all. I think there's a pretty reasonable case to made that taking the explicit dependency is the right thing to do when you get to the point of importing something. It did surprise me though, because I don't usually think of mypy-boto3-ec2 as a separate package from boto3-stubs.

tungol commented 5 months ago

Possibly related to https://github.com/tweag/FawltyDeps/issues/45 as well

jherland commented 5 months ago

Thanks for raising this issue, @tungol, it is indeed interesting.

My first thought was that we already have some code to deal with type stubs in FawltyDeps, but after a second look, it does not apply to your scenario (in fact, it only handles the ~opposite scenario)[^1].

However, with this precedence, we are not opposed to adding special handling for common type stubs patterns[^2], and I wonder if your case might be best solved with a type stubs-related special case. This could be something like:

If we don't make a special case, I think we would somehow need to accept that importing a transitive dependency is OK, and we may not want to open that can of worms: Not only would it loosen the principle of "always declare what you import", but also FawltyDeps currently does not look at transitive package dependencies at all, and I suspect that adding that functionality would carry both complexity and runtime overhead that we would rather avoid.

That said, maybe extras should not be counted as transitive dependencies?

On the one hand, when you say boto3-stubs = { extras = ["ec2"], ... }, you are indeed fairly explicit in your declaration, and it should not be lumped together with "vanilla" transitive dependencies (e.g. "package foo depends on requests, and therefore I can import requests without declaring it...").

On the other hand, you don't control the exact meaning of boto3-stubs[ec2], and a future version of this extra could e.g. alter the stubs naming scheme and leave your import mypy_boto3_ec2 high and dry.

Also, having to look at extras will carry some of the same complexity/overhead mentioned above.

I believe some more thinking is needed here. Is a type stubs-related special case the right way to solve this? Are there more general solutions that solve the problem without regressions or significant overhead?

[^1]: This will prevent an "unused" type stubs dependency from being reported when the corresponding non-stubbed name is imported. In your scenario this would correspond to you declaring boto3-stubs as a dependency, and your code only containing import boto3, but no import boto3_stubs. The rationale for this code is that you can declare a dependency on a type stubs packages, without intending to import that package (instead relying on your type checker to benefit from the type stubs being installed in your environment). FawltyDeps will thus not consider the type stubs package as unused.

[^2]: And for the existing type stubs handling in our code, it certainly seems like we should handle packages with a mypy- prefix the same as we handle those with a -stubs suffix.

tungol commented 5 months ago

_do you also import boto3ec2 elsewhere in your code?)

The relevant "real" import in this case is just boto3, which unlike the type stubs package does handle all AWS services out of a single package. So that wouldn't work as a check. (You can import boto3.ec2 but the most common pattern for using boto3 is import boto3; client = boto3.client('ec2') so that wouldn't be a helpful check either.)

As far as rules for type stubs go, I think that "if TYPE_CHECKING doesn't count" would be a better rule than "import names that look like type stubs" - there are reasons to use type stubs at runtime, and I'd expect those imports to be enforced by fawltydeps. That might be best with a flag to control strictness? I can easily imagine some projects wanting to validate TYPE_CHECKING imports and other projects not.

Honestly I've mostly talked myself around to "the right solution is the current behavior" on this.

jherland commented 5 months ago

I think I agree that a configurable include/exclude TYPE_CHECKING imports might be the way to go here. Not sure how much work that entails in terms of keeping track of TYPE_CHECKING conditionals when we traverse the parsed Python code.

I'll leave this issue as a P3 ("we'll get around to it at some point") for now (agree, @mknorps?), and suggest one of these workarounds for you, in the meantime:

mknorps commented 5 months ago

For context: a comment I made on 5th June, that was removed:

From my short exploration, the situation is following: Given that you have mypy-boto-ec2 installed, FawltyDeps is using the following code to establish what import names are provided by the package https://github.com/tweag/FawltyDeps/blob/main/fawltydeps/packages.py#L250:

from importlib_metadata import (
    _top_level_declared,
    _top_level_inferred,
)
...
            imports = list(
                _top_level_declared(dist)  # type: ignore[no-untyped-call]
                or _top_level_inferred(dist)  # type: ignore[no-untyped-call]
            )

In top_level.txt file of boto3-stubs there is only boto3-stubs name exported. We will have to investigate what is inferred in this case.