tweag / FawltyDeps

Python dependency checker
Other
201 stars 14 forks source link

Third-party modules confused with nested first-party modules that share the same name for `check-unused` #419

Open jharrisonSV opened 10 months ago

jharrisonSV commented 10 months ago

Describe the bug When I structure a repo with nested modules that share the same name as a third-party dependency I observe weird behaviour when running fawltydeps --check-unused. Say I structure my repo as follows (full demo at https://github.com/jharrisonSV/fawltydeps-test):

app
|--- fastapi
|      |--- __init__.py
|      |--- main.py
...

And in app/fastapi/main.py I import from the FastAPI third-party dependency like from fastapi import FastAPI. FawltyDeps complains that fastapi is unused.

If I rename my nested app/fastapi module to anything else FawltyDeps doesn't find an issue. So this must be some sort of problem with finding a distinction between first- and third-party modules.

To Reproduce Reproducible example: https://github.com/jharrisonSV/fawltydeps-test.

Expected behavior FawltyDeps should recognise that from fastapi import FastAPI is a third-party import and not complain that fastapi is unused.

Environment

Additional context I had a look through other issues and I think this might be a corner case for #95

jherland commented 10 months ago

Thanks for reporting! We'll look into this as soon as time permits.

jherland commented 10 months ago

I've started digging some more into this issue, and I agree that the distinction of first- vs third-party modules is at the core here. In short, the fix I want to make boils down to this:

diff --git a/fawltydeps/extract_imports.py b/fawltydeps/extract_imports.py
index c59ebd9..fde7397 100644
--- a/fawltydeps/extract_imports.py
+++ b/fawltydeps/extract_imports.py
@@ -192,7 +191,7 @@ def parse_source(
         if src.base_dir is None
         else make_isort_config(
             path=src.base_dir,
-            src_paths=tuple(dirs_between(src.base_dir, src.path.parent)),
+            src_paths=(src.path.parent,),
         )
     )

which means that when finding first-party imports, we consider only the project root (src.base_dir) and the immedaite parent dir of the source file. We should no longer include other intermediate directories between the project root and the source file. I believe this should more closely match how Python itself resolves imports (by default).

I believe the reason we added more first-party directories to isort's config was to handle some projects that manipulate sys.path while running in order to find imports located in such intermediate directories. One such project, which we are testing under tests/real_projects/detect-waste.toml is detect-waste, which contains stuff like this:

sys.path.append('./efficientdet')
sys.path.append('./classifier')

These statements are not otherwise parsed/understood by FawltyDeps, so when I change the behavior as suggested above, the first-party imports from these subdirectories will be seen as third-party imports by FalwtyDeps. I.e. for this specific project, the change I want to make will be seen as a regression.

I don't have a patch ready yet, for a couple of reasons:

jherland commented 8 months ago

Sorry for the long delay here. I've started looking at this again, and I still want to make the changes outlined in the previous comment. However, I'm still struggling to make isort classify imports "correctly" (by which I mean, in the same manner as Python's import resolver). I've opened https://github.com/PyCQA/isort/issues/2247 to learn if it's just a matter of me not using it correctly.