jendrikseipp / vulture

Find dead Python code
MIT License
3.38k stars 148 forks source link

Vulture and its scope #271

Closed mflova closed 2 years ago

mflova commented 2 years ago

I recently configured my vulture in my IDE to input the git root directory instead of the current filename, as it seems that it scans all the dependencies and handles better the false positives than any other linter I am using, but I started to see some weird cases when I increased the scope. I decided to do a few tests with the CLI and observed some issues (which is a consequence explained in the "How does it work"in the readme)

Here I attach the minimum code and structure to reproduce the issue (vulture 2.3): Folder structure

.
└── folder
    ├── another_file.py
    └── file.py
# Content of file.py
import numpy
# Content of another_file.py:
import numpy
numpy.array([3])

Doing vulture folder/file.py I get the expected output: folder/file.py:1: unused import 'numpy' (90% confidence) However, when doing vulture folder/ it says everything is ok. Only when I change the name of the import in file.py is when this unused import is detected.

I understand that this happens because Vulture does not analyze the relationship between the files that are involved in the entire scope, so that if we have:

# Content of file.py
var = 2
# Content of another_file.py:
print(var)

vulture folder/file.py will say var is unused but vulture/folder will say it is OK. Therefore, reducing the scope increases the amount of false positives, but increasing the scope reduces them while increasing the false negatives. If I import numpy and use it in a single file of a repository, all the unused "import numpy" in the other files will not be detected as a consequence if I am not wrong.

Are there any plans to improve this? Thanks!

RJ722 commented 2 years ago

Given the nature of Python's dynamic nature, it is very difficult to have scope-aware analysis.

reducing the scope increases the amount of false positives, but increasing the scope reduces them while increasing the false negatives

@mflova, as you rightly pointed out, It's a tradeoff and we favored in the direction of false negatives because that's less noisy and more actionable. I don't see any plans to change this in the near future.