jendrikseipp / vulture

Find dead Python code
MIT License
3.45k stars 152 forks source link

Add a way to ignore the same files as `.gitignore` #344

Open leblancfg opened 9 months ago

leblancfg commented 9 months ago

When using the tool, I need to manually generate an exclusion list parameter that's essentially recreating what my .gitignore is doing.

Otherwise I end up running the tool on my virtualenv, egg-info, tox, etc. and the initial user experience is that the tool is too slow even on tiny codebases. It would be much simpler to use a flag that tells vulture to just exclude the same files as in the .gitignore – a common enough resource in most repos.

One might even go one step further and make that the default – like more modern shell tools like ripgrep and fd, but that's likely a separate discussion.

jendrikseipp commented 9 months ago

Yes, that sounds reasonable. I'd be happy to review a pull request that uses .gitignore to exclude files if --exclude is missing from both pyproject.toml and the command line. I suggest to basically copy the feature implementation from black: https://github.com/psf/black/pull/878 with the later patch https://github.com/psf/black/pull/2170/

jendrikseipp commented 9 months ago

Fixed in #345 .

leblancfg commented 9 months ago

Wow, that was quick. Thanks for doing this, @whosayn and @jendrikseipp

jendrikseipp commented 9 months ago

After further thought, I decided to remove this feature again. The --exclude flag and .gitignore file have different responsibilities and mixing them will surprise some users and will make documentation and maintenance difficult. Also, I'd like Vulture to have no external dependencies, so adding one just for parsing .gitignore files is not ideal. Finally, the feature raised some questions in my mind: why do we only parse the top-level .gitignore file? Why do we parse the .gitignore file in the current directory and not in the directory passed to Vulture?

I should have put the brakes on this feature earlier. Sorry about that!

leblancfg commented 9 months ago

@jendrikseipp if you don't mind me pushing a bit and seeing if there's possible middle ground, I think I can see a future where these concerns are handled:

The --exclude flag and .gitignore file have different responsibilities and mixing them will surprise some users and will make documentation and maintenance difficult.

By your previous approval here, I was actually expecting this feature to only be merged for the next major version change, clearly indicating to users that there was a breaking change.

It could always be available behind a CLI flag though, and only be opt-in. If users enter both a --exclude and e.g. --use-gitignore flags, we emit a warning on STDERR, something like

Both --exclude and --gitignore flags set, only using --exclude patterns.

I'd like Vulture to have no external dependencies

I think vulture could easily vendor the pathspec library for this purpose. The actual package code is tiny, only a handful of files. IIUC given the pathspec license (Mozilla Public License), that only means adding attribution to the vulture docs if we don't modify pathspec code. Also, it is a mature package so there is less risk of bugs, etc.

Why do we parse the .gitignore file in the current directory and not in the directory passed to Vulture?

IMO this is the default with the other tools out there, and sanely handled in the way you describe. I had a local PR I was intending to push up before #345 merged that handled that, using the same logic as black's find_project_root function.


Would you mind if I revived my local branch and gave another stab at this feature, given the above? If so, would you agree with the following?

  1. We vendor the pathspec dependency
  2. With a CLI flag called --gitignore, users can opt-in to using it as a source of exclusion patterns
  3. If --exclude and --gitignore are given, only use exclude patterns.
  4. Copy black's find_project_root logic to handle the common parent .gitignore of the relative paths.
jendrikseipp commented 8 months ago

Thanks for the detailed proposal! Before we go further here, can you (or others) provide real-world examples of projects where the new feature would be beneficial? I.e., what are example .gitignore files in the wild where all contained gitignore patterns should also be ignored by Vulture?

leblancfg commented 8 months ago

I see three main patterns that are encoded in .gitignore:

  1. "standard" locations for virtual environments, often .env/ and/or .venv/
  2. build/, *egg/ and dist/ folders
  3. testing artifacts (coverage, hypothesis, etc.)

I've found that these 5 popular repos have at least 2/3 these folders encoded in the .gitignores:

jendrikseipp commented 8 months ago

Thanks! But doesn't this suggest that it'd be better to ignore these directories by default, i.e., fill --exclude with these dirs by default?

However, ignoring these directories assumes that users call Vulture like vulture . in the project dir. Wouldn't it be much easier to have Vulture analyze the relevant files by passing them on the command line as in vulture src tests?

leblancfg commented 8 months ago

But doesn't this suggest that it'd be better to ignore these directories by default, i.e., fill --exclude with these dirs by default?

I agree with this statement. It's also the default behaviour for many codebase-focused tools, like black, ripgrep, etc. but comes with "change management" maintenance work, like assigning it to the next major version, clear documentation, etc. Two thoughts:

  1. I prefer your previous suggestion to keep the --exclude flag as an override to the .gitignore patterns, just like black does. Users need a way to override default behaviour. Also limits the breaking changes for users that already have the flag set.
  2. Selfishly as a user, I'd rather have the option to use gitignore as a source of exclusion patterns behind an opt-in flag than not at all.

Ultimately your call here @jendrikseipp, and one we can hopefully discuss in an upcoming PR.


Wouldn't it be much easier to have Vulture analyze the relevant files by passing them on the command line as in vulture src tests?

I guess this line of discussion boils down to aesthetics. Gitignores also handle more complex patterns like "exclude setup/ except for setup/main.py" etc. You could suggest users use something like:

vulture $(git ls-files --exclude-standard --others)

Personally, that's not very elegant or user-friendly – I don't think it's fair to assume users know git this well. I find it annoying working with tools (e.g. pylint) that force you to be this verbose to recreate allow/exclude patterns that are already encoded in a file that serves exactly that purpose.