psf / black

The uncompromising Python code formatter
https://black.readthedocs.io/en/stable/
MIT License

black could more efficiently skip directories listed in .gitignore #4405

Closed: rmcloughlin closed this issue 1 month ago

rmcloughlin commented 1 month ago

Say I have this file structure:

my-project
├── large_ignored_dir
│   ├── a.py
│   ├── inner
│   │   └── b.py
│   ├── inner2
│   │   └── c.py
│   └── inner3
│       └── d.py
└── z.py

The only file I want black to pay attention to is z.py.

Then I run black --exclude='large_ignored_dir/' --check -v . which shows me:

Identified `/` as project root containing a file system root.
Found input source directory: "[...]/my-project"
[...]/my-project/large_ignored_dir ignored: matches the --exclude regular expression
would reformat [...]/my-project/z.py

Oh no! 💥 💔 💥
1 file would be reformatted.

That seems to be the correct behavior.

But according to the docs, black is supposed to automatically ignore directories listed in .gitignore. So if I add a .gitignore file containing this line:

large_ignored_dir/

Then I should be able to run black --check -v . and see the same result as above. But instead I see:

Identified `/` as project root containing a file system root.
Found input source directory: "[...]/my-project"
[...]/my-project/large_ignored_dir/inner ignored: matches a .gitignore file content
[...]/my-project/large_ignored_dir/a.py ignored: matches a .gitignore file content
[...]/my-project/large_ignored_dir/inner2 ignored: matches a .gitignore file content
[...]/my-project/large_ignored_dir/inner3 ignored: matches a .gitignore file content
would reformat [...]/my-project/z.py

Oh no! 💥 💔 💥
1 file would be reformatted.

Even though the final result is the same (only z.py is reformatted), in this second case black is wasting time checking files that it could have skipped. Instead of ignoring large_ignored_dir as a whole, it descends one level into it and then skips every file and subdirectory individually.

This has performance consequences with directories like node_modules and __pycache__.
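To make the cost difference concrete, here is a rough sketch that rebuilds the tree above in a temporary directory and counts how many directory entries a walker has to look at in each case. It is not black's traversal code: `build_tree` and `count_examined` are hypothetical helpers, and the ignore rule is hard-coded rather than read from a .gitignore.

```python
import os
import tempfile

IGNORED = "large_ignored_dir"  # hard-coded stand-in for the .gitignore entry


def build_tree(root: str) -> None:
    """Create the example layout from this report under `root`."""
    files = [
        "large_ignored_dir/a.py",
        "large_ignored_dir/inner/b.py",
        "large_ignored_dir/inner2/c.py",
        "large_ignored_dir/inner3/d.py",
        "z.py",
    ]
    for rel in files:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()


def count_examined(root: str, prune_at_top: bool) -> int:
    """Count the entries a walker inspects before deciding what to skip."""
    examined = 0
    for dirpath, dirnames, filenames in os.walk(root):
        examined += len(dirnames) + len(filenames)
        if prune_at_top:
            # Desired behaviour: drop the ignored directory before entering it.
            dirnames[:] = [d for d in dirnames if d != IGNORED]
        elif os.path.basename(dirpath) == IGNORED:
            # Reported behaviour: we are already inside the ignored directory,
            # and only now is each child rejected one by one.
            dirnames[:] = []
    return examined


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_tree(root)
        print("pruned at top level:", count_examined(root, prune_at_top=True))
        print("descending one level:", count_examined(root, prune_at_top=False))
```

In this toy tree the difference is only a handful of entries, but the extra work grows with the number of entries directly inside the ignored directory.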


JelleZijlstra commented 1 month ago

PR welcome to improve this.

devdanzin commented 1 month ago

Interestingly, if the content of .gitignore is set to large_ignored_dir without a trailing slash, it works as expected:

Found input source directory: "[...]\my-project"
[...]\my-project\large_ignored_dir ignored: matches a .gitignore file content
[...]\my-project\z.py already well formatted, good job.

All done! ✨ 🍰 ✨
1 file would be left unchanged.

So maybe we should add a trailing slash to relative_path in _path_is_ignored() if the path is a directory?

https://github.com/psf/black/blob/721dff549362f54930ecc038218dcc40e599a875/src/black/files.py#L301-L316
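A simplified model of why the trailing slash might matter, and of what the proposed change would do. This is an assumption about the mechanism, not black's or pathspec's actual code: `compile_pattern` and `is_ignored` are hypothetical helpers, only plain names without wildcards are handled, and the regexes are a rough approximation of how a gitignore-style matcher could treat "name/" versus "name".

```python
import re


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Toy translation of a wildcard-free gitignore entry into a regex."""
    if pattern.endswith("/"):
        # Directory-only entry: the path must contain "name/" to match.
        name = re.escape(pattern[:-1])
        return re.compile(rf"^(?:.+/)?{name}/.*$")
    # Plain entry: matches the name itself or anything beneath it.
    name = re.escape(pattern)
    return re.compile(rf"^(?:.+/)?{name}(?:/.*)?$")


def is_ignored(relative_path: str, is_dir: bool, pattern: str, add_slash: bool) -> bool:
    """Check a path against one pattern.

    `add_slash` models the suggestion above: append "/" to directory paths
    before matching.
    """
    if add_slash and is_dir:
        relative_path += "/"
    return compile_pattern(pattern).match(relative_path) is not None


if __name__ == "__main__":
    entry = "large_ignored_dir/"
    # Without the slash, the directory itself is missed but its children match.
    print(is_ignored("large_ignored_dir", True, entry, add_slash=False))
    print(is_ignored("large_ignored_dir/inner", True, entry, add_slash=False))
    # With the slash appended, the directory matches at the top level.
    print(is_ignored("large_ignored_dir", True, entry, add_slash=True))
```

Under this model the directory-only entry never matches the bare path large_ignored_dir, which would be consistent with the verbose output in the report, while the entry without a trailing slash matches it directly.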