psf / black

The uncompromising Python code formatter
https://black.readthedocs.io/en/stable/
MIT License
39.12k stars 2.47k forks source link

Black does not honor exclude regex when files explicitly listed on the command line #438

Closed adamehirsch closed 4 years ago

adamehirsch commented 6 years ago

Operating system: OSX Python version: 3.6.2 Black version: black, version 18.6b4

The problem: certain directories in our repo contain generated python code that we don't want black to change. We've configure our repo to run black via pre-commit. Pre-commit invokes black with a list of changed files on the command line, and black's exclude regex does not work against those files and paths.

i.e.

black --exclude "/migrations/" content/migrations/0049_publicationstore_is_test.py
reformatted content/migrations/0049_publicationstore_is_test.py
All done! ✨ 🍰 ✨
1 file reformatted.

This makes us sad, since we've carefully put exclusion regexes into our pyproject.toml and black doesn't honor them when pre-commit calls it. Instead, we're having to workaround by configuring pre-commit to skip that path:

repos:
-   repo: https://github.com/ambv/black
    rev: stable
    hooks:
    - id: black
      language_version: python3.6
      exclude: migrations

The behavior we'd like to see is that black's exclude regex would apply even when full file paths are listed on the commandline. I'd be happy to try for a PR if this seems like desirable behavior to anyone else...

asottile commented 6 years ago

From a prior art perspective, flake8 has this same issue (marked: WONTFIX) -- that said, I personally think the decision in flake8 is incorrect/inconsistent and that implementing this is a good idea :)

adamehirsch commented 6 years ago

I'll wait for the black maintainers to indicate whether they think it's a good idea; no sense writing a PR for something that'll be rejected on concept.

ambv commented 6 years ago

--include= and --exclude= are only consulted for recursive search, not for files passed on the command line.

How is the decision of flake8 inconsistent here? The rationale is rather simple: by default, we exclude some paths and only include some file extensions in recursive search. But if you specifically give us a file path on the command line which doesn't match the file extension or would otherwise belong to the exclusion list, you probably know what you're doing.

I do agree that interfacing with CI and editors is surprising in this case so I'm not downright rejecting changing this. But I need to carefully consider whether there are backwards compatibility disasters hiding in changing this. And even if we agree to do this, what does it mean for defaults in --include=? Should I reject non- .py/.pyi files from now on unless somebody clears --include= on their call?

This is not going to be straight-forward to change but let's try to figure something out. Anthony, what would you suggest?

asottile commented 6 years ago

ah let me clarify -- black and flake8 currently have the same behaviour here (they are consistent).

Maybe it's a bit snowflakey but intuitively it makes a lot of sense for me to apply --exclude even if passed on the commandline. This simplifies a lot of editor configurations, pre-commit, and even just black foo/*.py. I don't think --include should be applied except in the recursive case however.

That said, there's certainly arguments in both directions -- it may very well be simpler to only apply it during the recursive routines.

the one thing I usually point at here is "pre-commit is better at running your linter than your linter is" because it can take advantage of a few things:

Though for tools that often means you have to configure both pre-commit exclusion and tool exclusion and keep them in sync (or a superset / subset of each other). It would be nice to only configure this in one place and for most that usually means "configure it using the tool's configuration". But then you run into OP's issue :)

bisby commented 6 years ago

My confusion stemmed from the following line from black's own readme on pre-commit:

Avoid using args in the hook. Instead, store necessary configuration in pyproject.toml so that editors and command-line usage of Black all behave consistently for your project.

I know this specifically says "args", but I took this to mean "pyproject.toml is preferable to pre-commit's yaml when both have a similar config." It made sense at the moment (without having fully grasped the concept of how everything worked), that since black . worked with the exclude in black's config, that it should work with pre-commit run black as well.

If the answer is "exclude will only ever exclude on recursive searches" that's fine, but the readme should note that in the pre-commit section ("pre-commit passes specific files, not a recursive search, so you should exclude files in pre-commit as well"). But I would fully support some sort of "--hard-exclude" that is a definitive "black will not run on these files, no matter what"

ambv commented 6 years ago

We'll definitely hard-exclude things that are .gitignore'd (see #475). Other than that, I don't think there's anything actionable here. If I pass a file explicitly to the tool, I expect it to be acted on.

What I think we could do instead is to convince @asottile to let pre-commit run black on the entire Git repo on every commit. The main reason it's not doing so is performance as far as I can tell. But Black does cache its executions so it won't even touch files that are already well-formed. Plus, then it will correctly follow exclusion and inclusion lists. This would minimize configuration for users, too.

bisby commented 6 years ago

Would a PR with an update to the readme be warranted then?

Avoid using args in the hook. Instead, store necessary configuration in pyproject.toml so that editors and command-line usage of Black all behave consistently for your project. See Black's own pyproject.toml for an example.

This heavily implies that pre-commit should not have any config (although it admittedly just calls out args specifically, as a first time user this was confusing). Some clarification that pre-commit will call files explicitly, thus will ignore black's exclude list would be nice.

ambv commented 6 years ago

I'd like to wait for what Anthony thinks about it. If he agrees that we should change the hook setup so that Black is ran on the entire repo every commit, then we will do just that and the README then remains fine.

Otherwise, yeah, we will have to explicitly call out in the README that --exclude= in pyproject.toml is not used by pre-commit. I would like to avoid this.

asottile commented 6 years ago

first, it's possible to enable this behaviour but I don't think it's desirable (as it sidesteps the benefits of the framework):

    -   id: black
        args: [.]
        pass_filenames: false
        always_run: true

pre-commit is (generally) better at running linters than linters themselves are, here's a couple of reasons why:

An example of a case where always running black on all the files is (very) wrong is during a merge conflict resolution. pre-commit will only execute a linter on files which conflict or are manually changed avoiding the headache of waiting for black (or other linters) to run across every file that changed in the upstream (and potentially dealing with other people's mistakes as a punishment for merging).

(gotta run, hope this succinct reply is enough, if not I can elaborate / link some more prose on this -- hope it helps!)

alvinlindstam commented 5 years ago

When using an editor integration that calls black with a changed file's path (which it could do automatically, such as on a post save action), this behaviour also means that it would reformat the file even if it's excluded by the config.

So I would be in favor of either changing the default to always consider the ignore/exclude rules, or to include an option to do so even when a full path is provided.

amitbeka commented 5 years ago

Hi everyone,

I also run into this problem with multiple editors and this behavior was surprising to me, as a user.

IMHO you don't want to have multiple configurations stating the same files to exclude: pyproject.toml, pre-commit (not everyone using the tool named pre-commit), VIM, PyDev, IntelliJ etc.

We have many tools in the team so it's essential one configuration will be used by everyone.

I think that in terms of usability, surprising the user is never a good idea. We can add two flags (tentative naming) to help black behave nicely when an excluded file is given to black as an argument:

  1. -f/--force to force formatting an excluded file, return 0 (or whatever we return from the formatting)
  2. --ignore-excluded which means black silently ignore the excluded files, formatting the other files (if any) and return 0 (assuming the rest was fine)
  3. Without flags, return an error (e.g. -1) and warn that excluded files were given explicitly

If this change of behavior is agreed upon, I can try and create the PR.

chebee7i commented 5 years ago

Any progress/decisions on this?

zsol commented 5 years ago

I agree with Anthony that pre-commit is in a better position to figure out when to run Black on what, so if we could somehow get away with only ever running Black through pre-commit, that would provide a way to solve this problem.

I don't think it's OK to assume Black is always going to be called through pre-commit though, so we do need a separate include/exclude mechanism (and unfortunately we do need to worry about parsing .gitignore et al) for when it's called directly from the command line or through editors.

As for the performance concerns around running black . on the entire repo every time you commit: after the first run, it should be O(changed files) instead of O(repo) because of Black's cache.

So I think all in all we should encourage people to configure Black with the following in their .pre-commit-config.yaml

    -   id: black
        args: [.]
        pass_filenames: false
        always_run: true
asottile commented 5 years ago

please don't suggest that per above, thanks

even with black's cache it's going to be the wrong thing during merge conflicts and you're back to linting files that aren't checked in and you have to do filesystem traversals which themselves are pretty slow

given how frequent this comes up I'm considering changing flake8's behaviour to honor flake8 foo.py --exclude foo.py to be a noop (as silly as such a command is) and perhaps if that goes well black should do the same

asottile commented 5 years ago

plus, even a trivial invocation of black (with no files) takes ~250ms which would be 0ms in the case when there's no files to run

chebee7i commented 5 years ago

https://github.com/psf/black/issues/438#issuecomment-495238640 is the closest to a proposal here, but I'm not sure that we need extra parameters. I think all we need to do is change the behavior of --exclude:

Desired Behavior:

Use cases:

The latter use case is important since during merge conflicts, we will only run on the files that the user edited, rather than on every file that anyone has edited.

asottile commented 5 years ago

weird I thought I mentioned this but I guess not, if you're only invoking through pre-commit it's usually better to use pre-commit's exclude: ... pattern:

repos:
-   repo: ...
    rev: ...
    hooks:
    -   id: black
        exclude: ^testing/test_data/

or if you're globally excluding


repos:
exclude: ^vendor/
-   repo: ...
    rev: ...
    hooks:
    -   id: black
chebee7i commented 5 years ago

This works for me (in my limited use cases):

diff --git a/black.py b/black.py
index 05edf1a..05ba5af 100644
--- a/black.py
+++ b/black.py
@@ -441,7 +441,23 @@ def main(
             )
         elif p.is_file() or s == "-":
             # if a file was explicitly given, we don't care about its extension
-            sources.add(p)
+            try:
+                normalized_path = "/" + p.resolve().relative_to(root).as_posix()
+            except ValueError:
+                if p.is_symlink():
+                    report.path_ignored(
+                        p, f"is a symbolic link that points outside {root}"
+                    )
+                    continue
+
+                raise
+
+            exclude_match = exclude_regex.search(normalized_path)
+            if exclude_match and exclude_match.group(0):
+                report.path_ignored(p, f"matches the --exclude regular expression")
+                continue
+            else:
+                sources.add(p)
         else:
             err(f"invalid path: {s}")
     if len(sources) == 0:
itajaja commented 5 years ago

I have added a PR following @chebee7i proposal, which I think is the most sensical https://github.com/psf/black/pull/1032. I think that black needs to work well when integrating with other tools (editors, pre-commit hooks) for larger adoption. this concerns probably were less of a thing some years ago when flake8 was developed, but if we look at other modern tools and ecosystems (node: prettier, eslint, husky) this is the expected behavior.

itajaja commented 5 years ago

this is the 4th top commented issue in the repo, it would help a lot of people if some attention could be spared on this. There is an open PR, so my question is why this is not moving forward? a) simply no time to look into this? b) won't fix? c) something else?

They are all acceptable answers :) but I'd like to set some expectations for myself

asottile commented 5 years ago

@itajaja well for one your open PR has no tests and has a merge conflict -- I also don't think there's been any concrete proposal / agreement on the way to move forward yet

itajaja commented 5 years ago

it has a merge conflict because is a month old, and I wouldn't keep rebasing if there was no interest. For tests, as I mention in the PR:

I Have not added or amended tests yet, but if we agree on the approach, I can work on them

so, yes, it was a way to throw a more concrete proposal and discussion. Sometimes is better to actually do a PR than discuss on the issue. you say that there hasn't been a concrete proposal, I'd say it doesn't get more concrete than a PR 😄

so, yes, what's missing is an agreement, and for that we need core maintainers and contributors to chime in. I am just trying to bring attention on an issue that seems fairly popular.

Thanks!

itajaja commented 5 years ago

asking again. I think the ball is in the core contributor's court to decide what to do, or at least give an update. Any answer is legitimate (won't fix, no time for this, alternative proposal) but at least an answer will help people set expectations :v: This is something that needs to come from core contributors. What we can do is give awareness that this is a problem for many people, but we need guidance in order to solve this.

iutlu commented 4 years ago

Hello all, I'm using VSCode and would also like to see this option come to Black for a more correct format-on-save behaviour. Maybe there could be an optional boolean flag like --force-hard-exclude (similar to isort's --filter-files) which flips the behaviour to unconditionally honoring the exclude list regardless of the command line arguments.

And thanks for the fantastic tool! :slightly_smiling_face::+1:

itajaja commented 4 years ago

it's really sad to see no answer at all from maintainers. we simply forked with the least intrusive modification and moved on with our lives...

audiolion commented 4 years ago

To echo a previous comment, blacks PyCharm editor integration suggests setting up with the FileWatchers plugin to run black whenever a file changes. Because --exclude does not work when passing a file directly, every time you install something into a .venv in your project directory, black goes through every file and formats it 😮 !

FileWatchers doesn't provide a way to exclude directories from being watched (an issue in its own right), but it leaves anyone integrating with PyCharm and Black at an impasse. Let black format every installed package, clocking your CPU on pip installs, or don't use the watcher integration and use pre-commit instead.

I understand the hesitancy to introduce breaking changes by modifying this behavior. It seems the only work around is to add a CLI feature flag that opts into the new behavior, and warns people relying on the existing "format direct file paths even if excluded behavior" that it is deprecated and the default will change in a future release.

@itajaja could you provide a link to the fork?

itajaja commented 4 years ago

@audiolion https://github.com/psf/black/pull/1032

audiolion commented 4 years ago

Ah ok so you didnt publish the fork to PyPI but are doing an install from github directly?

itajaja commented 4 years ago

yep

c0state commented 4 years ago

FileWatchers doesn't provide a way to exclude directories from being watched (an issue in its own right)

I think you can configure a scope on your file watcher configuration that excludes the .venv dir (and any others that shouldn't be touched).

audiolion commented 4 years ago

@c0state for the life of me I couldn't find way to do that, or docs from jetbrains on it.

c0state commented 4 years ago

@c0state for the life of me I couldn't find way to do that, or docs from jetbrains on it.

See https://www.jetbrains.com/help/idea/settings-scopes.html. All file watchers have a scope. The default is "Project Files" I believe which will probably format stuff you don't want. I usually set one up that includes my project files and then explicitly excludes venv/.venv, build, etc. dirs and such. If all your projects look the same (structurally), you can make this scope global and use it in all your projects, otherwise you can set this up manually for each project.

steinsag commented 4 years ago

This is not a thing to discuss, but it is simply a bug of black. If I have a repo containing other file types than Python, I can't add a pre commit hook executing black on all files to be committed. That's because black can't format e.g. markdown files.

Usually, a pre-commit hook is done with something like:

git diff --name-only --diff-filter=ACM

Now I could add additional greps to only include py files, etc., but I already specified all of that in my pyproject.toml. I think in our project we must now maintain a fork :-(

itajaja commented 4 years ago

@steinsag if you use something like pre-commit it's easy to configure black to run only on python files.

steinsag commented 4 years ago

For those in need of getting a fixed version, my colleague created a patched version https://github.com/axiros/axblack, which is also available on pypi https://pypi.org/project/axblack/. Please note, axblack uses single quotes instead of double quotes.

zsol commented 4 years ago

I'm considering supporting the idea of applying --exclude and --include for files explicitly passed on the command line.

There's one big concern with this approach I've seen above: black file doesn't do anything because file is either not included (in either pyproject.toml or the default \.pyi?$ pattern) or it's excluded. I see this as a problem in these two cases:

The issue is that this file-filtering configuration is implicit to some degree. It would be OK if everything is explicitly on the command line:

I think this is a big problem because of --include: the defaults for it cannot capture (what I think is) a fairly common use case. --exclude is probably fine IMO.

So how about being pragmatic about this, swallowing the inconsistency and saying: --include applies only to traversing directories (like today) and --exclude is applied to all command line arguments? @ambv have you changed your opinion?

asottile commented 4 years ago

I echo your concern -- it's probably less surprising if only --exclude is applied

thejcannon commented 4 years ago

Not to muddy the waters any. I see it looks like a consensus is forming around having the exclusion rule also apply to file passed in via the command line (and I hope this, too, will go for flake8).

But thought I'd point out isort "fixed" this by adding another flag filter_files, which says

Tells isort to filter files even when they are explicitly passed in as part of the command. This is especially useful to get skip and skip_glob to work when running isort through pre-commit.

I don't think @ambv has weighed in on his opinion of another command-line flag (which I see has been brought up), which solves this problem in a backwards-compatible way (since @ambv did mention backwards-compatibility is something to not take lightly).

ambv commented 4 years ago

I like the idea of --filter-files. Let's name it --force-exclude.

ambv commented 4 years ago

@itajaja, thanks for persevering with this issue. If you made your pull request to use --force-exclude, I will gladly merge it.

itajaja commented 4 years ago

🙏 I'll take a stab!

itajaja commented 4 years ago

@ambv and here's the stab 🔪 https://github.com/psf/black/pull/1032 All tests pass, so we are pretty confident it's backward compat. If this looks good, I'll add tests, but let me know what tests you think would be useful

itajaja commented 4 years ago

uhm...

vishalkuo commented 4 years ago

Can we get the above PR reviewed? The change seems pretty small and it seems like it would benefit a fair number of folks.

Gricha commented 4 years ago

@ambv Hey Łukasz :) Sorry to bother you - we actually hit that very specific issue and taking a stab at this PR would really help us out!

fkromer commented 4 years ago

@vishalkuo Seems it's time to fix the pylint build job probably with a pylint single line exception first :wink:

 black.py:274:1: C901 'main' is too complex (20)

Would be great to have some solution to this issue soon. It's quite annoying.

shrutimansinghka commented 4 years ago

black --check --exclude=.migrations/ --diff app-code > black.diff This seems to work for me.

ichard26 commented 4 years ago

We'll definitely hard-exclude things that are .gitignore'd

https://github.com/psf/black/issues/438#issuecomment-424803691

~Currently due to a bug (#1572), this works as described 🙃 . Even if you explicitly pass a file to Black, it will still ignore the file if it's gitignore'd. Unfortunately, this bug also breaks the behaviour of --exclude, causing it to act like --force-exclude so it has to be fixed.~ This is no longer the case as #1591 fixes this bug, but the rest of this comment still applies.

@ambv are you still in favor of having Black always refuse to format files that are gitignore'd, or should we stay with current ~go back to stable~ behaviour where .gitignore only applies to files found recursively? That idea of yours goes back almost two years ago, so you could be no longer in favor of it.