psf / black

The uncompromising Python code formatter
https://black.readthedocs.io/en/stable/
MIT License
38.71k stars 2.43k forks source link

Use optional parentheses more often #2156

Open cdce8p opened 3 years ago

cdce8p commented 3 years ago

Describe the style change

For short if statements (just one and/or) black tries to get rid of optional parentheses whenever possible. One common result of that behavior is that the line is instead spilt on a function call which often reduces the readability compared to optional parentheses. Even if the optional parentheses are inserted manually, black will reformat the code.

Examples in the current Black style with desired style

Some examples taken from the pylint code base.

if not is_node_in_type_annotation_context(node) and isinstance(
    node.parent, astroid.Subscript
):
    pass

# better
if (
    not is_node_in_type_annotation_context(node)
    and isinstance(node.parent, astroid.Subscript)
):
    pass
def _supports_mapping_protocol(value: astroid.node_classes.NodeNG) -> bool:
    return _supports_protocol_method(
        value, GETITEM_METHOD
    ) and _supports_protocol_method(value, KEYS_METHOD)

# better
def _supports_mapping_protocol(value: astroid.node_classes.NodeNG) -> bool:
    return (
        _supports_protocol_method(value, GETITEM_METHOD)
        and _supports_protocol_method(value, KEYS_METHOD)
    )
                    if getattr(
                        func, "name", None
                    ) == "iter" and utils.is_builtin_object(func):

# better
                    if (
                        getattr(func, "name", None) == "iter"
                        and utils.is_builtin_object(func)
                    ):

Additional context For any statement with more at least two priority delimiters the above suggested syntax is already the default.

return (
    is_postponed_evaluation_enabled(node)
    and value.qname() in SUBSCRIPTABLE_CLASSES_PEP585
    and is_node_in_type_annotation_context(node)
)

One possible implementation could be the modify these lines here and use > 0 instead of > 1. https://github.com/psf/black/blob/76e1602d60391cca1bb3b21a9ef6ed07a1bca17d/src/black/__init__.py#L6751-L6754

friendyou1 commented 3 years ago

What is HTML ?

ichard26 commented 3 years ago

@friendyou1

From a quick Google Search the summary from Wikipedia says the following:

The HyperText Markup Language, or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets and scripting languages such as JavaScript.

i.e. it's the language that defines webpages (to be precise their content and structure), this page you're reading is made up of HTML. HTML is often used with JS (for interactivity) and CSS (to style and design the page so it looks nice)

Finally, please avoid posting irrelevant comments, it's noisy on a busy issue tracker. Thank you!

cdce8p commented 3 years ago

If someone is interested, I've opened #2163 which would address it.

felix-hilden commented 3 years ago

The linked PR had positive feedback from maintainers but was rejected because of the disruption it would cause. So should this issue be closed by extension, or is there still a discussion to be had about this suggestion?

Marc did ask if keeping user-inserted parentheses would be an option. In my opinion the style would indeed be an improvement. But is it inappropriate with respect to Black's philosophy or reasonable pragmatism?

cdce8p commented 3 years ago

I've created a followup PR #2237 that would keep user inserted optional parentheses

felix-hilden commented 3 years ago

Based on my recent ride around the issue tracker, it is evident that this is a highly-requested feature. I'll summarise Łukasz' remarks when closing the initial PR:

Łukasz ended with saying he's also disappointed at the fact that Black is not internally consistent in this regard.

If I may, I'll try to push this closer to a resolution once more.

Keeping user-inserted parentheses

I agree that this is a suboptimal solution, since it is akin to the magic trailing comma and comes with all the similar problems.

Matching the way people write code

I think this is not that important given Black's function. We ruthlessly splits lines, change quotations and add or remove other parentheses. Users "cede control over formatting" to Black, they shouldn't expect Black to improve their current desired (lack of) style.

Disrupting existing code

This is the biggie. If we can agree on a good set of rules to actually do this, it could be tested on the primer projects to see exactly how much disruption it would cause. But getting too mathematical is not the point either.

Is the proposed style better?

There seems to be no question that this is the better style in the examples that were given:

if not is_node_in_type_annotation_context(node) and isinstance(
    node.parent, astroid.Subscript
):
    pass

# VS

if (
    not is_node_in_type_annotation_context(node)
    and isinstance(node.parent, astroid.Subscript)
):
    pass

But it is not that obvious in more complex cases like in our own tests (example taken from the original PR test code modifications):

if prevp.parent and prevp.parent.type in {
    syms.typedargslist,
    syms.varargslist,
    syms.parameters,
    syms.arglist,
    syms.argument,
}:

# VS (what cdce8p wrote)

if (
    prevp.parent
    and prevp.parent.type
    in {
        syms.typedargslist,
        syms.varargslist,
        syms.parameters,
        syms.arglist,
        syms.argument,
    }
):

# VS (another possibility, which I kinda like)

if (
    prevp.parent
    and prevp.parent.type in {
        syms.typedargslist,
        syms.varargslist,
        syms.parameters,
        syms.arglist,
        syms.argument,
    }
):

But the problem is more general than if statements. I closed lots of similar issues above, but I feel like even #236 and #348 are closely related to this. So I think we should really come up with a careful set of rules, or better yet: examples of what parentheses should look like if we're going to continue pursuing this any further.

To that end, I'd like to ask from the maintainers and especially @ambv: has the ship sailed? Are you absolutely opposed to any changes of this sort? If yes, I don't think we should waste any more resources discussing it. But if not, I'd like to thoroughly investigate the possibilities and come up with something that could be worth doing.

rmorshea commented 2 years ago

Matching the way people write code

This is less about how people write code and more about how people read it. The need for a tool like Black tells use that the way people write code is actually inconsistent and messy. Users choose to cede control over formatting to Black because the readability of the resulting code is, in 99% of cases, as good or better than it was before. It seems evident that this problem case makes up a disproportionate part of that 1% of outputs which have subpar readability.

Is the proposed style better?

In my opinion, the resulting code in the given counter examples is not worse than it was before. If it turned out that this was representative of the worst case scenario, I don't think asking whether the proposed rules make improvements in all cases is really what we should focus on. Rather, we should evaluate whether the improvements which we already know about outweigh the costs of the disruption these changes would cause.

rmorshea commented 2 years ago

I think it's also worth noting that much of the discussion around this change happened before the new stability policy. The --future flag could be a great opportunity to introduce this change and collect feedback without any real disruption. If it turns out from the feedback that this does more harm than good, we can just revert it before the next annual release.

LLyaudet commented 2 years ago

I just add my voice in favor of this issue with the following example :

                mnopqrstuvwxyza = (
                    set(stuvwxyz_abcdefghi[pqrstuvwxy].keys())
                    - set(stuvwxyz_abcdefghij[pqrstuvwxy].keys())
                )

black currently suggests this modification that is sub-optimal for readability :

-                mnopqrstuvwxyza = (
-                    set(stuvwxyz_abcdefghi[pqrstuvwxy].keys())
-                    - set(stuvwxyz_abcdefghij[pqrstuvwxy].keys())
+                mnopqrstuvwxyza = set(stuvwxyz_abcdefghi[pqrstuvwxy].keys()) - set(
+                    stuvwxyz_abcdefghij[pqrstuvwxy].keys()
                 )

I did not check if the PR handles this case also, but I hope it is the case :)

cdce8p commented 2 years ago

Though I post a small update. Unfortunately, I don't have time to work on it anytime soon. If someone else is interested in picking this up, please feel free to do so!

felix-hilden commented 2 years ago

Another case from #587: I'd expect binary operators to be formatted similarly to boolean operators.

pradyunsg commented 2 years ago

A specific example, quoting from https://github.com/psf/black/issues/587#issuecomment-475581498:

I have a code snippet as follows:

        looks_like_pep = (
            file_path.startswith("pep-") and
            file_path.endswith((".txt", "rst")
        )

With line length 79, black re-formats this code to:

        looks_like_pep = file_path.startswith("pep-") and file_path.endswith(
            (".txt", "rst")
        )
ambv commented 2 years ago

This is intentional, there is no automated rule we can do that will behave better. If we always put organizing parentheses whenever there is only a single operator between two expressions, this would lead to things like

a = (
  bbbbbb(1234567890)
  + cccccc(123456790)
)

instead of

a = bbbbbb(123456790) + cccccc(
    123456790
)

This formatting also takes up fewer lines.

The rule now uses organizing parentheses when there's more than one operator involved. It's a very simple rule that is easy to explain. If we made deviations here, not only would we invite other edge cases, but we would also have trouble explaining what Black did in those suboptimal cases.

Therefore, I'm somewhat skeptical we can improve much here without making everything much more complicated.

felix-hilden commented 2 years ago

Fair points by Łukasz! (For the time being I'm just organising the issues. Still discussions to be had.)

A somewhat tricky case from #1094 though: when having both binary and boolean operators, the order of operations should determine which is used to split first:

if str(
    self.user.id
) != self.target_user_id and not CheckUserManagementViewPermission.user_has_permission(
    self.user
):
    pass

# becomes

if (
    str(self.user.id) != self.target_user_id
    and not CheckUserManagementViewPermission.user_has_permission(
        self.user
    )
):
    pass
xM8WVqaG commented 2 years ago

Personally, I think your newer example format is cleaner.

a = (
  bbbbbb(1234567890)
  + cccccc(123456790)
)

means you only have complete pieces of logic on each line rather than breaking inconsistently part way through functions.

It's easier to read complete sentences when they're together, and it creates more legible diffs:

--- a/example.py
+++ b/example.py
@@ -1,4 +1,4 @@
 a = (
   bbbbbb(1234567890)
-  + cccccc(123456790)
+  - dddddd(123456790)
 )

rather than

--- a/example.py
+++ b/example.py
@@ -1,3 +1,3 @@
-a = bbbbbb(123456790) + cccccc(
+a = bbbbbb(123456790) - dddddd(
     123456790
 )
ambv commented 2 years ago

@felix-hilden it's confusing to cram both issues here. Let's focus here on the single-operator parentheses.

@xM8WVqaG Black will prefer fewer vertical lines over more, if it can be helped. Unfortunately, changing this will affect a lot of code which now formats okay and will format worse. It's a relatively easy change for you to make to your Black installation:

  1. Install it with pip install --no-binary=:all: black.
  2. Run python3 -c 'import black.lines; print(black.lines.__file__)'.
  3. Open the file under the path returned by the last command.
  4. Edit the line that says:
    if bt.delimiter_count_with_priority(max_priority) > 1:

to say > 0 instead. Test with as much code as you can lay your hands on. You'll see some regressions in behavior.

One more thing to note here that you won't be aware of unless you actually contributed to Black. Black formats everything (with one exception that is irrelevant here) in incoming token order, which usually is "left to right". This produces regular and predictable formatting because it doesn't "look forward" to try to reformat things and then decide whether they are nice or not.

In turn, it won't know the difference between, say:

a = (
    bbbbbb(123456790)
    + cccccc(
        1234567890
        + 1234567890
   )
)

instead of the much more reasonable

a = bbbbbb(123456790) + cccccc(
    1234567890 + 1234567890
)
cdce8p commented 2 years ago

It's difficult since it would effect almost every project that uses black today. However, just from seeing how many related issues were created, it looks like many users would like an improvement (me included).

Just from the initial testing I've done with #2237, it might be worth limiting any changes to LOGIC_PRIORITY or higher for now. There are still some corner cases with that, but much fewer than when including all operators.

rmorshea commented 2 years ago

While many people do report this, it's hard to know where the majority opinion lies just by looking at the number of issues. The people who are satisfied with the current behavior aren't posting issues, and thus, are not participating in this conversation. I'd personally appreciate some change here, but it's important to remember that there are a lot of black users whose viewpoints are not being represented.

LLyaudet commented 2 years ago

The "hierarchical first" line-split seems always better to me. If not chosen as the sensible default, it would be nice to have it as a configuration option.

felix-hilden commented 2 years ago

I think whatever is decided here, we won't have a configuration option. We don't want to give any unnecessary control over formatting.

MarkusPiotrowski commented 2 years ago

@ambv I'm sorry, but for me the 'counter-examples' that you show look better/are easier to read than what black is suggesting. Maybe there are different ways how different people read code. But especially in your two examples:

a = bbbbbb(123456790) + cccccc(
    123456790
)

a = bbbbbb(123456790) + cccccc(
    1234567890 + 1234567890
)

the closing parentheses are visually completely dis-aligned to the code which they enclose, because they are vertically and horizontally separated from their opening parenthesis.

It is not only that, if you need line breaks, one would prefer to have both objects of an operation horizontally aligned and having complete pieces of logic on one line, but also about the quick identification of matching parentheses and their enclosed content.

lig commented 2 years ago

@felix-hilden here is a possible take on the example from tests

if prevp.parent and prevp.parent.type in {
    syms.typedargslist,
    syms.varargslist,
    syms.parameters,
    syms.arglist,
    syms.argument,
}:

# start from scratch

if prevp.parent and prevp.parent.type in {syms.typedargslist, syms.varargslist, syms.parameters, syms.arglist, syms.argument,}:

# add parentheses

if (
    prevp.parent
    and prevp.parent.type in {syms.typedargslist, syms.varargslist, syms.parameters, syms.arglist, syms.argument,}
):

# still long, add more

if (
    prevp.parent
    and (
        prevp.parent.type
        in {syms.typedargslist, syms.varargslist, syms.parameters, syms.arglist, syms.argument,}
    )
):

# still long

if (
    prevp.parent
    and (
        prevp.parent.type
        in {
            syms.typedargslist,
            syms.varargslist,
            syms.parameters,
            syms.arglist,
            syms.argument,
        }
    )
):

# try to inline in reverse

if (
    prevp.parent
    and (
        prevp.parent.type in {
            syms.typedargslist,
            syms.varargslist,
            syms.parameters,
            syms.arglist,
            syms.argument,
        }
    )
):

@felix-hilden this is kinda similar to your last option. Frankly, it looks like it could be inlined once more and it will be exactly it. Anyway, putting and and in on the same level here isn't right IMO.

ktbarrett commented 2 years ago

@lig I think you could continue to inline twice more back to the original desired style from the test. I do not think it's unreadable.

A recursive algo which inlines leaves as it returns from the call stack as long as the max line length is not violated is a good way to go about it. The point is to break the outer-most element if the line is too long. It's what my intuition is, and what I would write naturally if I cared. In reality I just let everything go to the right and let black clean it up 😁. I have even written a pretty printer that used that exact algo because pprint looks awful.

Examples of different max-line-lengths of @ambv's example with the stated algo. ```python # max-line-length=1 aaaaaaaaaa = ( bbbbbb( 123456790 ) + cccccc( 1234567890 + 1234567890 ) ) # max-line-length=20 aaaaaaaaaa = ( bbbbbb(123456790) + cccccc( 1234567890 + 1234567890 ) ) # max-line-length=40 aaaaaaaaaa = ( bbbbbb(123456790) + cccccc(1234567890 + 1234567890) ) # max-line-length=60 aaaaaaaaaa = ( bbbbbb(123456790) + cccccc(1234567890 + 1234567890) ) # max-line-length=80 aaaaaaaaaa = bbbbbb(123456790) + cccccc(1234567890 + 1234567890) ```
gar1t commented 1 year ago

I would suggest (re)considering the use of a configuration option to balance the interests here. The magic trailing comma is a similar case - it's a syntactically benign way to signal formatting intent.

Some of the formatting examples above are so pathological (as are the cases in our code base - I omit them as the topic is well covered above) that I think it should be a priority to address them in one way or another.

In the case of boolean expressions, adding a trailing and True is similar to the trailing comma - and cleans up the formatting but at the obvious expense.

I don't follow the argument that fewer lines of code trump readability.

Disruption to existing code is obviously a big deal. But if the implementation is to use grouping parenthesis as a cue for the new formatting rules, these would have to be explicitly added and existing code should not be impacted. Again, similar to the trailing comma. And if there's a config option - also similar to trailing comma - there's no impact at all or users could opt out (depending on the default behavior).

ambv commented 1 year ago

I would suggest (re)considering the use of a configuration option to balance the interests here.

We avoid that here.

I don't follow the argument that fewer lines of code trump readability.

They usually do because they allow more content to fit one screen or page. Black additionally prefers to format things in a way where lines read from left to right instead of pre-emptively exploding into one-element-per-line.

The unclear examples in this open issue are obviously controversial, but a ton of others aren't, and nobody lists them here because they just look good. The current rule requiring "more than one delimiter" to use optional parentheses was added because without it, the output was clearly overusing parentheses leading to explosion of newlines in unexpected places.

The problem with a tool that applies the same rules every time is that in some cases a given rule leads to output that is less obvious. It seems here most pushback stems from using square brackets or parentheses from a call to split a line.

In the case of organizing optional parentheses, there is no easy-to-express rule to make it work for everybody. Most of all, it's to a large extent subjective. What to some is "so pathological that it should be a priority", is not a big deal (or is indeed preferred) by others.

Using existing parentheses as a marker that they should be kept makes the tool less automatic. It will then happily add parentheses in some cases but when you modify the code, it won't take them out. That's already a source of confusion for users when it comes to trailing commas, and adding parentheses to the mix would make this problem worse.

Originally Black didn't automatically put or remove parentheses, it would just respect what it found already in the file. This was widely criticized, I received many requests to change it. This issue argues, in essence, to roll this change back, which would be an unpopular decision.

lig commented 1 year ago

I would suggest (re)considering the use of a configuration option to balance the interests here.

We avoid that here.

Maybe these couple of cases are those good candidates for having options for them? AFAIR, skip-string-normalization was kinda similarly controversial and it took Guido's opinion to make this into an option. These two are obviously controversial, have different logic already, and confuse both sides.

Wyko commented 1 year ago

Just an extra little bump here. I stumbled on this issue because the logic trees in my application are getting worse and worse due to this formatting choice. If I had the option to cause black to format these more nicely, I'd do it in a heartbeat.

if self._rule_has_same_enviroment_tag(
    rule=rule, live_rule=live_rule
) and self._rule_has_matching_source(rule=rule, live_rule=live_rule):
    return live_rule

# vs

if (
    self._rule_has_same_enviroment_tag(rule=rule, live_rule=live_rule)
    and self._rule_has_matching_source(rule=rule, live_rule=live_rule)
):
    return live_rule

The first one is almost unreadable to me.

rijenkii commented 1 year ago

Very much needed. Current behavior is sometimes so unreadable that I have to wrap code in #fmt: on/off.

itcarroll commented 1 year ago

@JelleZijlstra invited a PR to address #3800: comments that get egregiously moved. The fix could potentially allow commenting to preserve optional parentheses. Not saying that's good or bad, just relevant to this discussion.

Putative "magic comment" in action:

a = (
    bbbbbb(123456790)  #
    + cccccc(123456790)
)
LucidOne commented 1 year ago

I'd like a real solution but at minimum I think this is a problem with the docs. Similar to https://github.com/psf/black/issues/2156#issuecomment-1505408040 adding an extra or None at least makes it readable.

_cfg_dir = _os.environ.get("RYO_USER_CONFIG_DIR") or _appdirs.user_config_dir(
    "ryo-iso", "HxR"
)
_cfg_dir = (
    _os.environ.get("RYO_USER_CONFIG_DIR")
    or _appdirs.user_config_dir("ryo-iso", "HxR")
    or None
)

Aside from --line-length 99, I'm still not sure how to improve this sort of thing.

(self.base_image["distro"], self.base_image["version"]) = self.iso_cfg[
    "image"
].split("/")
0dminnimda commented 1 year ago

That's interesting that adding the third operand actually make thye formatting nice, but with two it's unreadable

return (
    self.context.control_flow.is_conditional_jump(temp)
    or self.context.is_used_temp(temp)
    or None
)
return self.context.control_flow.is_conditional_jump(
    temp
) or self.context.is_used_temp(temp)

Can the behavior be changed to respect two operands as well? It doesn't seem like it will actually negatively affect anything.

AdamWill commented 7 months ago

edit: eh, I guess it is fairly similar to some of the other examples in the end. i'll just use fmt off. :/

MichaReiser commented 7 months ago

This is related to https://github.com/psf/black/issues/4123. However, I'm not sure if such a fundamental change is an option at this point.