psf / black

The uncompromising Python code formatter
https://black.readthedocs.io/en/stable/
MIT License
38.56k stars 2.43k forks source link

string_processing: Use explicit string concatenation #2553

Open chasefinch opened 2 years ago

chasefinch commented 2 years ago

Describe the style change

I suggest to use explicit string concatenation, instead of implicit concatenation, for (currently experimental) string processing, with whitespace pushed to the end of each line.

Examples in the current Black style

(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor"
    " incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis"
    " nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
)

(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod     "  # whitespace is split
    " incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis"
    " nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
)

(
    "https://www.some-really-really-really-really-really-really-really-really-really-really-really-really-long-url.com"
)

Desired style

(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor "
    + "incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis "
    + "nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
)

(
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod      "  # whitespace is not split
    + "incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis"
    + "nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
)

(
    "https://www.some-really-really-really-really-really-really-really-really-"
    + "really-really-really-really-long-url.com"
)

Additional context

Advantages of this method:

Worth noting that Guido & the Python community have considered removing implicit concatenation for these reasons, but ran into complications (here's an article about this). However, the reasons why they left it in the language (% precedence, etc.) don't apply to whether Black uses the feature (for the % precedence issue, it doesn't apply at least because the multiline strings are wrapped in parentheses), so I think that Black ought to use the explicit version.

felix-hilden commented 2 years ago

Thanks for the suggestion, it's opposite to #2268, and if it was implemented it would make #3159 pretty redundant. I don't feel strongly, but to me this is more readable than no parens. I'm not sure if this or parens are better.

chasefinch commented 2 years ago

Yeah I still think this is the way to go.

yilei commented 2 years ago

I don't have a strong preference between this (explicit str concat) and parens (#3159). There are some cases I think parens have better readability, e.g.

some_list = [
    (
        " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
        " incididunt ut labore et dolore magna aliqua Ut enim ad minim"
    ),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]
some_list = [
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
    + " incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

Parens make it more clear that it's a two element list.

However, during the implementation of #3159, we realized adding parens (thus another level of indentation) can result in significant changes to the code, which could be off putting.

--- a/django:django/db/backends/mysql/features.py
+++ b/django:django/db/backends/mysql/features.py
@@ -122,12 +132,14 @@
             skips.update(
                 {
                     "https://jira.mariadb.org/browse/MDEV-19598": {
-                        "schema.tests.SchemaTests."
-                        "test_alter_not_unique_field_to_primary_key",
+                        (
+                            "schema.tests.SchemaTests."
+                            "test_alter_not_unique_field_to_primary_key"
+                        ),
                     },
                 }
             )

v.s.

--- a/django:django/db/backends/mysql/features.py
+++ b/django:django/db/backends/mysql/features.py
@@ -122,12 +132,14 @@
             skips.update(
                 {
                     "https://jira.mariadb.org/browse/MDEV-19598": {
                         "schema.tests.SchemaTests."
-                        "test_alter_not_unique_field_to_primary_key",
+                        + "test_alter_not_unique_field_to_primary_key",
                     },
                 }
             )

This is the reason in #3159 we decided to limit parens to only list/tuple/set literals, and exclude function calls. IMO we should still do something to function calls, maybe explicit str concatenation is the answer. If so, then we can choose explicit str concatenation over parens for list/tuple/set literals too.

chasefinch commented 2 years ago

@yilei I prefer keeping the parens ~rather than~ by implementing #3260 — Would actually prefer for Black to continue adding them; they are consistent, if not always pretty — but if dropping the parens for function calls is the way to get explicit concatenation everywhere else, then ok 😛

felix-hilden commented 2 years ago

It would feel a bit off to use parens in other places and explicit concatenation in others in my opinion.

yilei commented 2 years ago

Agreed that we should try to use the same formatting in function calls v.s. list/tuple/set literals.

I don't think there exists a clear call that either explicit-str-concat or use-parens would be better.

Then how about this approach: Black takes existing parens (in unformatted code) into account and decides which format to use?

  1. If existing code doesn't have parens, use explicit str concatenations:
# before
function_call(
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

# after
function_call(
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
    + " incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
    + " incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]
  1. If existing code already have parens, keep the parens and use implicit str concatenations inside parens:
# before
function_call(
    (" lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim"),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    (" lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim"),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

# after
function_call(
    (
        " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
        " incididunt ut labore et dolore magna aliqua Ut enim ad minim"
    ),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    (
        " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
        " incididunt ut labore et dolore magna aliqua Ut enim ad minim"
    ),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

Then, we are able to use either of them as we see fit depending on the local context.

I understand that this will add another exception to Previous formatting is taken into account as little as possible, with rare exceptions like the magic trailing comma., but IMHO this is a good trade-off.

WDYT? @felix-hilden @JelleZijlstra

felix-hilden commented 2 years ago

I see your point, but we do really want to minimise the effect of the code itself to our formatting choices for consistency. If there isn't a clear winner but we want to format consistently, then I guess the decision has to be more or less arbitrary. Much like string prefixes or other minor details.

yilei commented 2 years ago

Hmm, focusing on existing parens here, this actually fits Black's paren management here:

Please note that Black does not add or remove any additional nested parentheses that you might want to have for clarity or further code organization. For example those parentheses are not going to be removed:

return not (this or that)
decision = (maybe.this() and values > 0) or (maybe.that() and values < 0)

These parens are inner parens that you might want to have for clarity or further code organization. Following this rule, the existing parens in these cases should not be removed, i.e. we should implement #3260.

With above implemented, we can then take a look at whether we want to use explicit str concatenations, or force adding parens + implicit str concatenations. If this is a more or less arbitrary decision, I think we can chose explicit str concatenations here as it produces less diffs than force adding parens.

The examples below illustrate above decisions:

# before
function_call(
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]
function_call(
    (" lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim"),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    (" lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim"),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

# after
function_call(
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
    + " incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
    + " incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]
function_call(
    (
        " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
        + " incididunt ut labore et dolore magna aliqua Ut enim ad minim"
    ),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
)
some_list = [
    (
        " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
        + " incididunt ut labore et dolore magna aliqua Ut enim ad minim"
    ),
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]
felix-hilden commented 2 years ago

Alright, turns out that in terms of a cohesive parens policy, we're not really there 😅 I'm not sure how each case was decided, but we definitely don't preserve all "organising" parens.

playground link

Perhaps we should consider the bigger picture first.

yilei commented 1 year ago

@felix-hilden hmm, I think we draw a different line between "organising" and "non-organising" parens.

The purpose of them is to (significantly) increase readability, so maybe we should really call them readability-parens.

  1. For the examples from the "current" blacks style:

    return not (this or that)
    decision = (maybe.this() and values > 0) or (maybe.that() and values < 0)

    It's not difficult to agree that they are readability-parens as you don't need to memorize the operator priority table.

  2. For the parens of concatenated strings across multiple lines, I'd argue they also have significant readability benefits. The "improved string processing" from --preview style doesn't agree right now, hence #3260 should be implemented instead.

  3. For the newly removed unnecessary parens in the --preview style (https://black.readthedocs.io/en/stable/the_black_code_style/future_style.html#improved-parentheses-management), I don't think they are readability-parens, so they still fit.

  4. Other unremoved parens from playground link, with the exception of (a) == (b), they don't belong to readability-parens. They aren't removed yet I think that's because they haven't been looked at (unlike the ones from --preview style's "Improved parentheses management", which are recently looked at).

chasefinch commented 1 year ago

Bump for this one.

I think the parens argument (now resolved) was not related, and muddied this thread.

I would still love to see explicit concatenation replace implicit concatenation in all cases, for the reasons mentioned at the top:

tylerlaprade commented 1 year ago

As an additional data point, --preview broke my code for both Pyright (reportImplicitConcatenation) and Ruff/Flake8 (ISC002) today, so I had to revert to non-preview mode (which I was otherwise really enjoying for the other enhancements! Breaking on the right side instead of the left is so much more aesthetic).

saolof commented 1 year ago

It may be worth mentioning that starlark (and google internal linters) explicitly bans implicit python string concatenation. Guido himself has mentioned that implicit string concat only exists because of backwards compatibility: https://mail.python.org/pipermail/python-ideas/2013-May/020557.html

My own first experience of black adding implicit concat somewhere was complete surprise and confusion at this being a language feature, so while I do not have a strong opinion about black adding + operators, I am very strongly against black introducing implicit string concat anywhere where it was not there before, since that also sort of blocks most linters from having a lint rule against it that is enabled by default.