Enable feature `string_processing` to be default

cooperlees commented 3 years ago

Lets move CI + then the default of black to use the --experimental-string-processing behavior.

Today, if enabled on all black-primer projects, sqlalchemy fails AST check for parenthesis: https://pastebin.com/iaDH1Yg8

Lets enable what we can CI wise and get this stable + default.

cooperlees commented 3 years ago

Ok, so we're running this option on most primer projects now. Left to fix:

django
pandas (might be just #2193)
sqlalchemy (seems like a bug)

Please let us know if your repo has any issues with --experimental-string-processing to help us squash bugs before making it the default.

TylerYep commented 3 years ago

What is this feature supposed to do exactly? I tried it and it transformed the following string, which seems unideal to me, since it breaks the max line length of 88 by a large margin.

I'm using black version 21.5b0.

# input code:
assert str(suffix_arr) == (
    "['$', 'angaroo$', 'angrykangaroo$', 'aroo$', 'garoo$', "
    "'grykangaroo$', 'kangaroo$', 'ngaroo$', 'ngrykangaroo$', "
    "'o$', 'oo$', 'roo$', 'rykangaroo$', 'ykangaroo$']"
)

# black's output code:
assert (
    str(suffix_arr)
    == "['$', 'angaroo$', 'angrykangaroo$', 'aroo$', 'garoo$', 'grykangaroo$', 'kangaroo$', 'ngaroo$', 'ngrykangaroo$', 'o$', 'oo$', 'roo$', 'rykangaroo$', 'ykangaroo$']"
)

JelleZijlstra commented 3 years ago

Wow, that's pretty unfortunate. It's meant to split strings that are too long, not to make new oversized strings. We should track this as a bug and fix it before turning the feature on by default.

JelleZijlstra commented 3 years ago

The SQLAlchemy bug alluded to above is #2271.

bbugyi200 commented 3 years ago

@TylerYep I opened a new issue for this bug (https://github.com/psf/black/issues/2284). I'm working on a fix now.

EDIT: This should be fixed with the following PR: https://github.com/psf/black/pull/2286

tsx commented 2 years ago

Hi :wave:

I've come across this issue while considering adding a linting rule to my codebase to disallow implicit string concatenation for reasons listed in PEP3126, namely that code can look confusing when string concatenation is next to a comma-delimited list (of arguments or list items). It appears that the new string processing is making that confusion worse.

For example, here's before, 2 arguments clearly visible:

                    flash(
                        'None of the email addresses or domains you entered are valid',
                        'error',
                    )

and here's after, looks too close to a three-argument call if you're not paying very close attention to a missing comma.

                    flash(
                        'None of the email addresses or domains you entered'
                        ' are valid',
                        'error',
                    )

I'd like to propose that in cases like these, black would wrap the split string in parenthesis. Here's a desired outcome that would be a lot clearer:

                    flash(
                        (
                            'None of the email addresses or domains you entered'
                            ' are valid'
                        ),
                        'error',
                    )

On top of that, I encountered what appears to be a bug. Here's a simple reproduction case:

black, 21.12b0 (compiled: no)

x = (
    "xxx xxxxxxx xxx "
    f'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx{a["key"]}&xxxxxxxxxxxxxxxxx'
)

error: cannot format example.py: INTERNAL ERROR: Black produced invalid code on pass 1: f-string expression part cannot include a backslash (, line 4). Please report a bug on https://github.com/psf/black/issues. This invalid output might be helpful: /tmp/blk_iw9ps776.log

Contents of error log:

  File "/home/tsx/projects/close/venv/lib/python3.9/site-packages/black/__init__.py", line 1300, in assert_equivalent
    dst_ast = parse_ast(dst)
  File "/home/tsx/projects/close/venv/lib/python3.9/site-packages/black/parsing.py", line 187, in parse_ast
    raise SyntaxError(first_error)
x = (
    "xxx xxxxxxx xxx "
    f"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx{a[\"key\"]}&xxxxxxxxxxxxxxxxx"
)

Note that I'm running with skip-string-normalization = true in config file if it matters.

haridsv commented 2 years ago

How about indenting continued line, like this:

                    flash(
                        'None of the email addresses or domains you entered'
                            ' are valid',
                        'error',
                    )

felix-hilden commented 2 years ago

Just to link the issues: we're moving ESP under the new --preview flag (#2756). So while the functionality doesn't change and issues still need to be resolved, that's where experimental string processing can be found going forward! I'll leave this issue open though.

dvarrazzo commented 2 years ago

Hello,

I am trying the --preview flag on psycopg codebase and the results are pretty ok. I wonder if there is a way to leave certain strings split, because it improves readability. My example is:

 _binary_signature = (
     # Signature, flags, extra length
-    b"PGCOPY\n\xff\r\n\0"
-    b"\x00\x00\x00\x00"
-    b"\x00\x00\x00\x00"
+    b"PGCOPY\n\xff\r\n\0\x00\x00\x00\x00\x00\x00\x00\x00"
 )

the original string was split in three parts because they are three different elements of the headers, as indicated in the comment. Is there a way to tell black to leave the string split? For the moment I would use:

_binary_signature = (
    b"PGCOPY\n\xff\r\n\0"  # Signature
    b"\x00\x00\x00\x00"  # Flags
    b"\x00\x00\x00\x00"  # Extra length
)

Are there other mechanisms? thank you!

felix-hilden commented 2 years ago

Hi! Thanks for having a go. You could disable formatting with # fmt: off / # fmt: on or add the strings together, which we don't join (at least yet: #2268).

bbugyi200 commented 2 years ago

@dvarrazzo Keep in mind that this workaround is only necessary because the full string can be merged and placed on a single line without exceeding the configured line length limit. Black will respect manual string splits by default as long as both of the following assumptions hold:

None of the substrings exceed the line length limit (nor will they once black applies other format changes).
The full string (i.e. all substrings merged) cannot fit on a single line without exceeding the line length limit.

If one of the above does not hold, then your workaround is probably the way to go.

dvarrazzo commented 2 years ago

Thank you @bbugyi200, @felix-hilden. I think in this case interspersing comments is the way to go; in other cases (where there aren't meaningful comments to add) seems good that there is the fmt: off possibility.

MattF-NSIDC commented 2 years ago

@bbugyi200 What you describe doesn't quite match my understanding of the documentation in the "Future style" section of the docs, so I'm struggling to understand the intended behavior of the improved string processing:

Black will split long string literals and merge short ones. Parentheses are used where appropriate. When split, parts of f-strings that don’t need formatting are converted to plain strings. User-made splits are respected when they do not exceed the line length limit. Line continuation backslashes are converted into parenthesized strings. Unnecessary parentheses are stripped. The stability and status of this feature is tracked in this issue.

This does not specify that user-made splits would only be respected if they can't be combined into a single line shorter than configured line length, it just says that user-made splits will be respected. The latter is exactly my personal preference. If that's not the intended behavior, maybe the better wording for the docs would be "User-made splits are respected only if they would exceed the line length limit when combined"?

I like the behavior as documented because, similar to the magic trailing comma, I can implicitly tell Black when I have split the string in a significant way without fully disabling Black for a portion of my codebase.

Current default behavior:

-            "/api/path/to/endpoint/foo?"
-            "param1=bar&param2=baz"
+            "/api/path/to/endpoint/foo?" "param1=bar&param2=baz"

Current "preview" / "future style" behavior (expected no change):

-            "/api/path/to/endpoint/foo?"
-            "param1=bar&param2=baz"
+            "/api/path/to/endpoint/foo?param1=bar&param2=baz"

bbugyi200 commented 2 years ago

@MattF-NSIDC See my response to dvarrazzo above. I agree that the documentation in the "Future Style" section should probably be updated to include this additional user-split constaint. Namely: "The full string (i.e. all substrings merged) cannot fit on a single line without exceeding the line length limit."

ajoino commented 2 years ago

Hi, new to using Black and have an observation. When I tried the --preview feature I found it unaesthetic that splitting f-strings doesn't keep the f in f" if one of the splits is a constant string and the surrounding splits are f-strings.

f"Very long line with a {thing} that will be split by Black. Here is some more format stuff {stuff}."

is formatted as

f"Very long line with a {thing} "
"that will be split by Black. " #<-- bad vibes
f"Here is some more format stuff {stuff}."

(I know Black wouldn't format that string exactly like that but you get the point.) where I'd like

f"Very long line with a {thing} "
f"that will be split by Black. "
f"Here is some more format stuff {stuff}."

For consistency.

The mismatch between the first two columns is the same to me as if you would use inconsistent indents in parentheses

def foo(
    a,
     b, #<-- bad vibes
    c,
):
    pass

I'm not suggesting any changes in --experimental-string-processing or the planned changes, I'm just curious to hear if you've discussed this before and what the conclusions were.

bbugyi200 commented 2 years ago

@ajoino See https://github.com/psf/black/pull/1132#issuecomment-633160565 and surrounding comments for a previous discussion on this. As I said then, I don't have a strong preference here, but my vote would be for the current behavior based on the argument I gave in the linked to PR comment.

ajoino commented 2 years ago

@bbugyi200 Thank you for the link and your impressive work. I agree that this is a minor issue.

Just to further explain my "gripe", it's not that f isn't appended, it's that the quotation marks are not aligned. This

#|
#v
f"ab{c}"
 "def" #<-- pleasing alignment
f"g{h}i"
#^
#|

would be preferable to me, but there is nothing wrong with the current behaviour. Consistency is, like most things, in the eye of the beholder.

bbugyi200 commented 2 years ago

@ajoino Ahh, I see. I agree with you on this point: Aligning the quotation marks seems desirable and shouldn't have any negative consequences that I can think of.

I don't think that we should make this change before enabling the improved string handling on stable. With that said, if you create an issue for this, I doubt there will be much pushback. Furthermore, if you either tag me on the issue or comment back with the issue link, I'll save the email notification and may get to it in the future when I have more time.

yilei commented 2 years ago

Hi maintainers 👋🏼

What's your current opinion on the implicit-str-concat issue raised by @tsx earlier in this comment?

I agree with @tsx's proposal and reasoning. Implicit string concatenations are common source of bugs in places like a list, they are hard to spot, e.g.:

some_list = [
    ' lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor incididunt ut labore et dolore magna aliqua Ut enim ad minim',
    ' veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo',
]

black --preview currently formats to:

some_list = [
    " lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"
    " incididunt ut labore et dolore magna aliqua Ut enim ad minim",
    " veniam quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo",
]

Do you agree? I'd like to contribute the implementation if so.

Also a meta question: should we open a separate issue to track this?

Thanks, Yilei

JelleZijlstra commented 2 years ago

I agree with adding parentheses in this context. Also, yes it would be good to start a new issue.

reagle commented 2 years ago

Is this working? It moves the closing triple quote to a new line, but doesn't break the string which is what I thought this would do.

 ❯ black --version
 black, 22.6.0 (compiled: yes)
 ❯ cat ~/.config/black
 [tool.black]
 line-length = 88
 ❯ black --preview --diff reddit-watch.py
 --- reddit-watch.py    2022-07-19 17:46:14.470574 +0000
 +++ reddit-watch.py    2022-07-19 17:46:46.179463 +0000
 @@ -219,11 +219,12 @@
      updated_df.to_csv(updated_fn, index=True, encoding="utf-8-sig", na_rep="NA")
      return updated_fn

  def rotate_archive_fns(updated_fn: str) -> None:
 -    """Given an updated filename, archive it to the zip file and rename it to be the latest. this is the time for all good men"""
 +    """Given an updated filename, archive it to the zip file and rename it to be the latest. this is the time for all good men
 +    """
      print(f"Rotating and archiving {updated_fn=}")
      if not os.path.exists(updated_fn):
          raise RuntimeError(f"{os.path.exists(updated_fn)}")
      # print(f"{updated_fn=}")
      head, tail = os.path.split(updated_fn)
 would reformat reddit-watch.py

 All done! ✨ 🍰 ✨
 1 file would be reformatted.

yilei commented 2 years ago

@reagle this is the current expected behavior. there is some limited handling of docstrings (see https://github.com/psf/black/issues/144 and referenced issues), but it won't split. for other non-docstring multi-line strings, it won't split at all since it breaks the code.

reagle commented 2 years ago

@yilei, thanks for the response. The docs say "Black will split long string literals and merge short ones" and I think docstrings are string literals. Perhaps it could be updated to say "it'll split assigned string literals (which excludes docstrings)" or whatever is true.

BobDotCom commented 1 year ago

When I enabled this flag, I noticed that in docstring summaries there is an inconsistent newline before the closing quotes. When the summary is longer than the max line length it wraps the closing quotes onto a new line. Because closing quotes aren't normally placed on a newline with this case being the only exception, this makes enforcing docstring formatters such as pydocstringformatter much more difficult. Should this functionality be changed, or even a config flag added to modify it? Currently this makes the new functionality unusable for me, but adding the ability to force enable/disable the newline would resolve this.

Before experimental string processing:

# Before
def foo():
    """This is an example test docstring that is too long to fit within the line-length limit."""
# After
def foo():
    """This is an example test docstring that is too long to fit within the line-length limit."""

With experimental string processing:

# Before
def foo():
    """This is an example test docstring that is too long to fit within the line-length limit."""
def bar():
    """This is an example docstring that fits within the limit."""
# After
def foo():
    """This is an example test docstring that is too long to fit within the line-length limit.
    """
def bar():
    """This is an example docstring that fits within the limit."""

Jackenmen commented 1 year ago

Aside from bugs, to me, the biggest issue with string processing is probably the lack of any special treatment of \n when splitting lines (#1467).

Other than that, it seems that people sometimes also don't like that Black doesn't respect user-made splits if it can merge them into a single line though that's currently the case with the non-preview style as well. I do agree that at least some of the shown examples look better when they're split across multiple lines despite fitting in one but I don't think it's universally the case and I would say that I generally quite like that Black tries to merge strings into one line when they fit. Personally, I want Black to notice these kinds of opportunities for joining lines when I don't, even if it means that I'll have to fight with Black on occasion to keep lines separate.

ikding commented 1 year ago

Hello,

I am trying the --preview flag on my code base and noticed that the preview flag seems to concatenate user-defined long-string splits in function arg defaults, even if it makes the line too long.

Example:

def open_file_from_long_file_path(
    file_path: str = (
        "some/really/long/file/path/because/it/has/filehash/as/part/of/path/"
        "460e96a3b663f41a8412527424278bc60eb208ec5be1f5943e5dff2fe428a2da/"
        "file.txt"
    ),
) -> list[str]:
    with open(file_path) as f:
        return [line.rstrip() for line in f]

Running black without --preview flag, it doesn't make any changes to the file. But running it with --preview and I get this:

def open_file_from_long_file_path(
    file_path: str = "some/really/long/file/path/because/it/has/filehash/as/part/of/path/460e96a3b663f41a8412527424278bc60eb208ec5be1f5943e5dff2fe428a2da/file.txt",
) -> list[str]:
    with open(file_path) as f:
        return [line.rstrip() for line in f]

FYI I am running black 22.10.0.

felix-hilden commented 1 year ago

@ikding thanks for letting us know! I've submitted a separate issue for that above 👌

pauloneves commented 1 year ago

It won't split a log line if it enclosed in triple quotes:

# this will be splited:
x = "aaaaaaaaaa bbbbbbbb cccccccccccc dddddddddddddd eeeeeeeeeeee fffffffffffffffff ggggggggggggggg hhhhhhhhhh"

# this will NOT be splited:
x = """aaaaaaaaaa bbbbbbbb cccccccccccc dddddddddddddd eeeeeeeeeeee fffffffffffffffff ggggggggggggggg hhhhhhhhhh"""

Is this the desired behavior?

I tested with version 22.12.0

BobDotCom commented 1 year ago

When I enabled this flag, I noticed that in docstring summaries there is an inconsistent newline before the closing quotes. When the summary is longer than the max line length it wraps the closing quotes onto a new line. Because closing quotes aren't normally placed on a newline with this case being the only exception, this makes enforcing docstring formatters such as pydocstringformatter much more difficult. Should this functionality be changed, or even a config flag added to modify it? Currently this makes the new functionality unusable for me, but adding the ability to force enable/disable the newline would resolve this.

Seems to be fixed in #3430

XuehaiPan commented 1 year ago

Follow the comment at https://github.com/psf/black/issues/2188#issuecomment-1061634247, the new behavior also removes the f leading character for implicitly concatenated strings.

Expected (expect black --preview leave this unchanged):

actual = "aaa"
expected = "bbb"
message = (
    f"Very very very very very very very very very very very long message. Expected\n"
    f"    {expected!r}\n"
    f"but got\n"
    f"    {actual!r}\n"
    f"    ^^^"
)

I intentionally keep the f leading character to align the multiline string. Although some of the lines do not have an expression to substitute.

black --preview format this to:

actual = "aaa"
expected = "bbb"
message = (
    "Very very very very very very very very very very very long message. Expected\n"
    f"    {expected!r}\n"
    "but got\n"
    f"    {actual!r}\n"
    "    ^^^"
)

black removes the f leading character if there is no expression to substitute. This makes the code harder to maintain.

felixxm commented 1 year ago

Hi, I tried this changes on the Django repository and it's pretty destructive :upside_down_face: as it adds parentheses to all multi-line strings, e.g.

diff --git a/tests/template_tests/filter_tests/test_urlize.py b/tests/template_tests/filter_tests/test_urlize.py
index abc227ba6a..d5dbe5925f 100644
--- a/tests/template_tests/filter_tests/test_urlize.py
+++ b/tests/template_tests/filter_tests/test_urlize.py
@@ -24,10 +24,12 @@ class UrlizeTests(SimpleTestCase):
         )
         self.assertEqual(
             output,
-            '<a href="http://example.com/?x=&amp;y=" rel="nofollow">'
-            "http://example.com/?x=&y=</a> "
-            '<a href="http://example.com?x=&amp;y=%3C2%3E" rel="nofollow">'
-            "http://example.com?x=&amp;y=&lt;2&gt;</a>",
+            (
+                '<a href="http://example.com/?x=&amp;y=" rel="nofollow">'
+                "http://example.com/?x=&y=</a> "
+                '<a href="http://example.com?x=&amp;y=%3C2%3E" rel="nofollow">'
+                "http://example.com?x=&amp;y=&lt;2&gt;</a>"
+            ),
         )

     @setup({"urlize02": "{{ a|urlize }} {{ b|urlize }}"})
@@ -41,10 +43,12 @@ class UrlizeTests(SimpleTestCase):
         )
         self.assertEqual(
             output,
-            '<a href="http://example.com/?x=&amp;y=" rel="nofollow">'
-            "http://example.com/?x=&amp;y=</a> "
-            '<a href="http://example.com?x=&amp;y=" rel="nofollow">'
-            "http://example.com?x=&amp;y=</a>",
+            (
+                '<a href="http://example.com/?x=&amp;y=" rel="nofollow">'
+                "http://example.com/?x=&amp;y=</a> "
+                '<a href="http://example.com?x=&amp;y=" rel="nofollow">'
+                "http://example.com?x=&amp;y=</a>"
+            ),
         )

yilei commented 1 year ago

@felixxm For background, wrapping multi-line strings in parenthesis was first introduced in https://github.com/psf/black/pull/3162 for list/set/tuple literals, then extended to all including function calls in https://github.com/psf/black/issues/3292

The list/set/tuple literal cases are important for increased readability / bug preventions.

The reason this was also extended to function calls is that this makes the behavior more consistent and easy to explain. For function calls, it's a much weaker point for bug preventions since accidentally forgetting/adding a comma in the middle of implicitly concatenated strings would often lead to a runtime error (or type checking error) since it usually makes it mismatch the function signature.

One current workaround for not using parenthesis is to use explicit string concatenations, as demonstrated in this playground. However, this does require a lot of adjustments to the existing code to adopt the upcoming new stable style.

I do recognize this is quite a disruptive change especially because they are quite common for logging.XXX / warnings.warn / test code. So I'm personally open to reverting https://github.com/psf/black/issues/3292 but still keeping https://github.com/psf/black/pull/3162 for literals.

crhf commented 7 months ago

How do I use this feature? I have this file long.py with a single line:

s = "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789"

When doing python3 -m black -l 80 --unstable long.py, the file is left unchanged. I would expect the string to be split.

detly commented 3 months ago

The reason this was also extended to function calls is that this makes the behavior more consistent and easy to explain. For function calls, it's a much weaker point for bug preventions since accidentally forgetting/adding a comma in the middle of implicitly concatenated strings would often lead to a runtime error (or type checking error) since it usually makes it mismatch the function signature.

Just a minor point on this: the typeCheckingMode = "standard" setting in basedpyright (and probably pyright but I haven't checked) turns on reportImplicitStringConcatenation. This is okay with:

raise SystemExit((
    "Thing already exists at the place."
    "Did you forget to update the other "
    "thing?"
))

...but lints this:

raise SystemExit(
    "Thing already exists at the place."
    "Did you forget to update the other "
    "thing?"
)

I have no opinion on which tool should do what, if anything, I'm just flagging that it might cause some friction with users if Black's policy is in competition with a popular linting tool.

detly commented 3 months ago

How do I use this feature? I have this file long.py with a single line:
s = "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789"
When doing python3 -m black -l 80 --unstable long.py, the file is left unchanged. I would expect the string to be split.

Do you not also need the --enable-unstable-feature string_processing flag?

JelleZijlstra commented 3 months ago

We don't split strings that don't have whitespace in them.

detly commented 3 months ago

Just a minor point on this: the typeCheckingMode = "standard" setting in basedpyright (and probably pyright but I haven't checked) turns on reportImplicitStringConcatenation.

I might have been wrong about this, the table lists it as off for the standard level. It might be a misconfiguration in my project. Sorry for the noise.

m-aciek commented 2 weeks ago

Do you not also need the --enable-unstable-feature string_processing flag?

@cooperlees Could you update the issue title to "Enable feature string_processing to be default" or similar to prevent confusion with recent black versions?

psf / black

Enable feature `string_processing` to be default #2188