psf / black

The uncompromising Python code formatter
https://black.readthedocs.io/en/stable/
MIT License

Crash on docstring with ideographic space #2199

Open turian opened 3 years ago

turian commented 3 years ago

Describe the bug

INTERNAL ERROR: Black produced code that is not equivalent to the source.

To Reproduce

Steps to reproduce the behavior:

  1. Take this file: https://raw.githubusercontent.com/taesungp/contrastive-unpaired-translation/master/models/networks.py
  2. Run Black on it with these arguments: None
  3. See error

Expected behavior

No error should be thrown.

Environment (please complete the following information):

Does this bug also happen on master?

Using the online formatter at https://black.now.sh/?version=master there were no errors thrown.

Additional context

Here is my log:

--- src
+++ dst
@@ -4134,11 +4134,11 @@
       body=
         Expr(
           value=
             Constant(
               value=
-                "Return a learning rate scheduler\n\nParameters:\noptimizer          -- the optimizer of the network\nopt (option class) -- stores all the experiment flags; needs to be a subclass of BaseOptions.\u3000\nopt.lr_policy is the name of learning rate policy: linear | step | plateau | cosine\n\nFor 'linear', we keep the same learning rate for the first <opt.n_epochs> epochs\nand linearly decay the rate to zero over the next <opt.n_epochs_decay> epochs.\nFor other schedulers (step, plateau, and cosine), we use the default PyTorch schedulers.\nSee https://pytorch.org/docs/stable/optim.html for more details.",  # str
+                "Return a learning rate scheduler\n\nParameters:\noptimizer          -- the optimizer of the network\nopt (option class) -- stores all the experiment flags; needs to be a subclass of BaseOptions.\nopt.lr_policy is the name of learning rate policy: linear | step | plateau | cosine\n\nFor 'linear', we keep the same learning rate for the first <opt.n_epochs> epochs\nand linearly decay the rate to zero over the next <opt.n_epochs_decay> epochs.\nFor other schedulers (step, plateau, and cosine), we use the default PyTorch schedulers.\nSee https://pytorch.org/docs/stable/optim.html for more details.",  # str
             )  # /Constant
         )  # /Expr
         If(
           body=
             FunctionDef(

cooperlees commented 3 years ago

Since you're on 20.8b1, can you please try the 21.5b0 release? I believe this should be fixed there.

ambv commented 3 years ago

The difference is this \u3000 (aka \N{IDEOGRAPHIC SPACE}) character that was somehow removed by Black:

[Screenshot 2021-05-08 at 14 38 10]
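
For anyone who wants to see the mismatch without the full file, here is a minimal sketch using only the standard library. The function `f`, its docstring, and the helper `docstring_of` are made up for illustration; this is not Black's own equivalence check, just a demonstration that dropping a trailing U+3000 changes the docstring value in the AST.

```python
import ast
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"

# Hypothetical minimal source: the first docstring line ends with U+3000.
src = (
    'def f():\n'
    '    """First line.' + IDEOGRAPHIC_SPACE + '\n'
    '    Second line.\n'
    '    """\n'
)
# The same source with the trailing U+3000 dropped, as in the diff above.
dst = src.replace(IDEOGRAPHIC_SPACE + "\n", "\n")


def docstring_of(source: str) -> str:
    """Return the raw (uncleaned) docstring of the first function in source."""
    func = ast.parse(source).body[0]
    return ast.get_docstring(func, clean=False)


print(unicodedata.name(IDEOGRAPHIC_SPACE))     # official Unicode name
print(IDEOGRAPHIC_SPACE.isspace())             # Python treats it as whitespace
print(docstring_of(src) == docstring_of(dst))  # the two ASTs disagree
```

Because `str.isspace()` is true for U+3000, any whitespace stripping based on Python's string methods will remove it, while a strict comparison of the docstring constants still sees the difference.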

lionderful commented 3 years ago

> Since you're on 20.8b1, can you please try 21.5b0 release. I believe this should be fixed.

No, same problem on 21.5b1. Black destroys my code.
Version 21.5b1; Windows 10; here is my code: https://github.com/lionderful/Black_bug_report/blob/main/picker.py

pdc commented 3 years ago

I have had something that I think may be similar, and which was fixed in 21.5b1:

--- src
+++ dst
@@ -779,11 +779,11 @@
       body=
         Expr(
           value=
             Constant(
               value=
-                'Parse the raw data from the keyboard layout editor.\n\nArgument –\xa0\nkle –\xa0text in the ‘raw data’ format of Keyboard Layout Editor\n\nReturns –\xa0list of Key instances.',  # str
+                'Parse the raw data from the keyboard layout editor.\n\nArgument –\nkle –\xa0text in the ‘raw data’ format of Keyboard Layout Editor\n\nReturns –\xa0list of Key instances.',  # str
             )  # /Constant
         )  # /Expr
         Return(
           value=
             Call(

The similarity is that a whitespace character (in my case U+00A0 nonbreaking space) before a newline has been stripped. I notice that another \xA0 not at the end of a line has been retained. In my case this is a harmless change to a docstring and the workaround is to remove it from the source file.
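
A rough sketch of the behaviour described above, assuming the stripping amounts to a per-line `rstrip()` (that is my guess, not something confirmed from Black's source). The helper `strip_trailing_whitespace` and the shortened docstring are hypothetical.

```python
NBSP = "\xa0"

# Shortened version of the docstring from the diff: one NBSP sits at the
# end of a line, another sits in the middle of the next line.
docstring = "Argument \u2013" + NBSP + "\nkle \u2013" + NBSP + "text in the raw data format"


def strip_trailing_whitespace(text: str) -> str:
    """Hypothetical helper: remove trailing whitespace from every line."""
    return "\n".join(line.rstrip() for line in text.split("\n"))


result = strip_trailing_whitespace(docstring)
first_line, second_line = result.split("\n")

print(repr(first_line))   # trailing NBSP is gone
print(repr(second_line))  # interior NBSP is kept
```

`str.rstrip()` with no arguments removes every character for which `str.isspace()` is true, which includes U+00A0, so only the NBSP directly before the newline disappears.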

I upgraded Black to 21.5b1 and it now reformats the affected files, still stripping the nonbreaking space but no longer unhappy about it. That’s fine by me, since it is a docstring. I assume it would not happen to a string in real code.

JelleZijlstra commented 3 years ago

> I assume it would not happen to a string in real code.

That's correct, we only normalize whitespace in docstrings, not other strings.
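
To illustrate the distinction, here is a small sketch that separates docstrings from ordinary string literals using the standard `ast` module. It is not how Black itself locates docstrings; the sample source and the helper `docstring_nodes` are made up for the example.

```python
import ast

# Hypothetical sample: both strings end with U+00A0, but only the first
# one is a docstring.
source = '''
def f():
    """Docstring with trailing space.\u00a0
    """
    label = "plain string\u00a0"
    return label
'''

SCOPES = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def docstring_nodes(tree: ast.AST) -> list:
    """Return the Constant nodes that sit in docstring position."""
    found = []
    for node in ast.walk(tree):
        if not isinstance(node, SCOPES) or not node.body:
            continue
        first = node.body[0]
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            found.append(first.value)
    return found


tree = ast.parse(source)
doc_ids = {id(node) for node in docstring_nodes(tree)}
strings = [
    node
    for node in ast.walk(tree)
    if isinstance(node, ast.Constant) and isinstance(node.value, str)
]
docstrings = [node.value for node in strings if id(node) in doc_ids]
others = [node.value for node in strings if id(node) not in doc_ids]

print(docstrings)  # candidates for whitespace normalization
print(others)      # must be left byte-for-byte intact
```

Only the string in docstring position would be a candidate for whitespace normalization; the value assigned to `label` has to survive formatting unchanged.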

JelleZijlstra commented 3 years ago

I can't reproduce this crash on current main. Which is weird, because I don't think we fixed this since the last release.