ryantam626 / jupyterlab_code_formatter

A JupyterLab plugin to facilitate invocation of code formatters.
MIT License

Bug: formatting deeply indented `if` is not idempotent and breaks code #288

Open black-puppydog opened 1 year ago

black-puppydog commented 1 year ago

Checklist prior to opening an issue

First off, despite what I am writing below, I want to thank Ryan and all the other contributors for making this! The extension has saved me a ton of trouble and nerves and has just been an all-round quality of life improvement! 😊

Describe the bug

I have a cell in one of my notebooks that breaks on formatting. Not sure what is going wrong yet, but I managed to get a minimal breaking example:

```python
lm_tokenizer = object()
def f():
    while True:
        while True:
            while True:
                if (
                    lm_tokenizer.convert_tokens_to_ids(name) != lm_tokenizer.unk_token_id
                ):
                    pass
```

The key here is the if statement in combination with the deep nesting. In real life, this is of course a combination of class, function, and if/else nesting.

The bug shows up when formatting twice. Here's what the first save produces, which matches the output of the black online formatter:

```python
lm_tokenizer = object()

def f():
    while True:
        while True:
            while True:
                if (
                    lm_tokenizer.convert_tokens_to_ids(name)
                    != lm_tokenizer.unk_token_id
                ):
                    pass
```

Note that the second operand of the comparison now starts its own line, beginning with the `!=` operator. If I paste this version into the black online formatter, it correctly leaves it unchanged. However, both in my local JupyterLab and on saturncloud.io, the second pass produces this:

```python
lm_tokenizer = object()

def f():
    while True:
        while True:
            while True:
                if (
                    lm_tokenizer.convert_tokens_to_ids(name)
#                      != lm_tokenizer.unk_token_id
                ):
                    pass
```

The second line of the `if` condition has been commented out, and I don't know where that extra Unicode character comes from either; in Jupyter it shows up as a red dot. Either way, this changes what the code does, so something is clearly going wrong. Since black itself seems to be doing just fine, I suspect a downstream issue, hence I'm filing it here.
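The thread doesn't establish the cause, so the sketch below only makes the two observations above checkable with the standard library. `same_ast` shows that the second pass changed the program, not just its layout; black runs a similar AST-equivalence check on its own output by default, which fits the observation that black alone produces correct output. `naive_shell_escape_lines` illustrates one plausible trigger, offered purely as an assumption and not confirmed here: if something downstream of black treats any line whose first non-blank character is `!` as an IPython shell escape and comments it out, the `!=` continuation line from the first pass would be caught, even though the tokenizer sees an ordinary `!=` operator there. All helper names are hypothetical, and `SECOND_PASS` omits the invisible character from the report.

```python
import ast
import io
import tokenize

# First-pass output (black's result), copied from the report.
FIRST_PASS = """\
lm_tokenizer = object()

def f():
    while True:
        while True:
            while True:
                if (
                    lm_tokenizer.convert_tokens_to_ids(name)
                    != lm_tokenizer.unk_token_id
                ):
                    pass
"""

# Second-pass output as reported, without the invisible character in the comment.
SECOND_PASS = """\
lm_tokenizer = object()

def f():
    while True:
        while True:
            while True:
                if (
                    lm_tokenizer.convert_tokens_to_ids(name)
#                      != lm_tokenizer.unk_token_id
                ):
                    pass
"""


def same_ast(before: str, after: str) -> bool:
    """True if both sources parse to the same AST (positions are not compared)."""
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


def naive_shell_escape_lines(source: str) -> list[int]:
    """Hypothetical line-based heuristic: a line whose first non-blank
    character is '!' is treated as an IPython shell escape."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if line.lstrip().startswith("!")
    ]


def not_equal_operator_lines(source: str) -> set[int]:
    """Line numbers where Python's tokenizer sees a '!=' operator token."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return {tok.start[0] for tok in tokens if tok.type == tokenize.OP and tok.string == "!="}


print(same_ast(FIRST_PASS, SECOND_PASS))  # False: the comparison was lost
print(naive_shell_escape_lines(FIRST_PASS))  # [9]
print(not_equal_operator_lines(FIRST_PASS))  # {9}
```

The same line 9 is flagged by the naive heuristic and is, according to the tokenizer, part of a `!=` comparison inside the `if (...)` parentheses; that overlap is exactly where a line-based check would go wrong.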

Workaround

For the time being, I just disabled formatting for the code inside the `if` with `#fmt:off`. So I'm not blocked by this, but it took me a while to spot, since my code didn't crash; it "just" produced incorrect results.
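A minimal sketch of that workaround applied to the repro above. The placement of the directives is an assumption, since the report doesn't show where they went. Black expects `# fmt: off` and `# fmt: on` at the same indentation level within the same block, and it leaves everything between them as written, so the comparison stays on a single line and no continuation line starts with `!=`. As in the repro, `f` is only meant to be formatted, not called.

```python
lm_tokenizer = object()


def f():
    while True:
        while True:
            while True:
                # Assumed placement: wrap only the statement black would otherwise split.
                # fmt: off
                if (
                    lm_tokenizer.convert_tokens_to_ids(name) != lm_tokenizer.unk_token_id
                ):
                    pass
                # fmt: on
```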

Diagnostic commands

Most relevant here, both black and jupyterlab_code_formatter are up to date:

black==22.10.0
jupyterlab-code-formatter==1.5.3
Full diagnostics

```terminal
$ pip freeze
absl-py==1.2.0
aiohttp==3.8.3
aioitertools==0.11.0
aiosignal==1.2.0
anyio==3.6.1
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.0.8
astunparse==1.6.3
async-timeout==4.0.2
attrs==22.1.0
Babel==2.10.3
backcall==0.2.0
beautifulsoup4==4.11.1
bencoder.pyx==3.0.0
black==22.10.0
bleach==5.0.1
boto3==1.24.87
botocore==1.27.59
cachetools==5.2.0
certifi @ file:///croot/certifi_1665076670883/work/certifi
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.3
commonmark==0.9.1
ConfigArgParse==1.5.3
contourpy==1.0.5
croniter==1.3.7
cycler==0.11.0
datasets==2.6.0
debugpy==1.6.3
decorator==5.1.1
deepdiff==5.8.1
defusedxml==0.7.1
dill==0.3.5.1
dnspython==2.2.1
dottorrent==1.10.1
dottorrent-gui==1.3.11
email-validator==1.3.0
entrypoints==0.4
execnb==0.1.4
executing==1.1.0
fastcore==1.5.27
fastjsonschema==2.16.2
filelock==3.8.0
fire==0.4.0
fonttools==4.37.4
frozenlist==1.3.1
fsspec==2022.8.2
ghapi==1.0.3
google-auth==2.12.0
google-auth-oauthlib==0.4.6
grpcio==1.49.1
h11==0.14.0
httptools==0.5.0
huggingface-hub==0.10.0
humanfriendly==10.0
idna==3.4
ipykernel==6.16.0
ipython==8.5.0
ipython-genutils==0.2.0
ipywidgets==8.0.2
isort==5.10.1
itsdangerous==2.1.2
jedi==0.18.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
json5==0.9.10
jsonschema==4.16.0
jupyter==1.0.0
jupyter-console==6.4.4
jupyter-core==4.11.1
jupyter-server==1.19.1
jupyter_client==7.3.5
jupyterlab==3.4.8
jupyterlab-code-formatter==1.5.3
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.3
jupyterlab_server==2.15.2
kiwisolver==1.4.4
lxml==4.9.1
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.6.0
matplotlib-inline==0.1.6
mistune==2.0.4
multidict==6.0.2
multiprocess==0.70.13
mypy-extensions==0.4.3
nbclassic==0.4.4
nbclient==0.7.0
nbconvert==7.1.0
nbdev==2.3.7
nbformat==5.6.1
nest-asyncio==1.5.6
nltk==3.7
notebook==6.4.12
notebook-shim==0.1.0
numpy==1.23.3
oauthlib==3.2.1
ordered-set==4.1.0
orjson==3.8.0
packaging==21.3
pandas==1.5.0
pandocfilters==1.5.0
parso==0.8.3
pathspec==0.10.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.2.0
platformdirs==2.5.2
prometheus-client==0.14.1
prompt-toolkit==3.0.31
protobuf==3.19.6
psutil==5.9.2
ptyprocess==0.7.0
pudb==2022.1.2
pure-eval==0.2.2
pyarrow==9.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
pydantic==1.10.2
pyDeprecate==0.3.2
Pygments==2.13.0
PyJWT==2.5.0
pyparsing==3.0.9
pyrsistent==0.18.1
python-dateutil==2.8.2
python-dotenv==0.21.0
python-multipart==0.0.5
pytorch-lightning==1.7.7
pytorch-pretrained-biggan==0.1.1
pytz==2022.4
PyYAML==6.0
pyzmq==24.0.1
qtconsole==5.3.2
QtPy==2.2.1
regex==2022.9.13
requests==2.28.1
requests-oauthlib==1.3.1
responses==0.18.0
rich==12.6.0
rofimoji==5.6.0
rsa==4.9
s3transfer==0.6.0
scikit-learn==1.1.2
scipy==1.9.1
Send2Trash==1.8.0
six==1.16.0
sniffio==1.3.0
soupsieve==2.3.2.post1
speedtest-cli==2.1.3
stack-data==0.5.1
starlette==0.20.4
tensorboard==2.10.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
termcolor==2.0.1
terminado==0.16.0
threadpoolctl==3.1.0
tinycss2==1.1.1
tokenizers==0.12.1
tomli==2.0.1
torch==1.12.1
torchmetrics==0.10.0
torchvision==0.13.1+cpu
tornado==6.2
tqdm==4.64.1
traitlets==5.4.0
transformers==4.22.2
typing_extensions==4.3.0
ujson==5.5.0
urllib3==1.26.12
urwid==2.1.2
urwid-readline==0.13
uvicorn==0.18.3
uvloop==0.17.0
watchdog==2.1.9
watchfiles==0.17.0
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.4.1
websockets==10.3
Werkzeug==2.2.2
widgetsnbextension==4.0.3
wrapt==1.14.1
xxhash==3.0.0
yarl==1.8.1
youtube-dl==2021.12.17
yt-dlp==2022.8.19
```

```terminal
$ jupyter labextension list
JupyterLab v3.4.8
/home/daan/miniconda3/envs/hfc/share/jupyter/labextensions
        jupyterlab_pygments v0.2.2 enabled OK (python, jupyterlab_pygments)
        @jupyter-widgets/jupyterlab-manager v5.0.3 enabled OK (python, jupyterlab_widgets)
        @ryantam626/jupyterlab_code_formatter v1.5.3 enabled OK (python, jupyterlab-code-formatter)
```

```terminal
$ jupyter serverextension list
config dir: /home/daan/miniconda3/envs/hfc/etc/jupyter
    jupyterlab enabled
    - Validating...
      jupyterlab 3.4.8 OK
    jupyterlab_code_formatter enabled
    - Validating...
      jupyterlab_code_formatter 1.5.3 OK
```
ryantam626 commented 1 year ago

Sorry for the late reply. I haven't given this project much love; I kind of lost steam over the years as I got increasingly burnt out, but lately I've gotten a second wind (perhaps only for a brief period...).

Oh boy this is not great.

FWIW, I am in the middle of a big refactor of the project (mostly due to the evolution of JupyterLab's plugin tooling); I can look into this bug after that.

Without the refactor, the development environment for this plugin is just nightmare-ish to use, so that currently trumps all other tasks.