mwouts / jupytext

Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
https://jupytext.readthedocs.io
MIT License
6.65k stars 386 forks source link

Black interferes with py:percent format in indented code blocks. #562

Closed Skylion007 closed 4 years ago

Skylion007 commented 4 years ago

I am having an issue where py:percent notebooks that have cells divided within an if statement get destroyed by black when using --pipe black

# %%
if __name__ == '__main__':
     print(1)
# %%
     print(2)

gets reformatted into the following, which is not equivalent

# %%
if __name__ == '__main__':
     print(1)
     # %%
     print(2)

For example, when running the following.

jupytext --to notebook test.py jupytext --to py:percent --pipe black test.ipynb

Will cause all cells that contain only indented code to be merged into one giant cell. Are they any formats / workarounds to ensure black does not destroy the scripts like so? Or are there any relevant settings that jupytext can set that will resolve this issue?

mwouts commented 4 years ago

Hi @Skylion007 , that is an interesting report! Do we agree that already, running black on the first input gives you the second?

I see that issues about comment indentation have been reported (and fixed) at https://github.com/psf/black/issues/16 and https://github.com/psf/black/issues/262, but I think here we are in a different case. For instance the cell-marker comment # %% between print(1) and print(2) does not match neither the previous nor the next code paragraph...

Also, I have another question for you: with that code in the Python script, what do you expect to get in Jupyter? AFAIK you cannot break down the if instruction into multiple code cells and still get it runnable, right?

Skylion007 commented 4 years ago

@mwouts You actually can break down an if statement instruction into multiple code cells and still have it be runnable! Try it, it works in Google Collab at the very least. It's also runnable directly in VSCode.

I've edited the minimal example to more closely mirror my actual code. In the updated code, the first cell should print "1" and the 2nd cell should print "2".

Here is a minimal collab @mwouts

Skylion007 commented 4 years ago

@mwouts I have a workaround for it now, but it's not very elegant.

Also, another issue to raise is that's impossible to escape spaces within a command in --pipe which created a lot of trouble writing the proper sed expression and having it run:

jupytext   --pipe-fmt py:percent --pipe black --pipe "sed s/[[:space:]]*\#[[:space:]]\%\%/\#\%\%/g" 
mwouts commented 4 years ago

Hi @Skylion007 , thanks for the details, I didn't know about these partial cells, that's interesting!

Do you think that allowing indented # %% markers could help? We could try to indent the cell markers to match the indentation of the first non-empty line in the code cell... Do you think it would solve this issue?

Skylion007 commented 4 years ago

It would help absolutely, but I am not sure if it's part of the 'standard' per say so it should probably be behind a flag/option maybe?

After all, the py:percent format isn't JuPyText specific, not sure if Spyder or other IDEs respect the indented comments. Also, this could be an issue with round trip conversions with other formats, no? What about indented doc strings as text markers etc...

Also, doing it creates some ambiguity of what the 'right' indentation should be. Like if I move the comment further right along with the code block, does all the code respect it's indentation or not? That could be another way you could 'guard' code in a main method and have it be the same between a notebook and a script.

mwouts commented 4 years ago

It would help absolutely, but I am not sure if it's part of the 'standard' per say so it should probably be behind a flag/option maybe?

Well, my assumption being that most code cells start with non-indented code, this would concern very few notebooks/cells, so maybe making this optional is not really required. But sure, if you think otherwise, or if we find any occurrence in representative repos like e.g. Python for Data Science, we can make this optional.

the py:percent format isn't JuPyText specific, not sure if Spyder or other IDEs respect the indented comments

I agree. Still, maybe that is not a blocker? In the worst case, having an indented cell marker for the second cell in your example will just mean that some IDE will run the two cells as just one. Maybe the reasons for doing this, which are 1) with the indented cell marker, the jupytext file looks more Pythonic, and 2) it works better with black, outweight the inconvenience of bluring the cells for the IDE?

what the 'right' indentation should be

Here we are only discussing the indentation for the # %% marker, right? The content of the code cell should always match that of the Python script, shouldn'it?

Skylion007 commented 4 years ago

I agree. Still, maybe that is not a blocker? In the worst case, having an indented cell marker for the second cell in your example will just mean that some IDE will run the two cells as just one. Maybe the reasons for doing this, which are 1) with the indented cell marker, the jupytext file looks more Pythonic, and 2) it works better with black, outweight the inconvenience of bluring the cells for the IDE?

That's a fair argument, actually it probably should be the default. I just tested it and VSCode respects indented comments at the very least so that make me feel better it. The concern is with point 2 though as all my break points are under my if statement so blurring all the cells together make them function as one giant cell.

As for the 'right' indentation, I think you are right as that is a pretty odd an edge case if that's not the case. I can't think of any practical use case where you would want to change the indentation of a cell unless you want to do it for the entire script.

TLDR: On second thought I agree, make it a default option and that should hopefully solve this issue for the Py:Percent format at least.

mwouts commented 4 years ago

I found no indented code cell in https://github.com/jakevdp/PythonDataScienceHandbook:

git clone https://github.com/jakevdp/PythonDataScienceHandbook.git
cd PythonDataScienceHandbook
import os
import nbformat

for nb_file in sorted(os.listdir('notebooks')):
    if not nb_file.endswith('.ipynb'):
        continue

    nb = nbformat.read(os.path.join('notebooks', nb_file), as_version=4)
    for cell in nb.cells:
        if cell.cell_type!="code":
            continue

        for line in cell.source.splitlines():
            # is the first non-empty line indented?
            if line.strip():
                if line.startswith(' '):
                    print(cell.source)
                break

so changing the indent of the # %% marker for these indented cells looks reasonable indeed... I'll propose a PR for that soon.

Skylion007 commented 4 years ago

Found another interesting edge case to keep in mind. The following cell works when transferred to colab.

# %%
if __name__ == '__main__':
     print(1)
# %%
     # INDENTED COMMENT
     print(2)

but the following does not:

# %%
if __name__ == '__main__':
     print(1)
# %%

     # INDENTED COMMENT
     print(2)

The extra newline at the beginning of the cell seems to reset the indentation of the entire cell. Just something to keep in mind.

JuPyTer and VSCode however don't seem to care and both cells work fine in both of them. Colab gives an indentation error though.

mwouts commented 4 years ago

Hi @Skylion007 , would you mind giving it a try to the branch indented_markers_percent? I think you can simply do

pip install git+https://github.com/mwouts/jupytext.git@indented_markers_percent
Skylion007 commented 4 years ago

@mwouts Yeah, that seems to fix it on my end. At least presuming it rounds trips with --test locally and the outputted notebook appears to be identical.

Here is the example colab btw: https://github.com/facebookresearch/habitat-sim/blob/master/examples/tutorials/colabs/rigid_object_tutorial.ipynb

mwouts commented 4 years ago

Also, another issue to raise is that's impossible to escape spaces within a command in --pipe which created a lot of trouble writing the proper sed expression and having it run

Sorry, it took me weeks to reproduce this one... the good news is that I just hit the same issue when working on #553. It should be fixed on the branch pipe_isort, which I plan to merge soon.