mwouts / jupytext

Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
https://jupytext.readthedocs.io
MIT License
6.61k stars 386 forks source link

Jupytext in a pre-commit disagreement with trailing-whitespace and prettier #580

Open tordbb opened 4 years ago

tordbb commented 4 years ago

Hi,

I am trying to run jupytext as a first step in pre-commit, before running among others trailing-whitespace and prettier.

My goal is for jupytext to generate .md files based on .ipynb files, since these are easier to version control than .ipynb. The idea is that pre-commit will pass if no new .md file is added, and if no existing .md file is changed by jupytext.

The way I implemented this seems to cause a problem with two other hooks, trailing-whitespace and prettier. The way jupytext formats the .md files cause trailing-whitespace and prettier to fail and adjusting the code. Is there any way to adjust jupytext or the way I use it, so that these other hooks do not fail?

This is my file .pre-commit-config.yaml which lives in the root directory:

# NOTE: The versions can be updated by calling
#        pre-commit autoupdate
repos:
  - repo: local
    hooks:
      - id: jupytext
        name: jupytext
        entry: jupytext --to .md notebooks/*.ipynb
        files: .ipynb
        language: python

  - repo: https://github.com/psf/black
    rev: 19.10b0
    hooks:
      - id: black

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.1.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer

  - repo: https://gitlab.com/pycqa/flake8
    rev: 3.8.3
    hooks:
      - id: flake8
        additional_dependencies:
          - flake8-bugbear
          - flake8-docstrings

  - repo: https://github.com/prettier/prettier
    rev: 2.0.5
    hooks:
      - id: prettier
        args: [--prose-wrap=always, --print-width=88]

  - repo: local
    hooks:
      - id: pylint
        name: pylint
        entry: pylint
        language: system
        types: [python]
        additional_dependencies: ["-r requirements.txt"]

If I update an .ipynb file and run pre-commit, this happens:

image

Immediately re-running pre-commit causes all other notebooks to be updated by jupytext,.

tordbb commented 4 years ago

Seems like the problem was that .ipynb files were not checked or modified by trailing-whitespace and prettier, which created a unhelpful loop that happens every time pre-commit is being run:

  1. Jupytext imports whitespace-errors from dirty .ipynb file into .md file, and pre-commit sees this as jupytext failing.
  2. trailing-whitespace fixes the whitespace-error , and pre-commit sees this as trailing-whitespace failing.

So the solution, I guess, is either to disband the goal mentioned above, or to run all hooks also on .ipynb files. Not sure which I should go for.

mwouts commented 4 years ago

Hi @tordbb , thanks for taking the time to report this.

I must say that I am not sure yet what is the best way to use pre-commit hooks on notebooks (see also #432 , #446). This is a topic on which I'd like to make progress, but, just like you did here, many times I encountered difficulties in applying pre-commit hooks on pairs of files.

Maybe you could consider doing the following: a) Apply the pre-commit hook to .md files only, and let Jupytext's plugin for Jupyter do the sync to the .ipynb file, using paired notebooks b) Or, add an additional step at the end of your pre-commit hook: jupytext --to ipynb --update *.md to update the .ipynb file - but again, I am not sure how well the pre-commit hook works with files that you don't include in the commit - or maybe you can git add and the reset the .ipynb file in the hook itself like we do here? c) Or, apply the pre-commit hooks to .ipynb files only, using jupytext --pipe-fmt markdown --pipe [cmd]

tordbb commented 4 years ago

Thanks again for your help! I found a viable solution by following your suggestion of adding another jupytext hook at the end.

The problem turned out to be how prettier did two changes to .md notebooks that were not being imported to .ipynb notebooks by jupytext:

When I made prettier exclude all files in the notebooks directory from inspection, everything worked out great. Ideally I wanted to include ipynb files, but did not find a regex that only excluded .md files inside notebooks/. Feel free to share if you can think of one.

I am not sure if jupytext should adapt the formatting that prettier idolizes, but that may be up to you.

See my final .pre-commit-config.yaml below:

# NOTE: The versions can be updated by calling
#        pre-commit autoupdate
repos:
  - repo: local
    hooks:
      - id: jupytext
        name: jupytext --to md
        entry: jupytext --to .md notebooks/*.ipynb
        files: .ipynb
        language: python

  - repo: https://github.com/psf/black
    rev: 19.10b0
    hooks:
      - id: black

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.1.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer

  - repo: https://gitlab.com/pycqa/flake8
    rev: 3.8.3
    hooks:
      - id: flake8
        additional_dependencies:
          - flake8-bugbear
          - flake8-docstrings

  - repo: https://github.com/prettier/prettier
    rev: 2.0.5
    hooks:
      - id: prettier
        args: [--prose-wrap=always, --print-width=88]
        # Avoid prettier from harmfully changing .md notebooks
        exclude: notebooks/

  - repo: local
    hooks:
      - id: pylint
        name: pylint
        entry: pylint
        language: system
        types: [python]
        additional_dependencies: ["-r requirements.txt"]

  - repo: local
    hooks:
      - id: jupytext
        name: jupytext --to ipynb
        entry: jupytext --to ipynb --update notebooks/*.md
        files: .md
        language: python
mwouts commented 4 years ago

Thank you @tordbb for sharing this! At some point I'd like to write a quick post or documentation page on how to best use pre-commit hooks, may I reuse your config above?

a regex that only excluded .md files inside notebooks/

Nice challenge... let me think about it...

I am not sure if jupytext should adapt the formatting that prettier idolizes

How that's interesting... Jupytext uses yaml to encode the YAML header, so I am a bit surprised that the quotes don't match those of prettier.. do you have any idea how prettier parses the YAML, and which one is the most standard?

Regarding the extra blank line, Jupytext should include a blank line between any two cells (and thus match prettier, right?) When you don't get a line in between, this is because the cell has a cell metadata lines_to_next_cell=0, which is itself coming from the text file, if at some point the markdown cell and the code cell were not separated by a blank line.

tordbb commented 4 years ago

Hi!

Off course, feel free to use it all. I am so thankful for your work on jupytext, and would be honored if my small contribution is somehow helpful in your work - published or not.

Regarding your question about YAML, I am afraid I do not know.

Thanks again, and have a great day!

On Fri, Jul 31, 2020, 18:29 Marc Wouts notifications@github.com wrote:

Thank you @tordbb https://github.com/tordbb for sharing this! At some point I'd like to write a quick post or documentation page on how to best use pre-commit hooks, may I reuse your config above?

a regex that only excluded .md files inside notebooks/

Nice challenge... let me think about it...

I am not sure if jupytext should adapt the formatting that prettier idolizes

How that's interesting... Jupytext uses yaml to encode the YAML header, so I am a bit surprised that the quotes don't match those of prettier.. do you have any idea how prettier parses the YAML, and which one is the most standard?

Regarding the extra blank line, Jupytext should include a blank line between any two cells (and thus match prettier, right?) When you don't get a line in between, this is because the cell has a cell metadata lines_to_next_cell=0, which is itself coming from the text file, if at some point the markdown cell and the code cell were not separated by a blank line.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mwouts/jupytext/issues/580#issuecomment-667213881, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACXBS4F5TNXFP72U7IO36IDR6LWOFANCNFSM4PN45FUA .

Skylion007 commented 4 years ago

@tordbb @mwouts I quite like my solution to using JuPyText as a pre-commit hook which can be found below:

-   repo: local
    hooks:
    -   id: jupytext-sync
        name: Sync scripts and notebooks
        files: '^examples/tutorials/(colabs|nb_python)/(.*\.py|.*\.ipynb)$'
        entry: jupytext --update-metadata '{"jupytext":{"notebook_metadata_filter":"all", "cell_metadata_filter":"-all"}, "accelerator":"GPU"}' --set-formats 'nb_python//py:percent,colabs//ipynb' --pipe black --pipe "sed s/[[:space:]]*\#[[:space:]]\%\%/\#\%\%/g" --pipe 'isort -' --pipe-fmt 'py:percent' --sync
        pass_filenames: true
        additional_dependencies:
            - 'jupytext==1.5.2'
            - black
            - isort
        always_run: false
        language: python

This has some benefits:

TLDR: Use --set-formats and --sync will solve a lot of your issues @tordbb since which file is newer will overwrite the changes in the other.

mwouts commented 4 years ago

Thank you @tordbb and @Skylion007 . This is getting interesting! When time permits I will experiment a bit more with this myself, and write a post with all due credit to your examples !

@Skylion007 , do we agree that --pipe "sed s/[[:space:]]*\#[[:space:]]\%\%/\#\%\%/g" was motivated by #562? With Jupytext from master (to be shipped in 1.6.0) you won't need this any more (and spaces are now allowed in pipe commands :smiley: ).

Also I see you use --pipe 'isort -', don't you experience issues like #553 ? On my side I now use --pipe 'isort - --treat-comment-as-code "# %%" --float-to-top' (requires isort in dev version, as these two options were introduced very recently by @timothycrosley, the author of isort).

Skylion007 commented 4 years ago

@mwouts Yeah, but my pre-commit hook downloads from pip so until it's published I can't remove that workaround. 👍 I do experience issues like #553, but it hasn't broken anything, yet. Also waiting for that fix to be merged.

Skylion007 commented 4 years ago

@mwouts More than that, I would like to code up a simple pre-commit yaml for the repo. One that doesn't need to use the current the --pre-commit arg and uses the nice features baked into pre-commit.com

mwouts commented 4 years ago

I would like to code up a simple pre-commit yaml for the repo

Oh, that would be awesome!

grst commented 2 years ago

I am also having an issue with jupytext and prettier generating conflicting formatting. In my case, the issue is caused by our .editorconfig setting the default indentation to four spaces also for yaml files. That's to say that unless jupytext respect all sort of config files when generating the markdown, there's no way to achieve perfect consistency with prettier or other formatting tools.

image

I think the best would be to pipe the output through prettier as well, as suggested for black:

-args: ['--pipe', 'black']
+args: ['--pipe', 'black', '--pipe', 'prettier -w {}']

However, it is the markdown file that would need to be piped through prettier, and I don't think one can have multiple --pipe-fmt in the same command:

jupytext --sync --pipe-fmt py:percent --pipe black --pipe-fmt md --pipe 'prettier -w {}' 
mwouts commented 2 years ago

Hi @grst , I do like the suggestion of allowing multiple --pipe-fmt arguments. We could implement that (I'd require that --pipe-fmt would be either of length 1, or the same length as --pipe), but I was trying to see if I could write a test, and I see that we already have an example of a pre-commit hook where Jupytext is called twice, with two different --pipe-fmt: https://github.com/mwouts/jupytext/blob/12964ef42783722815cd5d12df47d5ce67c1b323/tests/test_pre_commit_5_reformat_markdown.py#L38-L46

Do you think the same approach could be used to call prettier (say, in addition to black) when doing jupytext --sync?

grst commented 2 years ago

I haven't tried it, but as long as it's two separate steps in pre-commit, I'd expect it to suffer from the same problem: If the first step generates a different formatting than the second one they cannot pass at the same time (pre-commit fails if a file got modified).

That being said, in the above example it could work, i.e. the first jupytext command only modifies an ipynb file (e.g. formatting it with black), the markdown is only updated in the second step.