jupytext.jupytext.read and jupytext.jupytext.write don't perfectly roundtrip

mwouts / jupytext

Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts

https://jupytext.readthedocs.io

MIT License

jupytext.jupytext.read and jupytext.jupytext.write don't perfectly roundtrip #993

Closed MarcoGorelli closed 2 years ago

MarcoGorelli commented 2 years ago

Example:

Make a file t.md with the following:

---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.14.1
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

```{code-cell} ipython3
:tags: [skip-flake8]

import os

import glob
```

Then run:

import jupytext
read = jupytext.jupytext.read('t.md')
jupytext.jupytext.write(read, 't_new.md')

Then, if I open t_new.md, it's exactly the same as t.md, but it has:

{code-cell}

instead of

{code-cell} ipython3

Is there a way to round-trip a file such that it retains these directives?

Versions:

$ python --version
Python 3.8.5
$ python -c 'import jupytext; print(jupytext.__version__)'
1.14.1

MarcoGorelli commented 2 years ago

Looks like I just need to add:

read['metadata']['language_info'] = {'pygments_lexer': 'ipython3'}
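
A minimal sketch of that workaround as a full round trip, assuming the top-level `jupytext.read` and `jupytext.write` functions. `ensure_pygments_lexer` and `roundtrip` are hypothetical helper names, and `setdefault` is used here so that any existing `language_info` is kept rather than overwritten:

```python
def ensure_pygments_lexer(notebook, lexer="ipython3"):
    """Set metadata.language_info.pygments_lexer unless it is already present."""
    language_info = notebook["metadata"].setdefault("language_info", {})
    language_info.setdefault("pygments_lexer", lexer)
    return notebook


def roundtrip(src, dst, lexer="ipython3"):
    """Read src, add the lexer to the metadata, and write the result to dst."""
    # Imported lazily so ensure_pygments_lexer can be used without jupytext installed.
    import jupytext

    notebook = jupytext.read(src)
    ensure_pygments_lexer(notebook, lexer)
    jupytext.write(notebook, dst)
```

Going by the observation above, calling `roundtrip('t.md', 't_new.md')` should then keep the `{code-cell} ipython3` directive in the output.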

mwouts commented 2 years ago

Thank you @MarcoGorelli for documenting this! This is very helpful.

MarcoGorelli commented 2 years ago

Do you reckon `read` should pull that out of the .md file and add it to the notebook metadata?

mwouts commented 2 years ago

This might indeed be more user friendly. @chrisjsewell what is your opinion on this?

MarcoGorelli commented 2 years ago

Just FYI, my use case is to support running linters/formatters directly on jupytext .md files (https://github.com/nbQA-dev/nbQA/pull/745).

For now I'll get around it with:

    # get lexer: see https://github.com/mwouts/jupytext/issues/993
    parser = MarkdownIt("commonmark").disable("inline", True)
    parsed = parser.parse(content)
    lexer = None
    for token in parsed:
        if token.type == "fence" and token.info.startswith("{code-cell}"):
            lexer = remove_prefix(token.info, "{code-cell}").strip()
            md_content["metadata"]["language_info"] = {"pygments_lexer": lexer}
            break
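
For reference, a dependency-free sketch of the same idea, swapping the markdown-it-py parse for a plain line scan. It is only an approximation: it looks at backtick fences alone, does not track whether a fence sits inside another block, and `find_code_cell_lexer` is a hypothetical name:

```python
FENCE = "`" * 3  # a backtick fence marker, built this way to keep the example easy to embed
PREFIX = "{code-cell}"


def find_code_cell_lexer(text):
    """Return the lexer named on the first {code-cell} fence, or None if there is none."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        # Drop the backticks to get the fence's info string, e.g. "{code-cell} ipython3".
        info = stripped.lstrip("`").strip()
        if info.startswith(PREFIX):
            # An empty remainder means the directive names no lexer.
            return info[len(PREFIX):].strip() or None
    return None
```

The returned value could then be stored under `metadata.language_info.pygments_lexer` before writing the notebook back out, as in the snippet above.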