davidorme opened this issue 3 months ago
Hi @davidorme, thank you for reporting this! We would need to make sure that this language information is preserved when the notebook is converted to a Jupyter notebook (the `py:percent` format will then, in turn, preserve the cell metadata).
Let me check with @chrisjsewell, who knows that part better than I do, what happens to that language specification when the conversion occurs.
I'll put it on the to-do list to have a look 😅, but feel free to ping me again if I don't reply.
@chrisjsewell Sorry to ping you on this.
I've got `jupyter-lab` and `jupytext --pipe black` playing ping-pong with each other. When I'm writing docs in `jupyter` as MyST Markdown files, those language tags are automatically added when the file saves (I'm assuming that this is something that `jupytext` does?). But then when I commit the file, the `pre-commit` setup using `jupytext --pipe black` throws them all out again 😄.
It's not a huge deal (we're just committing files stripped of `code-cell` language information), but it would be good to fix it.
Oh, actually I realize that this is an issue that has been going on for a very long time! See #759, #778, #789.
What happens is that the language specification on the code cell comes from the `language_info` notebook metadata.
That information is in the notebook when you save it from Jupyter, but it is lost when you read the MyST file.
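To make that dependency concrete, here is a simplified Python illustration of the behaviour described above. It is not Jupytext's actual MyST writer, and `code_cell_fence` is a hypothetical name: the lexer on the `{code-cell}` fence is looked up in `language_info.pygments_lexer`, so a notebook that only carries `kernelspec` (as it does after reading a MyST file) gets a bare fence.

```python
# Hypothetical illustration, not Jupytext's real code: the language argument of a
# MyST {code-cell} fence is read from language_info.pygments_lexer.
FENCE = "`" * 3  # built this way so the example does not close this code block


def code_cell_fence(notebook_metadata):
    """Return the opening line that a MyST writer would emit for a code cell."""
    language_info = notebook_metadata.get("language_info", {})
    lexer = language_info.get("pygments_lexer")
    if lexer:
        return FENCE + "{code-cell} " + lexer
    # kernelspec is not consulted in this sketch, so the lexer is simply dropped.
    return FENCE + "{code-cell}"


saved_from_jupyter = {
    "kernelspec": {"name": "python3", "language": "python"},
    "language_info": {"name": "python", "pygments_lexer": "ipython3"},
}
read_from_myst = {"kernelspec": {"name": "python3", "language": "python"}}

with_lexer = code_cell_fence(saved_from_jupyter)
without_lexer = code_cell_fence(read_from_myst)
```

Here `with_lexer` ends in `{code-cell} ipython3`, while `without_lexer` is a bare `{code-cell}` fence.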
I see one immediate workaround: add the `language_info` metadata to your MyST notebooks by adding this to your `jupytext.toml` config:
```toml
notebook_metadata_filter="language_info"
```
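One way to picture that setting is as an allow-list added on top of a default one. The sketch below is a simplified model, not Jupytext's implementation: it assumes the default keeps only `jupytext` and `kernelspec` (matching the YAML headers shown in this thread), that `key` adds a key and `-key` drops it, and it ignores Jupytext's other filter options. `filter_notebook_metadata` and `parse_filter` are hypothetical names.

```python
# Simplified sketch of a notebook metadata filter, not Jupytext's implementation.
# Assumption: by default only "jupytext" and "kernelspec" are kept, and the user
# filter is a comma-separated list where "key" adds a key and "-key" drops it.
DEFAULT_KEPT = {"jupytext", "kernelspec"}


def parse_filter(filter_string):
    """Split a comma-separated (possibly multi-line) filter string into keys."""
    return [key.strip() for key in filter_string.split(",") if key.strip()]


def filter_notebook_metadata(metadata, filter_string=""):
    """Return the notebook metadata that would survive in the text file."""
    kept = set(DEFAULT_KEPT)
    for key in parse_filter(filter_string):
        if key.startswith("-"):
            kept.discard(key[1:])
        else:
            kept.add(key)
    return {key: value for key, value in metadata.items() if key in kept}


metadata = {
    "jupytext": {"text_representation": {"format_name": "myst"}},
    "kernelspec": {"name": "python3", "language": "python"},
    "language_info": {"name": "python", "pygments_lexer": "ipython3"},
}
without_filter = filter_notebook_metadata(metadata)  # language_info is dropped
with_filter = filter_notebook_metadata(metadata, "language_info")  # kept
```

Because `parse_filter` strips whitespace, the same model also accepts the multi-line triple-quoted form used in `pyproject.toml`.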
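In the longer term, I see two possible fixes:
1. Apply the metadata filter before passing the notebook to MyST (e.g. if Jupytext is not configured to preserve the language info, then no cell would get the `ipython3` lexer)
2. Or, reconstruct the `language_info` within Jupytext, e.g. figure out how Jupyter does that, and do the same

My preference goes to 1, but I am curious to hear yours @chrisjsewell @davidorme @parmentelat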
I may have got this wrong, but I have `pyproject.toml` with:
```toml
[tool.jupytext]
# Stop jupytext from removing mystnb and other settings in MyST Notebook YAML headers
notebook_metadata_filter = """
settings,
mystnb,
language_info
"""
```
And then a Markdown file with YAML headers:
```yaml
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.4
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---
```
If I run `jupytext --pipe black file.md` on that, then the output reports:
```
[jupytext] Reading docs/source/users/demography/canopy.md in format md
[jupytext] Executing black -
All done! ✨ 🍰 ✨
1 file left unchanged.
[jupytext] Writing docs/source/users/demography/canopy.md in format md:myst
```
But all of the `code-cell` language specifications have been stripped.
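Until this is fixed, a small check can at least flag the stripped cells. This is a hypothetical helper (not part of Jupytext or MyST-NB) that lists `{code-cell}` fences with no language argument; the regex assumes fences of three or more backticks or tildes followed immediately by the directive name.

```python
import re

# Hypothetical check, not part of Jupytext or MyST-NB: find {code-cell} fences
# (three or more backticks or tildes) that carry no language argument.
CODE_CELL_FENCE = re.compile(r"^(?:`{3,}|~{3,})\{code-cell\}[ \t]*(\S*)", re.MULTILINE)


def cells_missing_language(myst_text):
    """Return the 1-based line numbers of code-cell fences without a lexer."""
    missing = []
    for match in CODE_CELL_FENCE.finditer(myst_text):
        if not match.group(1):
            missing.append(myst_text.count("\n", 0, match.start()) + 1)
    return missing


fence = "`" * 3  # avoids a literal fence inside this example
sample = "\n".join([
    "# Canopy",
    fence + "{code-cell} ipython3",
    "x = 1",
    fence,
    fence + "{code-cell}",
    "y = 2",
    fence,
])
missing_lines = cells_missing_language(sample)  # [5]
```

Run over a file before and after `jupytext --pipe black`, a list that grows from empty to non-empty would show the lexers being dropped.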
I see! You still don't have `language_info` metadata in your MyST file; that's why the Pygments lexers go away. To add that metadata to your MyST file, you will have to open it in Jupyter and save it using the new config file.
Alright. That took longer than expected:
1. … `pyproject.toml` in my case) before the actual notebooks if testing this workaround when you have `pre-commit`. Because 🤦 `pre-commit` stashes the changes to run the validation, and so `jupytext` uses the old config. I mean, it's sorta obvious, but I stumbled.
2. … `jupyter` in the same directory as the config file (or presumably point `jupyter` to it correctly). I'm in the habit of running `jupyter` in my `docs` directory, so of course it wasn't picking up the config.

But with the config above committed, and `jupyter` started in the project root so it actually reads that config, opening and saving a notebook in `jupyter` does add the following to the notebook YAML:
```yaml
language_info:
  codemirror_mode:
    name: ipython
    version: 3
  file_extension: .py
  mimetype: text/x-python
  name: python
  nbconvert_exporter: python
  pygments_lexer: ipython3
  version: 3.11.9
```
Saving in `jupyter` also restores the language info to the code cells, and now piping the notebook through `black` does not strip the language info. So the workaround works.
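The pitfall of starting `jupyter` in the `docs` directory can be pictured with the sketch below. It assumes, without having checked Jupytext's source, that the config lookup walks up from the notebook's directory but cannot go above the directory the Jupyter server was started in. `find_config` and the short list of file names are illustrative only.

```python
import tempfile
from pathlib import Path

# Illustrative subset of config file names chosen for this sketch.
CONFIG_NAMES = ("jupytext.toml", ".jupytext.toml", "pyproject.toml")


def find_config(notebook_dir, server_root):
    """Search notebook_dir and its parents, stopping at server_root (assumed)."""
    root = Path(server_root).resolve()
    current = Path(notebook_dir).resolve()
    while True:
        for name in CONFIG_NAMES:
            candidate = current / name
            if not candidate.is_file():
                continue
            # A pyproject.toml only counts if it has a [tool.jupytext] table.
            if name != "pyproject.toml" or "[tool.jupytext]" in candidate.read_text():
                return candidate
        # Assumption: the search cannot leave the directory Jupyter started in.
        if current == root or root not in current.parents:
            return None
        current = current.parent


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "pyproject.toml").write_text(
        '[tool.jupytext]\nnotebook_metadata_filter = "language_info"\n'
    )
    notebook_dir = project / "docs" / "source"
    notebook_dir.mkdir(parents=True)
    from_project_root = find_config(notebook_dir, project)  # finds pyproject.toml
    from_docs_dir = find_config(notebook_dir, project / "docs")  # None
```

Under that assumption, a server started in `docs` never sees the `pyproject.toml` one level up, which matches the behaviour described above.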
I don't understand the boundaries between the different packages at all well, but if I understand correctly:
1. … `language_info` metadata to record the lexer information and assign code-cell-level language information.
2. … `language_info`, because the language details are duplicated in `kernel_spec`.
3. … `code-cell` language information relies on the presence of `language_info` notebook metadata.

I'm not sure what (1) adds beyond the workaround: does it mean that `jupyter` stops adding the `code-cell` lexer info, so the notebook content is more stable? It seems like this could just be a documentation update saying that the default behaviour is not to retain lexer information in notebooks, but that adding `language_info` back into the retained metadata will allow lexer information to be retained?
I think I've run into a workflow that, if I understand correctly, argues for option (2). This usage might be out of scope for `jupytext`, but it feels like a reasonably natural thing to want to do.
The workflow is creating MyST Markdown notebooks for rendering with `sphinx`. Users can of course create notebook content in `jupyter`, but one of the advantages (joys?) of the MyST Markdown format is that you don't have to, because it is human-readable. So:
```shell
touch simple.md
jupytext --set-format md:myst --set-kernel python3 simple.md
```
I've now got a file that I can use with `myst-nb` in `sphinx` to generate content.
But, if I've got this right, at present the `language_info` metadata will only be inserted if I open the file in `jupyter` and then save it, having set the notebook metadata filter to preserve the `language_info` metadata.
So in this use case, my `simple.md` file will only be able to preserve the language on `code-cell` blocks if I open and save it through `jupyter`.
That feels clunky. I get that `jupytext` is intended primarily to act as an interface with `jupyter`, but with this workflow `jupyter` isn't really needed. If I understand right, your proposal (2) would allow `jupytext` to set the `language_info` in the same way that it sets format and kernel?
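For reference, in Jupyter the `language_info` block is reported by the running kernel (the `language_info` field of the `kernel_info_reply` message in the Jupyter messaging protocol), which would explain why it only appears after a save from Jupyter. Without a kernel, option (2) would have to rebuild it some other way. A minimal sketch, assuming a Python kernel and hardcoding the values Jupyter wrote earlier in this thread; `python_language_info` is hypothetical, and taking the version from the current interpreter may not match the kernel's.

```python
import platform


# Hypothetical helper, not a Jupytext API: rebuild a Python language_info block
# without a running kernel, hardcoding the values Jupyter wrote in this thread.
def python_language_info(kernelspec, version=None):
    """Return a language_info dict for a Python kernelspec, else None."""
    if kernelspec.get("language") != "python":
        return None  # other languages are out of scope for this sketch
    return {
        "codemirror_mode": {"name": "ipython", "version": 3},
        "file_extension": ".py",
        "mimetype": "text/x-python",
        "name": "python",
        "nbconvert_exporter": "python",
        "pygments_lexer": "ipython3",
        # Assumption: the current interpreter's version stands in for the kernel's.
        "version": version or platform.python_version(),
    }


kernelspec = {"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3"}
info = python_language_info(kernelspec, version="3.11.9")
```

Hardcoding per language is the simplest design here; asking an actual kernel would give exact values but reintroduces the dependency on a running Jupyter kernel that this workflow is trying to avoid.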
We're using `jupytext --pipe black` to automatically format Python code in MyST Markdown notebooks (as part of `pre-commit`, but I don't think that's relevant here). The problem we're having is that (IIUC) the round trip through the percent format to pass it to `black` strips out the language specification on the `code-cell` directives. So given: