mwouts / jupytext

Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
https://jupytext.readthedocs.io
MIT License

`jupytext` stripping `mystnb` file metadata. #1247

Closed davidorme closed 2 weeks ago

davidorme commented 2 weeks ago

We're using mystnb with sphinx to build MyST notebooks for inclusion in our project website. When we're writing those notebooks, we use jupyterlab with jupytext. The problem we're having is that opening a MyST notebook in jupyterlab updates the YAML headers, but also strips out settings aimed at other tools, e.g. the mystnb execution settings.

As an example, we have a MyST notebook with headers:

---
jupytext:
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.13.8
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
settings:
  output_matplotlib_strings: remove
mystnb:
  execution_mode: 'off'
---

After opening that file as a notebook in jupyterlab, the header YAML is immediately updated to the following. The jupytext version has been updated, but the settings and mystnb entries have been stripped out.

---
jupytext:
  formats: md:myst
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.2
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
---
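
A quick way to spot this kind of loss is to compare the top-level keys of the header before and after the notebook has been opened. The following is only a minimal sketch under stated assumptions: `top_level_keys` and `dropped_keys` are hypothetical helper names, not part of jupytext, and the header is scanned as plain text (no YAML parser), which is enough for simple headers like the ones above.

```python
import re

# An unindented "key:" line is treated as a top-level YAML mapping key.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):")


def top_level_keys(text: str) -> set[str]:
    """Return the top-level keys of the leading '---' delimited YAML header."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()  # no front matter at all
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter of the front matter
            break
        match = TOP_LEVEL_KEY.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def dropped_keys(before: str, after: str) -> set[str]:
    """Top-level header keys present in `before` but missing from `after`."""
    return top_level_keys(before) - top_level_keys(after)


# Shortened versions of the two headers shown above.
BEFORE = """---
jupytext:
  formats: md:myst
kernelspec:
  name: python3
settings:
  output_matplotlib_strings: remove
mystnb:
  execution_mode: 'off'
---
"""
AFTER = """---
jupytext:
  formats: md:myst
kernelspec:
  name: python3
---
"""

print(sorted(dropped_keys(BEFORE, AFTER)))  # ['mystnb', 'settings']
```

Run against the file contents saved before and after opening the notebook, this reports exactly the `settings` and `mystnb` entries as missing.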

The full set of packages and versions being used is:

Project packages

```sh
$ poetry show
alabaster 0.7.16 A light, configurable Sphinx theme
anyio 4.4.0 High level compatibility layer for multiple asynchronous event loop implementations
appnope 0.1.4 Disable App Nap on macOS >= 10.9
argon2-cffi 23.1.0 Argon2 for Python
argon2-cffi-bindings 21.2.0 Low-level CFFI bindings for Argon2
arrow 1.3.0 Better dates & times for Python
asttokens 2.4.1 Annotate AST trees with source code positions
async-lru 2.0.4 Simple LRU cache for asyncio
attrs 23.2.0 Classes Without Boilerplate
autodocsumm 0.2.12 Extended sphinx autodoc including automatic autosummaries
babel 2.15.0 Internationalization utilities
beautifulsoup4 4.12.3 Screen-scraping library
bleach 6.1.0 An easy safelist-based HTML-sanitizing tool.
certifi 2024.6.2 Python package for providing Mozilla's CA Bundle.
cffi 1.16.0 Foreign Function Interface for Python calling C code.
cfgv 3.4.0 Validate configuration and produce human readable error messages.
cftime 1.6.4 Time-handling functionality from netcdf4-python
charset-normalizer 3.3.2 The Real First Universal Charset Detector. Open, modern and actively maintained alternative to Chardet.
click 8.1.7 Composable command line interface toolkit
cloudpickle 3.0.0 Pickler class to extend the standard pickle.Pickler functionality
comm 0.2.2 Jupyter Python Comm implementation, for usage in ipykernel, xeus-python etc.
contourpy 1.2.1 Python library for calculating contours of 2D quadrilateral grids
coverage 7.5.4 Code coverage measurement for Python
cycler 0.12.1 Composable style cycles
dask 2023.12.1 Parallel PyData with Task Scheduling
debugpy 1.8.2 An implementation of the Debug Adapter Protocol for Python
decorator 5.1.1 Decorators for Humans
defusedxml 0.7.1 XML bomb protection for Python stdlib modules
distlib 0.3.8 Distribution utilities
docutils 0.20.1 Docutils -- Python Documentation Utilities
dpath 2.2.0 Filesystem-like pathing and searching for dictionaries
executing 2.0.1 Get the currently executing AST node of a frame, and other information
fastjsonschema 2.20.0 Fastest Python implementation of JSON schema
filelock 3.15.4 A platform independent file lock.
fonttools 4.53.0 Tools to manipulate font files
fqdn 1.5.1 Validates fully-qualified domain names against RFC 1123, so that they are acceptable to modern bowsers
fsspec 2024.6.0 File-system specification
h11 0.14.0 A pure-Python, bring-your-own-I/O implementation of HTTP/1.1
httpcore 1.0.5 A minimal low-level HTTP client.
httpx 0.27.0 The next generation HTTP client.
hypothesis 6.104.1 A library for property-based testing
identify 2.5.36 File identification library for Python
idna 3.7 Internationalized Domain Names in Applications (IDNA)
imagesize 1.4.1 Getting image size from png/jpeg/jpeg2000/gif file
importlib-metadata 8.0.0 Read metadata from Python packages
iniconfig 2.0.0 brain-dead simple config-ini parsing
ipykernel 6.29.4 IPython Kernel for Jupyter
ipython 8.25.0 IPython: Productive Interactive Computing
isoduration 20.11.0 Operations with ISO 8601 durations
isort 5.13.2 A Python utility / library to sort Python imports.
jedi 0.19.1 An autocompletion tool for Python that can be used for text editors.
jinja2 3.1.4 A very fast and expressive template engine.
json5 0.9.25 A Python implementation of the JSON5 data format.
jsonpointer 3.0.0 Identify specific nodes in a JSON document (RFC 6901)
jsonschema 4.22.0 An implementation of JSON Schema validation for Python
jsonschema-specifications 2023.12.1 The JSON Schema meta-schemas and vocabularies, exposed as a Registry
jupyter-cache 1.0.0 A defined interface for working with a cache of jupyter notebooks.
jupyter-client 8.6.2 Jupyter protocol implementation and client libraries
jupyter-core 5.7.2 Jupyter core package. A base package on which Jupyter projects rely.
jupyter-events 0.10.0 Jupyter Event System library
jupyter-lsp 2.2.5 Multi-Language Server WebSocket proxy for Jupyter Notebook/Lab server
jupyter-server 2.14.1 The backend—i.e. core services, APIs, and REST endpoints—to Jupyter web applications.
jupyter-server-terminals 0.5.3 A Jupyter Server Extension Providing Terminals.
jupyterlab 4.2.3 JupyterLab computational environment
jupyterlab-myst 2.4.2 Use MyST in JupyterLab
jupyterlab-pygments 0.3.0 Pygments theme using JupyterLab CSS variables
jupyterlab-server 2.27.2 A set of server components for JupyterLab and JupyterLab like applications.
jupytext 1.16.2 Jupyter notebooks as Markdown documents, Julia, Python or R scripts
kiwisolver 1.4.5 A fast implementation of the Cassowary constraint solver
latexcodec 3.0.0 A lexer and codec to work with LaTeX code in Python.
locket 1.0.0 File-based locks for Python on Linux and Windows
markdown-it-py 3.0.0 Python port of markdown-it. Markdown parsing, done right!
markupsafe 2.1.5 Safely add untrusted strings to HTML/XML markup.
matplotlib 3.9.0 Python plotting package
matplotlib-inline 0.1.7 Inline Matplotlib backend for Jupyter
mdformat 0.7.17 CommonMark compliant Markdown formatter
mdformat-frontmatter 0.4.1 An mdformat plugin for parsing / ignoring frontmatter.
mdformat-tables 0.4.1 An mdformat plugin for rendering tables.
mdit-py-plugins 0.4.1 Collection of plugins for markdown-it-py
mdurl 0.1.2 Markdown URL utilities
mistune 3.0.2 A sane and fast Markdown parser with useful plugins and renderers
mypy 1.10.1 Optional static typing for Python
mypy-extensions 1.0.0 Type system extensions for programs checked with the mypy type checker.
myst-nb 1.1.0 A Jupyter Notebook Sphinx reader built on top of the MyST markdown parser.
myst-parser 3.0.1 An extended [CommonMark](https://spec.commonmark.org/) compliant parser,
nbclient 0.10.0 A client library for executing notebooks. Formerly nbconvert's ExecutePreprocessor.
nbconvert 7.16.4 Converting Jupyter Notebooks (.ipynb files) to other formats. Output formats include asciidoc, html, latex, markdown, pdf, py, rst, script. nbconvert can be used both as a Py...
nbformat 5.10.4 The Jupyter Notebook format
nest-asyncio 1.6.0 Patch asyncio to allow nested event loops
netcdf4 1.7.1.post1 Provides an object-oriented python interface to the netCDF version 4 library
nodeenv 1.9.1 Node.js virtual environment builder
notebook-shim 0.2.4 A shim layer for notebook traits and config
numpy 1.26.4 Fundamental package for array computing in Python
overrides 7.7.0 A decorator to automatically detect mismatch when overriding a method.
packaging 24.1 Core utilities for Python packages
pandas 2.2.2 Powerful data structures for data analysis, time series, and statistics
pandocfilters 1.5.1 Utilities for writing pandoc filters in python
parso 0.8.4 A Python Parser
partd 1.4.2 Appendable key-value storage
pexpect 4.9.0 Pexpect allows easy control of interactive console applications.
pillow 10.3.0 Python Imaging Library (Fork)
pint 0.20.1 Physical quantities module
platformdirs 4.2.2 A small Python package for determining appropriate platform-specific dirs, e.g. a `user data dir`.
pluggy 1.5.0 plugin and hook calling mechanisms for python
pre-commit 2.21.0 A framework for managing and maintaining multi-language pre-commit hooks.
prometheus-client 0.20.0 Python client for the Prometheus monitoring system.
prompt-toolkit 3.0.47 Library for building powerful interactive command lines in Python
psutil 6.0.0 Cross-platform lib for process and system monitoring in Python.
ptyprocess 0.7.0 Run a subprocess in a pseudo terminal
pure-eval 0.2.2 Safely evaluate AST nodes without side effects
pybtex 0.24.0 A BibTeX-compatible bibliography processor in Python
pybtex-docutils 1.0.3 A docutils backend for pybtex.
pycparser 2.22 C parser in Python
pydocstyle 6.3.0 Python docstring style checker
pygments 2.18.0 Pygments is a syntax highlighting package written in Python.
pyparsing 3.1.2 pyparsing module - Classes and methods to define and execute parsing grammars
pytest 7.4.4 pytest: simple powerful testing with Python
pytest-cov 3.0.0 Pytest plugin for measuring coverage.
pytest-datadir 1.5.0 pytest plugin for test data directories and files
pytest-mock 3.14.0 Thin-wrapper around the mock package for easier use with pytest
python-dateutil 2.9.0.post0 Extensions to the standard Python datetime module
python-json-logger 2.0.7 A python library adding a json log formatter
pytz 2024.1 World timezone definitions, modern and historical
pyyaml 6.0.1 YAML parser and emitter for Python
pyzmq 26.0.3 Python bindings for 0MQ
referencing 0.35.1 JSON Referencing + Python
requests 2.32.3 Python HTTP for Humans.
rfc3339-validator 0.1.4 A pure python RFC3339 validator
rfc3986-validator 0.1.1 Pure python rfc3986 validator
rpds-py 0.18.1 Python bindings to Rust's persistent data structures (rpds)
ruamel-yaml 0.18.6 ruamel.yaml is a YAML parser/emitter that supports roundtrip preservation of comments, seq/map flow style, and map key order
ruamel-yaml-clib 0.2.8 C version of reader, parser and emitter for ruamel.yaml derived from libyaml
scipy 1.14.0 Fundamental algorithms for scientific computing in Python
send2trash 1.8.3 Send file to trash natively under Mac OS X, Windows and Linux
setuptools 70.1.1 Easily download, build, install, upgrade, and uninstall Python packages
shapely 2.0.4 Manipulation and analysis of geometric objects
six 1.16.0 Python 2 and 3 compatibility utilities
sniffio 1.3.1 Sniff out which async library your code is running under
snowballstemmer 2.2.0 This package provides 29 stemmers for 28 languages generated from Snowball algorithms.
sortedcontainers 2.4.0 Sorted Containers -- Sorted List, Sorted Dict, Sorted Set
soupsieve 2.5 A modern CSS selector implementation for Beautiful Soup.
sphinx 7.3.7 Python documentation generator
sphinx-external-toc 1.0.1 A sphinx extension that allows the site-map to be defined in a single YAML file.
sphinx-rtd-theme 2.0.0 Read the Docs theme for Sphinx
sphinxcontrib-applehelp 1.0.8 sphinxcontrib-applehelp is a Sphinx extension which outputs Apple help books
sphinxcontrib-bibtex 2.6.2 Sphinx extension for BibTeX style citations.
sphinxcontrib-devhelp 1.0.6 sphinxcontrib-devhelp is a sphinx extension which outputs Devhelp documents
sphinxcontrib-htmlhelp 2.0.5 sphinxcontrib-htmlhelp is a sphinx extension which renders HTML help files
sphinxcontrib-jquery 4.1 Extension to include jQuery on newer Sphinx releases
sphinxcontrib-jsmath 1.0.1 A sphinx extension which renders display math in HTML via JavaScript
sphinxcontrib-mermaid 0.9.2 Mermaid diagrams in yours Sphinx powered docs
sphinxcontrib-qthelp 1.0.7 sphinxcontrib-qthelp is a sphinx extension which outputs QtHelp documents
sphinxcontrib-serializinghtml 1.1.10 sphinxcontrib-serializinghtml is a sphinx extension which outputs "serialized" HTML files (json and pickle)
sqlalchemy 2.0.31 Database Abstraction Library
stack-data 0.6.3 Extract data from python stack frames and tracebacks for informative displays
tabulate 0.9.0 Pretty-print tabular data
terminado 0.18.1 Tornado websocket backend for the Xterm.js Javascript terminal emulator library.
tinycss2 1.3.0 A tiny CSS parser
tomli-w 1.0.0 A lil' TOML writer
toolz 0.12.1 List processing tools and functional utilities
tornado 6.4.1 Tornado is a Python web framework and asynchronous networking library, originally developed at FriendFeed.
tqdm 4.66.4 Fast, Extensible Progress Meter
traitlets 5.14.3 Traitlets Python configuration system
types-dataclasses 0.6.6 Typing stubs for dataclasses
types-jsonschema 4.22.0.20240610 Typing stubs for jsonschema
types-python-dateutil 2.9.0.20240316 Typing stubs for python-dateutil
types-tqdm 4.66.0.20240417 Typing stubs for tqdm
typing-extensions 4.12.2 Backported and Experimental Type Hints for Python 3.8+
tzdata 2024.1 Provider of IANA time zone data
uri-template 1.3.0 RFC 6570 URI Template Processor
urllib3 2.2.2 HTTP library with thread-safe connection pooling, file post, and more.
virtualenv 20.26.3 Virtual Python Environment builder
wcwidth 0.2.13 Measures the displayed width of unicode strings in a terminal
webcolors 24.6.0 A library for working with the color formats defined by HTML and CSS.
webencodings 0.5.1 Character encoding aliases for legacy web content
websocket-client 1.8.0 WebSocket client for Python with low level API options
xarray 2024.6.0 N-D labeled arrays and datasets in Python
zipp 3.19.2 Backport of pathlib-compatible object wrapper for zip files
```

mwouts commented 2 weeks ago

Hi @davidorme, thanks for reaching out. Here you would need a filter on the notebook metadata (see https://jupytext.readthedocs.io/en/latest/advanced-options.html).

You can set this either on each notebook individually or in your jupytext.toml file; see the attached PR for two tested examples.

mwouts commented 2 weeks ago

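To follow up on this, you want to use this value for `notebook_metadata_filter`: `-jupytext.text_representation.jupytext_version,settings,mystnb`. The leading `-` drops the `jupytext_version` entry from the header, while `settings` and `mystnb` are kept in addition to the default metadata.

You can set it either in a jupytext.toml file: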

notebook_metadata_filter = "-jupytext.text_representation.jupytext_version,settings,mystnb"

or individually in each of your MyST notebooks:

---
jupytext:
  formats: md:myst
  notebook_metadata_filter: -jupytext.text_representation.jupytext_version,settings,mystnb
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3 (ipykernel)
  language: python
  name: python3
mystnb:
  execution_mode: 'off'
settings:
  output_matplotlib_strings: remove
---
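
If many notebooks need the per-notebook variant, the extra line can be added with a small script. This is a rough sketch rather than anything jupytext provides: `add_metadata_filter` is a hypothetical helper, it edits the header as plain text so that the rest of the file stays untouched, and it assumes the two-space indentation and top-level `jupytext:` key shown in the header above.

```python
# The filter value suggested above.
FILTER = "-jupytext.text_representation.jupytext_version,settings,mystnb"


def add_metadata_filter(text: str, value: str = FILTER) -> str:
    """Insert notebook_metadata_filter under the top-level 'jupytext:' key.

    Returns the text unchanged when there is no '---' front matter, no
    'jupytext:' key, or a notebook_metadata_filter entry is already present.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text
    # Locate the closing '---' of the front matter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return text
    header = lines[1:end]
    if any(line.strip().startswith("notebook_metadata_filter:") for line in header):
        return text  # already configured, leave it alone
    for offset, line in enumerate(header):
        if line.rstrip() == "jupytext:":
            insert_at = offset + 2  # the line right after 'jupytext:'
            new_line = f"  notebook_metadata_filter: {value}\n"
            return "".join(lines[:insert_at] + [new_line] + lines[insert_at:])
    return text


EXAMPLE = """---
jupytext:
  formats: md:myst
kernelspec:
  name: python3
mystnb:
  execution_mode: 'off'
---

# A notebook
"""

print(add_metadata_filter(EXAMPLE))
```

Applying the function a second time leaves the text unchanged, so it should be safe to run repeatedly over a docs folder. For a whole project, the single jupytext.toml setting is likely the simpler option.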