jupyter-book / mystmd

Command line tools for working with MyST Markdown.
https://mystmd.org/guide
MIT License

Bundle some basic markdown manipulation tools with mystmd (for both Python and JavaScript) #1547

Open choldgraf opened 2 months ago

choldgraf commented 2 months ago

As people use MySTMD more heavily, they may run into cases where they want to build custom plugins or do scripting with MyST documents. A nice example of this is Rowan's auto-generated MEP table.

In these cases, it's common to do some manual parsing of MyST files in order to grab metadata, read page content, etc. This usually involves some clunky workflows where you try to remember how to parse a markdown file, parse the YAML header, and turn it into a dictionary.
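For illustration, the manual workflow described above tends to look something like the sketch below. It is only a rough approximation: `split_front_matter` and `parse_flat_header` are hypothetical names, not part of mystmd, and the header parser handles flat `key: value` lines only. A real script would more likely pass the header text to an actual YAML parser such as PyYAML's `yaml.safe_load`.

```python
def split_front_matter(text):
    """Split a Markdown document into (header_text, body).

    Assumes the YAML header, if present, starts on the first line with
    '---' and is closed by another '---' line.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            header = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return header, body
    # No closing delimiter found: treat the whole file as body.
    return "", text


def parse_flat_header(header_text):
    """Naive stand-in for a YAML parser (hypothetical helper).

    Only top-level 'key: value' lines are read; nested mappings,
    lists, and comments are skipped.
    """
    metadata = {}
    for line in header_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if line[0].isspace():
            continue  # indented line: part of a nested structure
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip().strip("\"'")
    return metadata


example = """---
title: My page
author: Jane Doe
---
# Heading

Some content.
"""

header, body = split_front_matter(example)
metadata = parse_flat_header(header)
print(metadata)
print(body)
```

Even this small version has to make decisions about delimiters and malformed headers, which is the kind of logic a shared helper could standardize.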

Proposal

It would be helpful if MyST provided some basic "MyST Manipulation" tools as helper functions to encourage more standardized development and lower the energy barrier there. For example (and with arbitrary names):

Don't have strong opinions on the specifics, I just wanted to give the above two functions as examples!

nthiery commented 1 month ago

Yes! That would be very nice. Here are some use cases for education:

  1. Parse all MyST files in my course, and extract all definitions (e.g. marked up as admonitions with 'Définition' in the title) to generate flashcards.
  2. Parse all MyST files in my course, and extract annotations (as a step toward adaptive learning with Jupyter, I am experimenting with adding annotations about the various learning objects in the course).
  3. Parse all MyST files in my course, and split them into chunks to feed to a RAG.
  4. Parse all MyST files in my course, and extract a short review document with all definitions, theorems, and other important things to remember.

It would be very convenient to be able to:

All in all, this is of the same flavor as XSLT, which was about making it easy to harvest and transform XML files without having to rewrite the parsing logic.

agoose77 commented 1 month ago

The spirit of this issue is to improve the tooling, and I 100% agree with that goal!

In the meantime, you can already do all of these things in two passes (which is only really an inconvenience if you want to include the results back into the build, i.e. "Implement 4 above").

The generated .json files from a site build contain all the information that you need to be able to investigate the final AST. If you are happy doing this in JS / TypeScript, you can use the unist-util libraries to make AST walking easy. If you're using Python, there are examples of implementing these routines here: https://github.com/executablebooks/sphinx-ext-mystmd/blob/0c6656f82a28f5fdd67563093b0e19e7fe83d908/src/sphinx_ext_mystmd/utils.py#L63-L84
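As a rough Python sketch of that second pass (not taken from the linked file): the helpers `iter_nodes` and `find_by_type` are hypothetical names, the sample document is hand-written rather than real build output, and storing the tree under a top-level `"mdast"` key is an assumption to check against your own build's .json files.

```python
import json


def iter_nodes(node):
    """Yield `node` and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_nodes(child)


def find_by_type(root, node_type):
    """Return every node in the tree whose "type" equals `node_type`."""
    return [n for n in iter_nodes(root) if n.get("type") == node_type]


# Hand-written stand-in for one page's .json file.
# Assumption: the AST sits under a top-level "mdast" key.
sample = """
{
  "mdast": {
    "type": "root",
    "children": [
      {"type": "heading", "depth": 1,
       "children": [{"type": "text", "value": "Groups"}]},
      {"type": "admonition", "kind": "note",
       "children": [
         {"type": "admonitionTitle",
          "children": [{"type": "text", "value": "Definition"}]},
         {"type": "paragraph",
          "children": [{"type": "text", "value": "A group is a set with an operation."}]}
       ]}
    ]
  }
}
"""

root = json.loads(sample)["mdast"]
admonitions = find_by_type(root, "admonition")
text_values = [n["value"] for n in find_by_type(root, "text")]
print(len(admonitions), text_values)
```

For a real build you would replace `json.loads(sample)` with `json.load` on each generated file and filter on whichever node types you care about.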

The AST structure is very straightforward to manipulate if you are familiar with the basic rules of unist: there are two unist node types (parents and literals), and everything else is built upon that.
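To make the parent/literal distinction concrete, here is a minimal sketch of a text extractor over plain dictionaries. `to_text` is a hypothetical helper name, and the sketch assumes literals carry a string in `"value"` while parents carry a list in `"children"`, as in mdast-style trees.

```python
def to_text(node):
    """Concatenate the text content of a unist-style node.

    Literals contribute their "value"; parents contribute the text of
    their "children"; nodes with neither contribute nothing.
    """
    if "value" in node:
        return node["value"]
    return "".join(to_text(child) for child in node.get("children", []))


# Hand-written example: a paragraph (parent) mixing text literals
# with a nested parent ("strong").
paragraph = {
    "type": "paragraph",
    "children": [
        {"type": "text", "value": "Hello, "},
        {"type": "strong", "children": [{"type": "text", "value": "world"}]},
        {"type": "text", "value": "!"},
    ],
}

print(to_text(paragraph))
```

Because every node is either a parent or a literal (or neither), the same two-branch recursion works for any node type without special cases.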