Open chrisjsewell opened 2 years ago
Is there something in the mdast ecosystem to point to?
React uses className and HTML uses class. Both are strings.
Can we have an array of classes with spaces in them?
This shouldn't be allowed:
classes:
- 'myClass mySecondclass'
Instead:
classes:
- myClass
- mySecondclass
FYI in unified-myst, this is what I am currently doing: https://github.com/executablebooks/unified-myst/blob/096dd8da49ce609ea9a1edec4a492e3798f63df1/packages/core-parse/src/directiveProcessor.js#L83-L111
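The rule proposed above (one class per list entry, no embedded whitespace) could be enforced with a small normalization step. This is a minimal Python sketch and not the unified-myst code linked above; the helper name `normalize_classes` and the choice to split offending entries rather than reject them are assumptions for illustration.

```python
def normalize_classes(classes):
    """Return a flat list of class names with no embedded whitespace.

    Hypothetical helper: an entry such as 'myClass mySecondclass' is split
    into separate names, and duplicates are dropped while keeping order.
    """
    normalized = []
    for entry in classes:
        # str.split() with no arguments splits on any run of whitespace.
        for name in entry.split():
            if name not in normalized:
                normalized.append(name)
    return normalized


print(normalize_classes(["myClass mySecondclass"]))
```

A stricter implementation could instead raise an error or emit a warning when an entry contains whitespace, so that authors fix the source.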
@rowanc1 and @fwkoch, further to our discussion regarding identifier: on further thought, I feel it's just irreconcilable with jupyter-book/myst-parser to only allow a single identifier per element.
Take this simple example:
# main
## subtitle
(target1)=
(target2)=
### Sub-subtitle
[ref1](target1)
[ref2](sub-subtitle)
This is how it is resolved by docutils:
$ myst-docutils-pseudoxml test.md
<document ids="main" names="main" source="test.md" title="main">
<title>
main
<subtitle ids="subtitle" names="subtitle">
subtitle
<target refid="target1">
<target refid="target2">
<section ids="sub-subtitle target2 target1" names="sub-subtitle target2 target1">
<title>
Sub-subtitle
<paragraph>
<reference refid="target1">
ref1
<reference refid="sub-subtitle">
ref2
As you can see, not only is the header assigned the identifiers coming from the targets, it is also assigned a "slug" identifier based on its content (which is not an unusual practice when rendering Markdown).
Not allowing multiple identifiers would render this example, and by extension jupyter-book itself, non myst-spec compliant, which is obviously extremely problematic 😬.
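To make the accumulation of identifiers concrete, here is a rough Python sketch that imitates the observable outcome for the example above: a heading receives a slug plus any targets stacked directly before it. It does not reproduce docutils' actual transforms; `slugify` and `collect_heading_ids` are hypothetical helpers, and the slug rule is a simplification.

```python
import re


def slugify(text):
    """Very rough heading slug: lowercase, other characters become hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def collect_heading_ids(lines):
    """Map each heading's text to its list of ids (slug first, then targets).

    Assumes '(name)=' target lines and '#'-style headings only.
    """
    pending, result = [], {}
    for line in lines:
        target = re.fullmatch(r"\((.+)\)=", line.strip())
        heading = re.match(r"#+\s+(.*)", line)
        if target:
            pending.append(target.group(1))
        elif heading:
            title = heading.group(1).strip()
            # Most recent target first, mirroring the order in the pseudo-XML.
            result[title] = [slugify(title)] + pending[::-1]
            pending = []
    return result


source = ["# main", "## subtitle", "(target1)=", "(target2)=", "### Sub-subtitle"]
ids = collect_heading_ids(source)
print(ids["Sub-subtitle"])
```

The point of the sketch is only that a single node ends up with several valid identifiers without the author doing anything unusual.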
To clarify some extra terminology from docutils:
Here also is the rendering of this example as html/latex:
$ myst-docutils-html5 test.md
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
...
<body>
<main id="main">
<h1 class="title">main</h1>
<p class="subtitle" id="subtitle">subtitle</p>
<section id="sub-subtitle">
<span id="target2"></span><span id="target1"></span>
<h2>Sub-subtitle</h2>
<p>
<a class="reference internal" href="#target1">ref1</a>
<a class="reference internal" href="#sub-subtitle">ref2</a>
</p>
</section>
</main>
</body>
</html>
$ myst-docutils-latex test.md
...
\begin{document}
\title{main%
\label{main}%
\\%
\DUdocumentsubtitle{subtitle}%
\label{subtitle}}
\author{}
\date{}
\maketitle
\section{Sub-subtitle%
\label{sub-subtitle}%
\label{target2}%
\label{target1}%
}
\hyperref[target1]{ref1}
\hyperref[sub-subtitle]{ref2}
\end{document}
FYI, if you want to see how anything is resolved by myst-docutils, simply install pipx (https://github.com/pypa/pipx) and run pipx install myst-docutils, which will give you access to the above CLIs.
Can you point to a user using multiple stacked targets in a Jupyter Book that exists today? Or an example of this being used in a sphinx project?
A few notes:
My take: multiple-ids are unused (I have never seen this in any non-contrived, user example[^1]), bring no additional features to the end user, and can be easily cleaned up by throwing warnings in a post-parsing transform in any implementation. Introducing a list of IDs to refer to a single element is a significant additional complexity that means all state management becomes harder, especially for cross-project linking (e.g. some equivalent of inter-sphinx, or any work around PIDs ongoing in research/library communities).
Looking forward to talking this through on Monday. There are lots of options on how to support this in Python/JB before passing on to sphinx. I am suggesting we support a subset of sphinx's complexity, and provide tools to help users refactor their documents with explicit references/labels.
[^1]: With the possible exception of implicit references that have subsequently been made explicit. I think this can be taken care of in a state-management task rather than in the MDAST spec though.
Looking forward to talking this through on Monday.
Yeah, absolutely, happy to discuss. What I want to emphasize is that this is not a trivial choice. As we have discussed previously, myst-spec should initially represent what myst actually is now, not what we want it to be in the future.
Can you point to a user using multiple stacked targets in a Jupyter Book that exists today?
Any project that refers to headings by both targets and heading slugs.
There are lots of options on how to support this in Python/JB before passing on to sphinx. The python parser could easily: throw a warning on multiple stacked labels before passing on to sphinx
I feel this is somewhat a misunderstanding of how Jupyter Book (via myst-parser) works: none of this processing is done by myst-parser; it's all handled by docutils/sphinx. Getting myst-parser to act in this manner, if it could be done at all, would require a substantial rewrite to override core parts of docutils functionality.
There is no reason that myst has to support all quirks of docutils
I would not say that this is merely a quirk of docutils though, it is a core design aspect: https://docutils.sourceforge.io/docs/ref/doctree.html#common-attributes
significant additional complexity, especially for cross-project linking (e.g. some equivalent of inter-sphinx)
But inter-sphinx already does work with multiple IDs
This is a divergence from what is set out in mdast (which already has precedent for identifier/label)
I feel this is a misunderstanding of what identifier is actually used for in MDAST.
It is not a canonical ID for a node and, whether we use singular or multiple IDs for a node, they should not be stored under identifier, specifically to delineate from MDAST's identifier.
Take as an example:
[a]
[a]: https://example1.com
[a]: https://example2.com
goes to MDAST resembling
<paragraph>
<linkReference identifier="a">
<definition identifier="a">
<definition identifier="a">
linkReference has an identifier which is not actually its identity; it is what it is referencing (https://github.com/syntax-tree/mdast#association).
identifier (because they are eventually resolved "implicitly")
definition.identifier can only be referenced by linkReference; they are completely independent of myst identifiers, e.g. you cannot do {ref}`a`.
This is also the same for footnoteReference/footnoteDefinition.
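As an illustration of how these association identifiers behave, the sketch below resolves a linkReference against definitions. In CommonMark, when several definitions share a label, the first one takes precedence. Plain dicts in a flat list stand in for mdast nodes here, and `resolve_link_references` is a hypothetical helper, not part of any mdast utility.

```python
def resolve_link_references(nodes):
    """Return the URL each linkReference points to, or None if undefined."""
    definitions = {}
    for node in nodes:
        if node["type"] == "definition":
            # setdefault keeps the first definition seen for an identifier.
            definitions.setdefault(node["identifier"], node["url"])
    return [
        definitions.get(node["identifier"])
        for node in nodes
        if node["type"] == "linkReference"
    ]


nodes = [
    {"type": "linkReference", "identifier": "a"},
    {"type": "definition", "identifier": "a", "url": "https://example1.com"},
    {"type": "definition", "identifier": "a", "url": "https://example2.com"},
]
print(resolve_link_references(nodes))
```

Note that the identifier on the linkReference is a lookup key, and that two definitions can legally carry the same identifier, so it cannot double as a unique node ID.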
Whether we use something like mystId (singular) or mystIds (plural), a core requirement should be: in a "well-formed" document, I am able to walk through the AST and generate an unambiguous mapping of REFID -> Node, in order to resolve what a {ref} is pointing towards.
For this requirement, note it does not actually matter whether the relationship is one-to-one or many-to-one (just as long as it is not one-to-many or many-to-many).
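The requirement above can be sketched as a single tree walk. This is a minimal Python illustration under stated assumptions: nodes are plain dicts, the plural field name `mystIds` is taken from the suggestion above, and `build_ref_map` is a hypothetical helper. Several IDs pointing at one node are accepted; one ID claimed by two nodes is rejected.

```python
def build_ref_map(node, ref_map=None):
    """Walk a dict-based tree and map every id in `mystIds` to its node.

    Raises ValueError if one id is claimed by two different nodes, which is
    the one-to-many case that would make a reference ambiguous.
    """
    if ref_map is None:
        ref_map = {}
    for ref_id in node.get("mystIds", []):
        if ref_id in ref_map and ref_map[ref_id] is not node:
            raise ValueError(f"duplicate id: {ref_id!r}")
        ref_map[ref_id] = node
    for child in node.get("children", []):
        build_ref_map(child, ref_map)
    return ref_map


heading = {"type": "heading", "mystIds": ["sub-subtitle", "target2", "target1"]}
root = {"type": "root", "children": [heading]}
ref_map = build_ref_map(root)
print(ref_map["target1"] is ref_map["sub-subtitle"])
```

With a singular `mystId` the walk is identical except that each node contributes at most one key, which is why the mapping requirement alone does not decide between the two designs.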
Helping users get to a canonical ID should be a goal of our work.
Taking the above discussion, I would ask what do you mean by a canonical ID?
Since you can essentially have multiple ID "sets" within a single document: IDs relating definitions, footnotes, {ref}, Jupyter code cells, intersphinx (there is now a separate external role: https://github.com/sphinx-doc/sphinx/pull/9822).
sphinx essentially handles this via the any role, and the resolution logic underpinning it (https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html?highlight=roles#role-any).
Domains can maintain their own identifier maps, for particular reference sets.
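A rough sketch of that idea: each reference set keeps its own name-to-target map, and a generic lookup searches all of them. This is not Sphinx's implementation; `resolve_any` and the set names are hypothetical, and raising an error on ambiguity is a simplification (the Sphinx documentation describes a warning when none or multiple targets are found).

```python
def resolve_any(name, id_sets):
    """Look up `name` in every id set; return (set_name, target) or None.

    `id_sets` maps a set name (e.g. 'ref', 'footnote') to its own
    name -> target dict. Hits in more than one set count as ambiguous.
    """
    matches = [
        (set_name, ids[name]) for set_name, ids in id_sets.items() if name in ids
    ]
    if len(matches) > 1:
        found_in = [set_name for set_name, _ in matches]
        raise LookupError(f"{name!r} is ambiguous across: {found_in}")
    return matches[0] if matches else None


id_sets = {
    "ref": {"target1": "section"},
    "footnote": {"1": "footnoteDefinition"},
}
print(resolve_any("target1", id_sets))
```

Because each set is independent, the same name can exist in two sets without conflict as long as references name the set explicitly.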
One thing to consider is also setting a (probably SHA256) UUID for every node in the AST. This would provide an "unequivocal" identifier for all nodes, irrespective of what is referencing them. Specific reference names are then just aliases to those.
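One possible reading of this suggestion, as a Python sketch: every node gets a hash-derived key, and reference names are stored only as aliases to that key. Everything here is an assumption for illustration, including the helper names, the `names` field, and the decision to hash the node's child-index path rather than its content.

```python
import hashlib
import json


def node_key(path):
    """Derive a hex key from a node's child-index path from the root."""
    return hashlib.sha256(json.dumps(path).encode("utf-8")).hexdigest()


def index_tree(node, path=(), keys=None, aliases=None):
    """Assign every node a key and record its reference names as aliases."""
    keys = {} if keys is None else keys
    aliases = {} if aliases is None else aliases
    key = node_key(list(path))
    keys[key] = node
    for name in node.get("names", []):
        aliases[name] = key  # many names may alias the same key
    for i, child in enumerate(node.get("children", [])):
        index_tree(child, path + (i,), keys, aliases)
    return keys, aliases


heading = {"type": "heading", "names": ["sub-subtitle", "target2", "target1"]}
root = {"type": "root", "children": [heading]}
keys, aliases = index_tree(root)
print(keys[aliases["target1"]] is heading)
```

A path-based key changes when nodes are reordered, whereas a content-based key changes when text is edited, so the choice of what to hash would need to be settled in the spec.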
As specified here: https://docutils.sourceforge.io/docs/ref/doctree.html#common-attributes, there are some common attributes associated with all docutils nodes, and this should essentially be the same here.
As an example, here: https://github.com/executablebooks/myst-spec/blob/35f80974a69f68490b007c3e6d919ed246f64594/docs/examples/directives.admonitions.yml#L122
This should be
classes: ['tip']