Open fcollonval opened 2 years ago
Some of the errors are due to the extra `id` attribute and anchor link that nbconvert and JupyterLab add to headings, compared with the commonmark-gfm output; e.g.
'<h3 id="foo">foo<a class="anchor-link" href="#foo">¶</a></h3>'
vs. '<h3>foo</h3>'
After applying the GFM normalization to the output HTML, the result for JupyterLab is 293 failed, 378 passed.
The artifact to compare the outputs is available at: https://github.com/jupyterlab/benchmarks/actions/runs/2200787289
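As a rough illustration of the heading difference described above, here is a minimal Python sketch that strips the added `id` attribute and anchor link so both outputs can be compared. The function name `strip_heading_anchors` and the regular expressions are hypothetical and are not taken from the test code in this PR; a real normalizer would more likely work on a parsed HTML tree.

```python
import re

# Hypothetical helper: names and patterns are illustrative assumptions,
# not the normalization actually used by the benchmark tests.
_ANCHOR_LINK = re.compile(
    r'<a\s+class="(?:anchor-link|jp-InternalAnchorLink)"[^>]*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)
_HEADING_ID = re.compile(r'(<h[1-6])\s+id="[^"]*"', re.IGNORECASE)


def strip_heading_anchors(html: str) -> str:
    """Drop the anchor links and heading ids added by nbconvert/JupyterLab."""
    html = _ANCHOR_LINK.sub("", html)  # remove the pilcrow link first
    return _HEADING_ID.sub(r"\1", html)  # then remove id="..." on h1..h6
```

With this sketch, both the nbconvert-style and the JupyterLab-style heading from the examples in this thread reduce to the plain commonmark-gfm form.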
A more in-depth analysis shows the following groups of discrepancies:
id | section | markdown | commonmark-gfm | JupyterLab |
---|---|---|---|---|
10 | Tabs | `#\tFoo\n` | `<h1>Foo</h1>` | `<h1 id="Foo">Foo<a class="jp-InternalAnchorLink" href="#Foo" target="_self">¶</a></h1>` |
112 | Fenced code blocks | `ruby\ndef foo(x)\n return 3\nend\n \n` | `<pre><code class="language-ruby">def foo(x)\n return 3\nend\n</code></pre>` | `<pre><code class="cm-s-jupyter language-ruby"><span class="cm-keyword">def</span> <span class="cm-def">foo</span>(<span class="cm-variable">x</span>)\n <span class="cm-keyword">return</span> <span class="cm-number">3</span>\n<span class="cm-keyword">end</span>\n</code></pre>` |
133 | HTML blocks | `<Warning>\n*bar*\n</Warning>\n` | `<warning> *bar* </warning>` | `*bar*` |
140 | HTML blocks | `<script type="text/javascript">\n// JavaScript example\n\ndocument.getElementById("demo").innerHTML = "Hello JavaScript!";\n</script>\nokay\n` | `<script type="text/javascript">// JavaScript example document.getElementById("demo").innerHTML = "Hello JavaScript!";</script><p>okay</p>` | `<p>okay</p>` |
141 | HTML blocks | `<style\n type="text/css">\nh1 {color:red;}\n\np {color:blue;}\n</style>\nokay\n` | `<style type="text/css">h1 {color:red;} p {color:blue;}</style><p>okay</p>` | `<p>okay</p>` |
id | section | markdown | commonmark-gfm | JupyterLab |
---|---|---|---|---|
1 | Tabs | `\tfoo\tbaz\t\tbim\n` | `<pre><code>foo\tbaz\t\tbim\n</code></pre>` | `<pre><code>foo baz bim\n</code></pre>` |
120 | HTML blocks | `<div>\n *hello*\n <foo><a>\n` | `<div>*hello* <foo><a>` | `<div>*hello* <a rel="nofollow" target="_self"> </a></div>` |
121 | HTML blocks | `</div>\n*foo*\n` | `</div>*foo*` | `*foo*` |
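The pass/fail counts and per-section groupings reported in this thread can be produced by a loop of roughly the following shape. This is only a sketch under stated assumptions: each example is assumed to be a dict with `example`, `section`, `markdown` and `html` keys (the layout of the JSON dump of the CommonMark spec tests), and `run_spec_examples`, `SAMPLE_EXAMPLES` and `canned_render` are hypothetical names. `canned_render` is a stand-in that returns fixed strings taken from the tables above; it is not a real parser.

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def run_spec_examples(examples: Iterable[dict], render: Callable[[str], str]) -> dict:
    """Render every example and tally passes and failures per spec section."""
    passed, failed = [], []
    for ex in examples:
        actual = render(ex["markdown"])
        (passed if actual == ex["html"] else failed).append(ex)
    return {
        "passed": len(passed),
        "failed": len(failed),
        "failed_by_section": dict(Counter(ex["section"] for ex in failed)),
        "failed_ids": [ex["example"] for ex in failed],
    }


# Two examples copied from the tables in this thread (expected HTML already normalized).
SAMPLE_EXAMPLES = [
    {"example": 1, "section": "Tabs", "markdown": "\tfoo\tbaz\t\tbim\n",
     "html": "<pre><code>foo\tbaz\t\tbim\n</code></pre>"},
    {"example": 10, "section": "Tabs", "markdown": "#\tFoo\n",
     "html": "<h1>Foo</h1>"},
]

# Stand-in renderer: canned output only, used to exercise the runner.
_CANNED: Dict[str, str] = {
    "\tfoo\tbaz\t\tbim\n": "<pre><code>foo baz bim\n</code></pre>",
    "#\tFoo\n": "<h1>Foo</h1>",
}


def canned_render(markdown: str) -> str:
    return _CANNED[markdown]
```

Calling `run_spec_examples(SAMPLE_EXAMPLES, canned_render)` reports one pass and one failure (example 1, in the Tabs section), since the canned output for example 1 does not preserve the tab characters.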
Actually, marked runs the CommonMark and GFM tests in its CI. The results as of Jan 4th 2022 are reported here: https://github.com/markedjs/marked/discussions/1202#discussioncomment-1907552
Interestingly, GitHub just added math rendering, so now there is another opinion about exactly what syntax is used to create math: https://github.blog/changelog/2022-05-19-render-mathematical-expressions-in-markdown/
Math is also rendered in GitHub's notebook preview (see https://github.com/jupyter-widgets/ipywidgets/blob/master/docs/source/examples/Lorenz%20Differential%20Equations.ipynb, for example). It appears to use MathJax 3.2.0.
@williamstein pointed me to some concerns about the GitHub implementation: https://nschloe.github.io/2022/05/20/math-on-github.html; they raise interesting points to keep in mind for our parser.
Let's list the desired features of an ideal Markdown parser for JupyterLab:
Reference: Notebook documentation
WIP Candidates / features matrix
Feature | marked.js | markdown-it | MyST-parser |
---|---|---|---|
Support gfm | x[^1] | x[^4] | x |
Math syntax | x[^2] | ? | x |
Attachment | x[^3] | ? | ? |
Extensible | x | ? | |
MyST | x |
[^1]: Partly true - see test results
[^2]: Using some preprocessing
[^3]: Using customized link handler
[^4]: CommonMark run as part of the CI - GFM features available as plugins
Some comments:
What are others using?
@fcollonval a few days ago I rewrote the upstream markdown-it plugin I'm using for parsing out math, so in CoCalc we fully parse math via a plugin, rather than some sort of hack involving parsing before or after markdown is used (like GitHub and Jupyter both do, I think). Here's the code:
https://github.com/sagemathinc/cocalc/blob/master/src/packages/frontend/markdown/math-plugin.ts
It's MIT licensed. My goal with that code is to align with upstream Jupyter in fidelity in terms of what is parsed as math. In cases where there is a reasonable difference, I would lobby for Jupyter to change. As an example, my plugin parses this properly as inline math:
consider \begin{math}x^3\end{math} and ...
JupyterLab doesn't detect it as math at all. I think it's reasonable to detect.
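To make the example concrete, here is a small Python sketch of recognising `\begin{math}...\end{math}` alongside `$...$` as inline math. It is an illustration only and does not reproduce the logic of math-plugin.ts or of JupyterLab; the pattern `INLINE_MATH` and the function `find_inline_math` are hypothetical, and a real plugin also has to deal with code spans, escapes and nesting, which a single regular expression does not handle.

```python
import re
from typing import List

# Illustrative pattern only (an assumption, not the plugin's actual rules).
INLINE_MATH = re.compile(
    r"\\begin\{math\}(?P<env>.+?)\\end\{math\}"    # \begin{math}...\end{math}
    r"|(?<![\\$])\$(?P<dollar>[^$\n]+?)\$(?!\$)",  # $...$ but not \$ or $$...$$
    re.DOTALL,
)


def find_inline_math(text: str) -> List[str]:
    """Return the TeX source of each inline math span found in text."""
    return [m.group("env") or m.group("dollar") for m in INLINE_MATH.finditer(text)]
```

Under these assumptions, the sentence quoted above yields the single span `x^3`, while display math written as `$$...$$` is deliberately left for a separate rule.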
It seems that https://github.github.com/gfm/ has not been updated for math support. Edit: which is what @fcollonval was saying above
> In cases where there is a reasonable difference, I would lobby for Jupyter to change.
@williamstein - can you give a comprehensive description of what your plugin parses as math to typeset?
New Markdown parsers should also address existing Markdown bugs and feature requests, such as:
> In cases where there is a reasonable difference, I would lobby for Jupyter to change.
>
> @williamstein - can you give a comprehensive description of what your plugin parses as math to typeset?
It's by definition exactly what this file parses:
https://github.com/sagemathinc/cocalc/blob/master/src/packages/frontend/markdown/math-plugin.ts
when run as the first plugin in markdown-it. It would be a lot of (very valuable) work for that to get converted to an official spec. My goal with writing and iterating on math-plugin.ts has been to get fidelity with what I think JupyterLab does or should do, and I've incorporated significant feedback from my users. I would not be at all surprised if there are significant bad surprises related to the above linked code though. In fact, I can't wait to test it on the bugs @jweill-aws just listed, and see whether my code breaks on any of them...
I wrote up some thoughts in a README here along with a notebook testing the issues mentioned above:
There's a related discussion about math + markdown here: https://chat.zulip.org/#narrow/stream/2-general/topic/LaTeX.20math/near/1382932
Discussion about Markdown parsing is a long-standing issue. This PR lays out a structure to test the nbconvert and web frontend parsers against the GitHub-flavored CommonMark tests.
Xref: https://github.com/jupyterlab/jupyterlab/issues/272