jupyter / nbconvert

Jupyter Notebook Conversion
https://nbconvert.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

nbconvert --to html skips Markdown within HTML tags #1125

Open vlbrown opened 4 years ago

vlbrown commented 4 years ago

nbconvert to html skips Markdown within HTML tags

e.g.

<blockquote>
This is a sentence containing some _italic_ and **bold**  text.
   * bullet point
   * bullet point

Here's a new paragraph.
</blockquote>

nbconvert --to html will bypass the contents of that blockquote section.

(Note that if you don't put in the </blockquote>, Markdown still renders as a blockquote (to the end of the cell) AND nbconvert --to html converts properly. Apparently, nbconvert is triggered by the closing tag, not the opening tag.)

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Test notebook\n",
    "\n",
    "Here is some text\n",
    "\n",
    "<blockquote>\n",
    "This is quoted text with an _italicized part_ and some **bold text**.\n",
    "\n",
    "Here's a list\n",
    "   * item 1\n",
    "   * item 2\n",
    "   \n",
    "and some code:\n",
    "```\n",
    "print('hello, world')\n",
    "```\n",
    "</blockquote>\n",
    "\n",
    "And some more text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

jupyter nbconvert --version
5.6.0

joelostblom commented 3 years ago

This would be convenient, especially with the bootstrap-like colored boxes in JupyterLab, that show up nicely inside the notebooks:

image

I realize that it could be rewritten to use HTML tags, but I prefer to have it in markdown to be able to export to PDF when needed (even if the box is not included), and since I have scripts that parse the markdown headers.

mgeier commented 3 years ago

Yes, it would be great if nbconvert (and by extension nbviewer) could parse those contents of HTML tags!

@joelostblom

> I prefer to have it in markdown to be able to export to PDF when needed (even if the box is not included)

When using my Sphinx extension nbsphinx the boxes will also be included in PDF output, see e.g. https://nbsphinx.readthedocs.io/_/downloads/en/0.7.1/pdf/#subsection.3.8

This could possibly also be added to nbconvert's PDF output?

mgeier commented 3 years ago

For future reference: https://github.com/jupyter/notebook/issues/1292

itcarroll commented 3 weeks ago

I thought I had found gold with the discovery of the magic blank line that makes markdown within <div> tags render on JLab for these alerts ... and then I found this deficiency in nbconvert. Strongly in support of a fix for this. Apparently a CommonMark solution has still not come about. Edit: turns out I did find gold!

<blockquote>

This is a sentence containing some _italic_ and **bold**  text.
   * bullet point
   * bullet point

Here's a new paragraph.

</blockquote>

I got the correct result with jupyter nbconvert --to=html, just had to make sure I had a current version of mistune. In the process I learned that the blank line is part of the CommonMark spec, so not "magic" after all.
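For notebooks that already contain tags without the blank lines, the padding can be added mechanically. The following is a minimal sketch, not part of nbconvert: `pad_html_tags` is a hypothetical helper name, and it assumes each tag sits alone on its own line. It does not try to recognize fenced code blocks, so a tag-only line inside a code fence would be padded too.

```python
import re

# Matches a line holding only an opening or closing HTML tag,
# e.g. <blockquote>, </div> or <div class="alert alert-info">.
# Assumption: the tag sits alone on its line; inline HTML is left untouched.
_TAG_ONLY_LINE = re.compile(r"^\s*</?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?>\s*$")


def pad_html_tags(markdown_source: str) -> str:
    """Surround tag-only lines with blank lines so the text between them
    is handed to the Markdown parser instead of being treated as raw HTML."""
    out = []
    skip_blank = False  # True right after we appended our own blank line
    for line in markdown_source.split("\n"):
        if _TAG_ONLY_LINE.match(line):
            # Add a blank line before the tag unless one is already there
            # or the tag is the very first line.
            if out and out[-1].strip():
                out.append("")
            out.append(line)
            out.append("")  # blank line after the tag
            skip_blank = True
        elif skip_blank and not line.strip():
            # The source already had a blank line here; do not add a second one.
            skip_blank = False
        else:
            out.append(line)
            skip_blank = False
    return "\n".join(out)


if __name__ == "__main__":
    cell = "<blockquote>\nSome _italic_ and **bold** text.\n</blockquote>\nAnd some more text."
    print(pad_html_tags(cell))
```

Running the helper a second time on its own output leaves the text unchanged, so it should be safe to apply to cells that are already padded.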

Recommend closing as fixed in a dependency.

itcarroll commented 3 weeks ago

A workaround is to supply --TemplateExporter.filters="{'markdown2html': 'nbconvert.filters.markdown.markdown2html_pandoc'}" in the call to jupyter nbconvert ....

Knowing little about how nbconvert works, this leads me to believe the bug is actually in the default markdown2html processor, i.e. mistune, and that pandoc interprets the CommonMark spec correctly here.
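For anyone scripting the workaround above, here is a rough sketch that assembles the same command line from Python. The filter override string is copied from the comment; `build_nbconvert_command`, `convert`, and the file name `test.ipynb` are placeholders made up for illustration. Actually running the conversion assumes `jupyter`, nbconvert, and pandoc are installed and on the PATH.

```python
import shlex
import subprocess

# Filter override quoted from the workaround: route markdown2html through pandoc.
PANDOC_FILTER_OVERRIDE = (
    "--TemplateExporter.filters="
    "{'markdown2html': 'nbconvert.filters.markdown.markdown2html_pandoc'}"
)


def build_nbconvert_command(notebook_path: str, use_pandoc: bool = True) -> list[str]:
    """Return the argv list for converting one notebook to HTML."""
    command = ["jupyter", "nbconvert", "--to", "html"]
    if use_pandoc:
        command.append(PANDOC_FILTER_OVERRIDE)
    command.append(notebook_path)
    return command


def convert(notebook_path: str) -> None:
    """Run the conversion; raises CalledProcessError if nbconvert fails."""
    subprocess.run(build_nbconvert_command(notebook_path), check=True)


if __name__ == "__main__":
    # Print a shell-quoted version of the command for copy and paste.
    # "test.ipynb" is a placeholder file name.
    print(shlex.join(build_nbconvert_command("test.ipynb")))
```

Passing the arguments as a list avoids shell quoting problems with the braces and quotes in the filter override; `shlex.join` is only used to display a copy-pasteable form.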

Current version of mistune works just fine.
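Since the outcome seems to depend on which mistune is installed, it may help to include the versions when reporting results. A small sketch using only the standard library follows; `installed_version` is a hypothetical helper name, and the thread does not say which mistune release contains the fix, so this only reports what is installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the two packages discussed in this thread.
    for name in ("nbconvert", "mistune"):
        print(name, installed_version(name) or "not installed")
```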