html dedent for markdown export

jupyter / nbconvert

Jupyter Notebook Conversion

https://nbconvert.readthedocs.io/

BSD 3-Clause "New" or "Revised" License


Issue #996, closed by amniskin 5 years ago

amniskin commented 5 years ago

nbconvert doesn't dedent 'text/html' outputs, which breaks rendering further down the pipeline (specifically when converting notebooks that use plotly to Markdown).

To see the issue, convert any notebook that uses plotly to Markdown: the plotly JavaScript (which came from an indented text/html cell output) is displayed verbatim rather than executed. Given that indentation doesn't change how HTML is interpreted (awkward to put into words), it seems like the reader should dedent by default?

I started digging into the code, but wasn't sure where the best place to add this would be. Is this something you all would be into? I can write a pull request; I just need to know where a sensible place for it would be.
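As a minimal sketch of the dedent being proposed, Python's standard textwrap.dedent removes the indentation shared by every non-blank line. The HTML string below is a made-up example shaped like an indented plotly output, not actual plotly output.

```python
import textwrap

# Hypothetical indented text/html output (shape only; not copied from plotly).
indented_html = """\
    <div id="plot"></div>
    <script type="text/javascript">
        console.log("init plot");
    </script>
"""

# textwrap.dedent strips the whitespace prefix common to all non-blank lines,
# here four spaces. Relative indentation inside <script> is kept.
dedented = textwrap.dedent(indented_html)
print(dedented)
```

The remaining indentation inside the script element is harmless, since a Markdown HTML block opened by a `<script>` line runs until the closing `</script>`.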

MSeal commented 5 years ago

Would you mind posting images and/or an example notebook, along with the command you used, to make sure the problem is clear for everyone?

The code which performs the translation is here and the template it uses is here. The pandoc library is used for the actual template implementation, which flows through here. How these modules interact is unfortunately not super simple, but this is a starting point to explore. The template appears to print the text/html naively, without any other processing, so an improvement that applies a filter or function to handle this would likely be welcome.
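A template filter along those lines might look like the sketch below. The name strip_html_indent is hypothetical and not part of nbconvert. It strips leading whitespace from every line rather than calling textwrap.dedent, because dedent only removes a prefix shared by all lines and does nothing when the first line is flush-left but later lines are indented.

```python
import textwrap


def strip_html_indent(html):
    """Remove leading whitespace from every line of an HTML string.

    Hypothetical helper intended for use as a Jinja template filter.
    Caveat: whitespace inside <pre> or <textarea> is significant, and this
    sketch does not special-case those elements.
    """
    lines = [line.lstrip() for line in html.splitlines()]
    result = "\n".join(lines)
    # Preserve a trailing newline if the input had one.
    if html.endswith("\n"):
        result += "\n"
    return result


# Mixed indentation: the outer <div> is flush-left, the script is indented.
mixed = '<div>\n        <script>\n            Plotly.newPlot("plot", []);\n        </script>\n</div>\n'

# textwrap.dedent finds no common prefix here, so it changes nothing.
print(textwrap.dedent(mixed) == mixed)
print(strip_html_indent(mixed))
```

In nbconvert, a function like this could presumably be registered on the exporter (TemplateExporter has a register_filter method) and applied where the Markdown template emits text/html; the exact template line to change isn't identified in this thread.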

amniskin commented 5 years ago

As for a minimal example:

[Screenshot: the example notebook] And the associated Markdown output: [Screenshot: the Markdown output]

It seems like this is a problem with the text/html cell parser itself, though, no? What I mean is that, since HTML is whitespace-insensitive, any leading whitespace can be removed without changing how the HTML block is interpreted. That way, if some other format later turns out to be whitespace-sensitive, we won't have to remember to handle this there too. It would also make it easier to export a relatively human-readable HTML file (with proper indentation).

P.S. Sorry it took so long to respond; I broke the Python install on my personal computer and hadn't taken the time to fix it until now.
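Doing this at the reader level, before any exporter runs, could look roughly like the sketch below. It works on a notebook as a plain dict in the on-disk nbformat JSON shape, where a text/html value may be a single string or a list of line strings. The function names are hypothetical; inside nbconvert this would more likely be written as a Preprocessor, which isn't shown here.

```python
def _strip_lines(text):
    """Strip leading whitespace from each line, keeping a trailing newline."""
    result = "\n".join(line.lstrip() for line in text.splitlines())
    if text.endswith("\n"):
        result += "\n"
    return result


def dedent_html_outputs(nb):
    """Strip leading whitespace from every text/html output in a notebook dict."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            # stream and error outputs have no "data" mime bundle.
            data = output.get("data", {})
            html = data.get("text/html")
            if html is None:
                continue
            # On disk, multiline strings may be stored as a list of lines.
            if isinstance(html, list):
                html = "".join(html)
            data["text/html"] = _strip_lines(html)
    return nb


nb = {
    "cells": [
        {
            "cell_type": "code",
            "source": "fig.show()",
            "outputs": [
                {
                    "output_type": "display_data",
                    "data": {"text/html": ["<div>\n", "    <script>init();</script>\n", "</div>\n"]},
                    "metadata": {},
                },
                {"output_type": "stream", "name": "stdout", "text": "    hi\n"},
            ],
        }
    ]
}
dedent_html_outputs(nb)
print(nb["cells"][0]["outputs"][0]["data"]["text/html"])
```

Only text/html is touched; stream text and other mime types keep their whitespace.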

MSeal commented 5 years ago

Seems like a reasonable request. Thanks for gathering the info and adding images. I haven't read through the code-paths involved in a while but your reasoning on where to implement sounds right.

amniskin commented 5 years ago

I noticed you tagged this with "enhancement", but it's really a bug. The generated Markdown isn't equivalent to the generated HTML, which it should be. Markdown treats lines indented by four or more spaces as an indented code block, so the preserved indentation causes some HTML to be displayed as verbatim text rather than processed as HTML and added to the DOM.

This is particularly annoying if you use plotly, because plotly inserts indented code into its HTML outputs; your plotly JavaScript initialization call then ends up as verbatim text and never runs, so none of your plots show up.
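In CommonMark, an HTML block may start with at most three spaces of indentation, while a line indented by four or more columns can start an indented code block, which is why an indented `<script>` line comes out as literal text. The rough, hypothetical checker below flags such lines in exported Markdown. It is only a heuristic: context such as an already-open HTML block, a paragraph the line would continue, or a list item can change the actual result.

```python
def indented_code_lines(markdown_text):
    """Return (line_number, line) pairs indented by four or more columns.

    Heuristic only; it ignores surrounding block context.
    Tabs are expanded to tab stops of 4, matching CommonMark.
    """
    flagged = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines never start a code block
        expanded = line.expandtabs(4)
        indent = len(expanded) - len(expanded.lstrip(" "))
        if indent >= 4:
            flagged.append((number, line))
    return flagged


sample = "Some text\n\n    <script>init();</script>\n"
print(indented_code_lines(sample))
```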