khajavi / pandoc

Automatically exported from code.google.com/p/pandoc
GNU General Public License v2.0

Better handling of <pre><code> blocks in HTML to markdown conversion #133

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently HTML to markdown conversion strips out HTML markup inside
<pre><code> tags.  It would be more sensible if such blocks were simply
left alone, except of course stripping off the opening and closing
<pre><code>...</code></pre> tags and indenting the content!  At least this
behavior should be selectable with an option.
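
For illustration, the requested behavior can be sketched in a few lines of
Python. This is not the Perl script mentioned in this issue and not how pandoc
works internally; `indent_pre_code` and `PRE_CODE` are hypothetical names, and
the regex assumes unnested `<pre><code>` blocks with no attributes on either tag.

```python
import re

# Assumption: blocks look exactly like <pre><code>...</code></pre>,
# with no attributes and no nesting.
PRE_CODE = re.compile(r"<pre><code>(.*?)</code></pre>", re.DOTALL)


def indent_pre_code(html_text: str) -> str:
    """Drop the outer <pre><code> tags and indent the content by four
    spaces, leaving any markup inside the block untouched."""

    def repl(match: re.Match) -> str:
        body = match.group(1).strip("\n")
        return "\n".join("    " + line for line in body.split("\n"))

    return PRE_CODE.sub(repl, html_text)


if __name__ == "__main__":
    print(indent_pre_code('<pre><code>x = <b>1</b>\ny = 2</code></pre>'))
```

Note that the inner `<b>` tag survives only as literal text of the indented
block; nothing here makes it render as formatting.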

I have uploaded a Perl script which simulates the desired behavior
(actually it leaves the content of all <pre> blocks alone!).  It is attached
to Issue 132, which concerns a similar problem with tables.

<http://code.google.com/p/pandoc/issues/detail?id=132>

Original issue reported on code.google.com by bpjonsson@gmail.com on 5 Mar 2009 at 2:06

GoogleCodeExporter commented 9 years ago
Stripping off the <pre><code> tags and indenting the content won't work.
HTML tags like <span color="red"> inside an indented code block are treated as
verbatim text, which isn't what you want.

Perhaps pandoc should scan for '<' inside <pre><code>...</code></pre> contexts.
If it finds '<' it could leave the whole thing (including the <pre><code>) as
it is, rather than converting it to a pandoc code block.

This would guarantee that HTML formatting inside code blocks doesn't get lost.
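
A minimal sketch of that heuristic, in Python rather than pandoc's own code:
blocks whose content contains '<' are passed through as raw HTML, and all
others become indented code blocks. `convert_pre_code` and `PRE_CODE_BLOCK`
are hypothetical names, the regex assumes unnested blocks with no attributes
on the tags, and decoding entities such as `&lt;` in the converted case is an
assumption about what the resulting code block should contain.

```python
import html
import re

# Assumption: blocks look exactly like <pre><code>...</code></pre>.
PRE_CODE_BLOCK = re.compile(r"<pre><code>(.*?)</code></pre>", re.DOTALL)


def convert_pre_code(source: str) -> str:
    """Convert markup-free <pre><code> blocks to indented code blocks and
    leave blocks that contain '<' exactly as they are."""

    def repl(match: re.Match) -> str:
        inner = match.group(1)
        if "<" in inner:
            # Inner markup present: keep the whole block, tags included.
            return match.group(0)
        # Plain text only: decode entities and indent by four spaces.
        text = html.unescape(inner).strip("\n")
        return "\n".join("    " + line for line in text.split("\n"))

    return PRE_CODE_BLOCK.sub(repl, source)
```

An escaped `&lt;` does not trigger the pass-through, since only a literal '<'
indicates real markup inside the block.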

On the other hand:  it's of the essence of html -> markdown conversion that some
information will be lost, so one might wonder why it's so bad here if the
formatting is removed.  Some users may think the advantages of converting the
<pre><code> block into a portable format that can in turn be converted to LaTeX
or RTF outweigh the disadvantages of losing formatting information inside the
code block.  So I'm really not convinced that the change is needed.

Original comment by fiddloso...@gmail.com on 17 Mar 2009 at 6:05

GoogleCodeExporter commented 9 years ago
Couldn't there be an option for this?  Or --parse-raw could be made sensitive
to it?

/BP

Original comment by bpjonsson@gmail.com on 5 Apr 2009 at 12:09