Turn <para> tags into line breaks

bert2 commented 4 years ago

This patch turns <para> tags into \r\n line breaks which mddox will turn into <br>s later.

It also adds ProcessTags() calls in a few places where tag processing seems to have been forgotten. At least I think so; please check whether they were left out on purpose.

That "parsing" logic is pretty basic, but it should work for most cases.
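For readers who want a feel for what such basic handling could look like, here is a rough sketch in Python. It is not the code from this patch: the function name remove_para_tags and the exact whitespace rules are assumptions made for illustration only. It drops each opening <para> and turns each closing </para> into a blank line made of two \r\n.

```python
import re

def remove_para_tags(text: str) -> str:
    """Hypothetical stand-in for the patch's RemoveParaTags(); not the real C# code."""
    # Assumption: an opening tag carries no content, so drop it together
    # with any whitespace that follows it.
    text = re.sub(r"<para>\s*", "", text)
    # Assumption: each closing tag ends a paragraph, so replace it (and the
    # whitespace around it) with a blank line made of two CRLF pairs.
    text = re.sub(r"\s*</para>\s*", "\r\n\r\n", text)
    # Trim the line breaks left over at the very start or end.
    return text.strip()

print(repr(remove_para_tags("<para>foo</para><para>bar</para>")))
```

Because the sketch treats every closing tag independently, two adjacent closing tags produce two blank lines in a row, which is the kind of extra spacing that nested input can cause.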

Those are the cases I tested manually:

<summary>
<para>
foo
</para>
<para>
bar
</para>
</summary>

<summary>
<para>foo</para>
<para>
bar
</para>
</summary>

<summary>
<para>
foo
</para>
</summary>

<summary>
<para>
foo
</para>
bar
</summary>

<summary>
<para>foo</para>
</summary>

<summary>
<para>foo</para><para>bar</para>
</summary>

<summary>
<para>foo</para>
</summary>

<summary>
<para>
foo
<para>
bar
</para>
</para>
</summary>

Here is a case I noticed that won't work perfectly:

<summary>
<para>
foo
<para>
bar
</para>
</para>
qux
</summary>

The resulting markdown would be:

foo<br><br>
bar<br><br>
<br><br>
qux

As you can see, there are too many breaks between bar and qux. I think this is something we could live with, considering that the input XML has an unusual structure to begin with.

A viable fix might be to cap every run of \r\n at a maximum length of two at the end of RemoveParaTags(), using RegexReplace("(\r\n){3,}", "\r\n\r\n"). However, this would also collapse any such runs of line breaks that the user typed into the comments intentionally.
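To make the trade-off concrete, here is a small Python sketch of that collapsing step. The pattern is the one quoted above; the function name collapse_line_breaks is made up for this example, and Python's re.sub stands in for RegexReplace.

```python
import re

def collapse_line_breaks(text: str) -> str:
    """Cap every run of CRLF pairs at two (illustrative helper, not part of mddox)."""
    # Three or more consecutive "\r\n" pairs become exactly two.
    return re.sub(r"(\r\n){3,}", "\r\n\r\n", text)

# Extra breaks of the kind produced by nested <para> tags are reduced to one blank line.
print(repr(collapse_line_breaks("bar\r\n\r\n\r\n\r\nqux")))

# Runs the user typed on purpose are collapsed in the same way.
print(repr(collapse_line_breaks("first\r\n\r\n\r\nsecond")))
```

The second call shows the downside: the function cannot tell generated breaks apart from ones the comment author wanted.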

loxsmoke commented 4 years ago

What is the intended result of nested tags? It looks like an edge case that could be ignored for now.

bert2 commented 4 years ago

I don't know. I can't think of a good reason to do it, but then again... you never know :) But I agree that ignoring this case is the best option.

loxsmoke commented 4 years ago

I do not see much info on MSDN either, so the current implementation is OK.

loxsmoke / mddox

Turn <para> tags into line breaks #9