lonekorean / wordpress-export-to-markdown

Converts a WordPress export XML file into Markdown files.
MIT License
1.09k stars, 222 forks

Br tags #24

Closed: thedamon closed this issue 4 years ago

thedamon commented 4 years ago

I noticed the export doesn't handle some <br> tags, though I'm not sure exactly which ones. Some break tags seem to be preserved, but the following example (and many like it) ends up as a single line in one paragraph after import:

<p>Quest, Team Shadetek / Mirage, Brooklyn Anthem / Uproot (2008)<br>
The Bug  / Poison Dart (feat. Warrior Queen) / London Zoo (2008)<br>
Ghislain Poirier / No More Blood feat. Zulu / No Ground Under (2007)<br>
Tanya Stephens / Put It On You / Rebelution (2006)<br>
Lady Saw / Chat To Mi Back  / Walk Out (2007)</p>

Strangely, when I put that sample into turndown directly it came back correctly, and I didn't see any settings in this library's use of turndown that seemed relevant.
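For reference, here is a minimal sketch of what a correct conversion of that kind of sample should look like. It is a simplified, regex-based stand-in written for illustration only, not turndown's actual implementation, and `brToHardBreaks` is a hypothetical helper name. It assumes that each `<br>` should become a Markdown hard line break (two trailing spaces followed by a newline).

```javascript
// Hypothetical helper, for illustration only; NOT how turndown is implemented.
// Assumes each <br> should become a Markdown hard line break:
// two trailing spaces followed by a newline.
function brToHardBreaks(html) {
  return html
    .replace(/<\/?p>/gi, "")             // drop the paragraph wrapper
    .replace(/<br\s*\/?>\s*/gi, "  \n")  // <br> plus any following whitespace
    .trim();
}

// Two lines taken from the sample in this issue.
const sample =
  "<p>Tanya Stephens / Put It On You / Rebelution (2006)<br>\n" +
  "Lady Saw / Chat To Mi Back  / Walk Out (2007)</p>";

const markdown = brToHardBreaks(sample);

// JSON.stringify makes the trailing spaces and the newline visible.
console.log(JSON.stringify(markdown));
```

If the converted file shows each entry on its own line with two trailing spaces, the conversion itself worked, and any remaining problem is likely in how the Markdown is rendered or post-processed.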

lonekorean commented 4 years ago

I tested your chunk of text in an XML export file, both with and without line breaks after the <br>s (snippet):

<content:encoded><![CDATA[<!-- wp:paragraph -->
<p>Quest, Team Shadetek / Mirage, Brooklyn Anthem / Uproot (2008)<br>
The Bug  / Poison Dart (feat. Warrior Queen) / London Zoo (2008)<br>
Ghislain Poirier / No More Blood feat. Zulu / No Ground Under (2007)<br>
Tanya Stephens / Put It On You / Rebelution (2006)<br>
Lady Saw / Chat To Mi Back  / Walk Out (2007)</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p>Quest, Team Shadetek / Mirage, Brooklyn Anthem / Uproot (2008)<br>The Bug  / Poison Dart (feat. Warrior Queen) / London Zoo (2008)<br>
Ghislain Poirier / No More Blood feat. Zulu / No Ground Under (2007)<br>Tanya Stephens / Put It On You / Rebelution (2006)<br>Lady Saw / Chat To Mi Back  / Walk Out (2007)</p>
<!-- /wp:paragraph -->]]></content:encoded>

It seemed to come out fine in the markdown file (screenshot):

[Screenshot: Annotation 2020-02-17 082210]

Please let me know if I misunderstood something, but it looks correct. Maybe something downstream is displaying your markdown incorrectly? But yeah, as you said, this all happens within turndown.
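One way to check the "something downstream" theory is to look for hard line breaks in the generated file. The sketch below assumes the breaks are written as two trailing spaces (which I believe is turndown's default for its `br` option) or as a trailing backslash. `countHardBreaks` is a hypothetical helper, and the check is simplified: it does not verify that the following line is non-blank.

```javascript
// Hypothetical diagnostic helper; a simplified check, not a full Markdown parser.
// Counts lines that end with a hard line break marker:
// two or more trailing spaces, or a trailing backslash.
function countHardBreaks(markdown) {
  const lines = markdown.split("\n");
  // The final line has nothing after it to break before, so skip it.
  return lines.slice(0, -1).filter((line) => /( {2,}|\\)$/.test(line)).length;
}

// Markdown as a converter would be expected to write it.
const intact = "line one  \nline two  \nline three";

// The same text after a tool or editor has trimmed trailing whitespace.
const stripped = intact
  .split("\n")
  .map((line) => line.trimEnd())
  .join("\n");

console.log(countHardBreaks(intact));
console.log(countHardBreaks(stripped));
```

A count of zero on a file that should contain breaks suggests the trailing spaces were stripped somewhere after conversion, for example by an editor's trim-on-save setting. A nonzero count points toward the renderer instead.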

thedamon commented 4 years ago

Weird! I pulled the HTML from the rendered markup of the page rather than from the actual database, so I'm wondering if that's the discrepancy. I ended up finding a wp->jekyll plugin that did what I needed right out of the box.

Since you're getting the right results, if something odd is happening on my end, it's probably an issue with turndown, not this library.