showdownjs / showdown

A bidirectional Markdown to HTML to Markdown converter written in Javascript
http://www.showdownjs.com/
MIT License
14.32k stars 1.57k forks source link

Bad <br> conversions from MD to HTML to MD #974

Open Skywalker13 opened 1 year ago

Skywalker13 commented 1 year ago

Hello,

A simple conversion from a markdown with breaks to HTML to markdown doesn't work as expected. The breaks are not translated as appropriate.

(showdown 2.1.0)

Input

There are spaces ' ' after text1 and text2, see the script.

text1  
text2  
text3

Script

'use strict';

const showdown = require('showdown');
const jsdom = require('jsdom');

const {JSDOM} = jsdom;
global.window = new JSDOM('', {}).window;

const converter = new showdown.Converter();

const source = `text1  \ntext2  \ntext3`;
const html = converter.makeHtml(source);
const md = converter.makeMarkdown(html);

console.log('MD');
console.log(source);
console.log('');
console.log('MD -> HTML -> MD');
console.log(md);

Output

MD
text1  
text2  
text3

MD -> HTML -> MD
text1<br>

 text2<br>

 text3
chrispymm commented 1 year ago

I came across this issue too today. Doesn't even require MD -> HTML -> MD conversion. The issue is simply in HTML -> MD.

Given the following html:

Line 1<br>
Line 2<br>
Line 3<br>

Showdown will produce the following markdown:`

Line 1<br>

Line 2<br>

Line 3<br>

Did some digging and realised that this is because the tagged release version of showdown 2.1.0 doesn't contain the makeMarkdown.break subparser, or the case statement in the makeMarkdown.node subparser that would call it - meaning that it goes to the default case in makeMarkdown.node which simply outputs the tag + \n\n.

I monkeypatched these in, and it mostly solves the problem. With the two changes mentioned above in the code the markdown output becomes:

Line 1
 Line 2
 Line 3

The issue now is the leading space appearing on the 2nd and 3rd lines. A little bit more investigation and I found line 400 in converter.js in the clean() function

child.nodeValue = child.nodeValue.split('\n').join(' ');

Modifying this line to join('') solved the problem, and removed the leading spaces, and seemed to cause no other issues (for me at least). But it did cause some tests to fail - it is possible that there could be textnodes across multiple lines that would need to be joined with a space.

My solution to this was to simply remove the space at the start of any line in the parse markdown that started with a single space. I don't think there would be any other occasions this would happen - it would only be following a linebreak.

Adding this on line 393 of converter.js

mdDoc.split(/\n/).map(function (line) {
    // Remove the space the start of any line that only starts with one and only one space
    return line.replace(/^\s{1}\S+/, '');
}).join('\n');

This feels a bit of a hack, but it works. All the tests pass, and adding a testcase for the starting html and expected markdown passes too. Maybe there's a more elegant solution?

Is there any way to get <br> handling published? It would be really helpful.

I can't quite work out the status of showdown - as I mentioned some of this 'fix' is already in 'develop' and 'master' branch. It doesn't seem liek there's a lot of active developemnt. Is there a new release planned at any point? We're actively using this in our app, and would love for stuff like this to work out of the box, rather than having to include workarounds in our app code (or have the overhead of managing a fork). If there's anything I can do to help / contribute to the project then I'm happy to

Thanks a lot, Chris