suntong / html2md

HTML to Markdown converter
MIT License

GitHubFlavored turns one linebreak into two #15

Closed God-damnit-all closed 1 year ago

God-damnit-all commented 1 year ago

In GitHub's Markdown, a single line break is rendered as a single line break. But when running html2md, I'm finding that it turns one <br> tag into two line breaks in the resulting .md file.

suntong commented 1 year ago

Please provide an MRE (Minimal, Reproducible Example).

God-damnit-all commented 1 year ago

Please provide an MRE (Minimal, Reproducible Example).

echo "foo<br>bar" | html2md -G -i >out.md

What I got:

foo

bar

What I expected:

foo
bar

suntong commented 1 year ago

Thanks. This looks like an upstream issue, which I've just reported for you.

God-damnit-all commented 1 year ago

This might be fixable using a rule, as documented here: https://github.com/JohannesKaufmann/html-to-markdown#adding-rules

Maybe until this is fixed, you could have a rule for GitHub-flavored markdown handling <br> with a single linebreak?

Or, alternatively, you could add a new feature to this project that lets us use this rule syntax ourselves by loading a file containing the rule.

suntong commented 1 year ago

A rule has to be compiled in; it cannot be loaded from a .go source file on the fly.

Overall, this is an upstream issue and should be fixed upstream instead.

Or, alternatively, you could use the upstream library directly, as the CLI is only a thin wrapper around it.
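For readers who cannot recompile the tool, one possible workaround is to hide each <br> behind a placeholder before conversion and turn the placeholder back into a single newline afterwards. The sketch below is only an illustration: the token `HTML2MDLINEBREAK` and the helper names `protectBreaks` and `restoreBreaks` are hypothetical, and it assumes (untested against html2md) that the converter passes a plain alphanumeric token through unchanged.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// brToken is a hypothetical placeholder. It must not occur in the input,
// and the converter is assumed to leave it untouched.
const brToken = "HTML2MDLINEBREAK"

// brTag matches <br>, <br/>, <br /> and their upper-case forms.
// It deliberately does not match <br> tags that carry attributes.
var brTag = regexp.MustCompile(`(?i)<br\s*/?>`)

// protectBreaks replaces every bare <br> tag with the placeholder token.
func protectBreaks(html string) string {
	return brTag.ReplaceAllString(html, brToken)
}

// restoreBreaks turns each placeholder back into a single newline.
func restoreBreaks(markdown string) string {
	return strings.ReplaceAll(markdown, brToken, "\n")
}

func main() {
	protected := protectBreaks("foo<br>bar")
	fmt.Println(protected)

	// In real use, `protected` would be piped through the converter here.
	// This demo skips that step and restores the breaks directly.
	fmt.Println(restoreBreaks(protected))
}
```

The two helpers would wrap the existing pipeline: run `protectBreaks` on the HTML, convert as usual, then run `restoreBreaks` on the resulting Markdown.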

God-damnit-all commented 1 year ago

Overall, this is an upstream issue and should be fixed upstream instead.

It looks like that isn't going to happen any time soon:

JohannesKaufmann/html-to-markdown #40 (comment)

If we want to be extra precise, the html-to-markdown library would need to also support hard line breaks. However that would require some other changes.

So for now, the current behaviour is going to stay as it is. Changing it would break it for other implementations. However you are free to change the behaviour, by writing a very simple custom rule.

Any chance of getting a parameter for this?

suntong commented 1 year ago

Closing it as it is the "expected behavior" of upstream.

This is expected behavior. A line break in Markdown requires two newline characters. A single newline character will not render as a line break, instead it will render as a space.

Originally posted by @wcalandro in https://github.com/JohannesKaufmann/html-to-markdown/issues/40#issuecomment-1012624554

Any chance of getting a parameter for this?

Adding a parameter won't fix the problem unless it is fixed upstream.

For further discussion of this issue, please use the upstream thread: https://github.com/JohannesKaufmann/html-to-markdown/issues/40

God-damnit-all commented 1 year ago

Adding a parameter won't fix the problem unless it is fixed upstream.

I don't suppose it would help if you could consider it a feature request instead, for a parameter for this one particular rule?

suntong commented 1 year ago

For further discussion of this issue, please use the upstream thread: JohannesKaufmann/html-to-markdown#40

I.e., take a look at my comment: https://github.com/JohannesKaufmann/html-to-markdown/issues/40#issuecomment-1542913832

God-damnit-all commented 1 year ago

For further discussion of this issue, please use the upstream thread: JohannesKaufmann/html-to-markdown#40

I.e., take a look at my comment: JohannesKaufmann/html-to-markdown#40 (comment)

Sorry.

suntong commented 1 year ago

It's alright.