mixmark-io / turndown

🛏 An HTML to Markdown converter written in JavaScript
https://mixmark-io.github.io/turndown
MIT License
8.52k stars, 864 forks

Why are non-breaking spaces returned when regular spaces are given? #452

Closed dchacke closed 6 months ago

dchacke commented 7 months ago

Why is

<p>foo <em>bar</em></p>

converted to

foo *bar*

?

(Note that the character after 'foo' is not a regular space but a non-breaking space, which text editors render as <0xa0>.)

This behavior breaks applications that rely on consistency between spaces given and spaces returned.

Shouldn't a regular space be produced since a regular space was given?

I've skimmed issues related to non-breaking spaces as well as your Wiki article on whitespace and the CommonMark spec, specifically the section on delimiter runs. Maybe I'm missing something but I haven't found an explanation for why this behavior would be expected/needed.

If it is needed, is there a way to override it?

martincizek commented 6 months ago

I've just tried it and it works as expected, i.e. a single &nbsp; is converted to 0xa0 and a regular space is converted to 0x20.

Tried with https://mixmark-io.github.io/turndown/, even on your input. Can you please double-check it and, if the problem persists, write a piece of code that reproduces the issue?
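One way to double-check which character actually ends up in a string is to look at its code points rather than at how an editor renders it. The snippet below is only a minimal sketch: `findSpaces` is a hypothetical helper (not part of turndown), and the two sample strings are hand-written stand-ins for converter output.

```javascript
// Hypothetical helper: report every regular space (0x20) and
// non-breaking space (0xa0) in a string, with its position.
function findSpaces(text) {
  const found = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code === 0x20 || code === 0xa0) {
      found.push({ index: i, hex: '0x' + code.toString(16) });
    }
  }
  return found;
}

// Hand-written sample strings (assumptions, not real converter output).
const withRegularSpace = 'foo *bar*';
const withNonBreakingSpace = 'foo\u00a0*bar*';

console.log(findSpaces(withRegularSpace));
console.log(findSpaces(withNonBreakingSpace));
```

Running the same helper on the string returned by the converter, before it is copied or pasted anywhere, would show whether the 0xa0 is already present at that point or is introduced later by something else.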

dchacke commented 6 months ago

https://jsbin.com/ratasumuti/edit?html,js,output

dchacke commented 6 months ago

Disregard, this is added by macOS independently of turndown.

EDIT: For those curious why this happens, see this explanation.