jgm / pandoc

Universal markup converter
https://pandoc.org

[markdown writer]: reverse smart extension does not work for non-breaking space #7870

Open ickc opened 2 years ago

ickc commented 2 years ago

With pandoc 2.17.0.1 on macOS, here is a minimal working example (MWE):

$ echo 'Mr. A--B' | pandoc
<p>Mr. A–B</p>
$ echo 'Mr. A--B' | pandoc -t markdown
Mr. A--B
$ echo 'Mr. A--B' | pandoc -t native
[ Para [ Str "Mr.\160A\8211B" ] ]

Note that the output of echo 'Mr. A--B' | pandoc -t markdown has a non-breaking space in Mr. A--B.

From the documentation:

Interpret straight quotes as curly quotes, --- as em-dashes, -- as en-dashes, and ... as ellipses. Nonbreaking spaces are inserted after certain abbreviations, such as "Mr." ... Note: If you are writing Markdown, then the smart extension has the reverse effect: what would have been curly quotes comes out straight.

So the expected output should be Mr. A--B with a normal space.

This problem shows up when using pandoc as a Markdown formatter, as in the example above.

Frankly, I'm not sure this is fixable, since the AST has already lost the information, and un-smarting non-breaking spaces may be too counterintuitive. So maybe just document this behavior?

jgm commented 2 years ago

In theory we could look for the pattern WORD + PERIOD + NBSP and look up the word in the abbreviation table. If it's found, we could replace the NBSP with a regular space.

The problem is that the abbreviation table is stored in reader options, and we don't have access to that in the writer. So this would not be a simple change.
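The WORD + PERIOD + NBSP lookup described above could be sketched roughly as follows. This is a Python illustration of the idea only, not pandoc's code (pandoc is written in Haskell): the function name `unsmart_nbsp` and the `ABBREVIATIONS` set are hypothetical, and the set is a tiny stand-in for the real abbreviation table. It is passed in explicitly as a parameter, which sidesteps the difficulty that the real table lives in reader options.

```python
import re

NBSP = "\u00a0"

# Hypothetical stand-in for the abbreviation table; the real table is
# stored in pandoc's reader options and its contents may differ.
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr."}

# WORD + PERIOD + NBSP: a run of letters, a period, then a non-breaking space.
_PATTERN = re.compile(r"([^\W\d_]+\.)" + NBSP)


def unsmart_nbsp(text, abbreviations=ABBREVIATIONS):
    """Replace the NBSP after a known abbreviation with a regular space."""

    def repl(match):
        word = match.group(1)
        if word in abbreviations:
            return word + " "
        # Not a known abbreviation: keep the NBSP, since it was
        # probably written deliberately.
        return match.group(0)

    return _PATTERN.sub(repl, text)


if __name__ == "__main__":
    print(repr(unsmart_nbsp("Mr." + NBSP + "A\u2013B")))
```

Note that a sketch like this would also rewrite a non-breaking space that an author typed on purpose after "Mr.", which is the information the AST no longer carries.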