readthedocs / recommonmark

A markdown parser for docutils
https://recommonmark.readthedocs.io/
MIT License

Incompatibility with Docutils smart quotes (elision) #151

Closed: jfbu closed this issue 2 years ago

jfbu commented 5 years ago

Consider this source file, eau1.md:

# Markdown

l'eau

Insert this in a Sphinx project with language='fr' and the following in conf.py:

extensions = [
    'recommonmark',
]

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = {
    '.rst': 'restructuredtext',
    '.txt': 'markdown',
    '.md': 'markdown',
}

Executing, for example, make pseudoxml, one then obtains

<document source="/Users/xxxxxxxxx/sphinxtests/6282markdownsmartquotes/eau1.md">
    <section ids="markdown" names="markdown">
        <title>
            Markdown
        <paragraph>
            l
            ”
            eau

whereas a similar .rst document (eau2.rst) is transformed into

<document source="/Users/xxxxxxxxx/sphinxtests/6282markdownsmartquotes/eau2.rst">
    <section ids="rest" names="rest">
        <title>
            reST
        <paragraph>
            l’eau

In the former case, l'eau gives three Text elements, and the Docutils smart quotes transform acted erroneously, not recognizing a case of elision. In the latter case, there is only one Text element, and Docutils smart quotes acted correctly.

Note that smart quotes are enabled by default for the Sphinx HTML builder.
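To illustrate why the split matters, here is a toy sketch, not the Docutils algorithm. It assumes, for illustration only, that each text fragment is converted without seeing its neighbours; the real transform is more elaborate. The function name `toy_smart_quotes` and its two rules are hypothetical.

```python
import re

# Toy stand-in for a smart-quotes transform; NOT the Docutils implementation.
APOSTROPHE = "\u2019"     # ’
CLOSING_QUOTE = "\u201d"  # ” (the character seen in the pseudoxml output above)


def toy_smart_quotes(text):
    """Turn ' into ’ when it sits between word characters, else into ”."""
    # First pass: a quote inside a word is an apostrophe (elision, contraction).
    text = re.sub(r"(?<=\w)'(?=\w)", APOSTROPHE, text)
    # Second pass: any remaining straight quote is treated as a closing quote.
    return text.replace("'", CLOSING_QUOTE)


# One text fragment, as in the .rst case.
one_node = toy_smart_quotes("l'eau")
# Three fragments converted independently, as in the Markdown case.
three_nodes = "".join(toy_smart_quotes(part) for part in ["l", "'", "eau"])

print(one_node)
print(three_nodes)
```

In this toy model the single fragment keeps its apostrophe (l’eau), while the isolated quote has no letters around it and ends up as a closing quote (l”eau).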

For a better description see this comment to Sphinx issue #6282. I am thus raising the issue here :)

gvcgael commented 4 years ago

Did you find a workaround?

gmilde commented 3 years ago

Does escaping the apostrophe like l\'eau help?

The Docutils recommonmark wrapper merges adjoining Text nodes:

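# Excerpt: assumes `from docutils import nodes` and a parsed `document`.
# Join neighbouring Text children so each run of text is a single node.
for node in document.traverse(nodes.TextElement):
    children = node.children
    i = 0
    while i+1 < len(children):
        if (isinstance(children[i], nodes.Text)
                and isinstance(children[i+1], nodes.Text)):
            # Replace the pair with one merged Text node and re-parent it.
            children[i] = nodes.Text(children[i]+children.pop(i+1))
            children[i].parent = node
        else:
            i += 1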

Maybe this would solve the issue.
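The effect of such a merge can be sketched without Docutils. In this minimal stand-in, plain strings play the role of Text nodes and any other object plays the role of an inline element; `merge_adjacent_text` is a hypothetical helper, not part of Docutils or recommonmark.

```python
def merge_adjacent_text(children):
    """Return a new list in which runs of adjacent str items are joined."""
    merged = []
    for child in children:
        if merged and isinstance(child, str) and isinstance(merged[-1], str):
            # Previous item is also text: extend it instead of adding a new one.
            merged[-1] += child
        else:
            merged.append(child)
    return merged


# The three fragments from the Markdown case collapse into a single one.
fragments = merge_adjacent_text(["l", "'", "eau"])

# A non-text item (here a tuple standing in for an inline element) blocks merging.
emphasis = ("emphasis", "eau")
mixed = merge_adjacent_text(["l", "'", emphasis, " et ", "le vin"])

print(fragments)
print(mixed)
```

Once the fragments are merged, a quotes transform would see l'eau as one string, which is the situation of the .rst document above.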

Otherwise, disable SmartQuotes. Typographical quotes already present in the source are passed through unchanged.
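For a Sphinx project, disabling the transform is a one-line change in conf.py. This is a minimal sketch that assumes a Sphinx version providing the `smartquotes` option:

```python
# conf.py (fragment)
# Turn off the smart quotes transform for the whole project.
# With this setting, write typographic quotes (’ « ») directly in the source.
smartquotes = False
```

With this setting, l'eau in either source format would be expected to keep its straight apostrophe unless ’ is typed explicitly.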