pypa / readme_renderer

Safely render long_description/README files in Warehouse
Apache License 2.0
158 stars 88 forks source link

Docutils xform settings are doing the opposite of what you want #170

Closed jwodder closed 2 years ago

jwodder commented 4 years ago

In the reStructuredText rendering module (readme_renderer/rst.py), the settings dict for configuring Docutils currently contains the lines:

    # Prevent a lone top level heading from being promoted to document
    # title, and thus second level headings from being promoted to top
    # level.
    "doctitle_xform": True,

    # Prevent a lone subsection heading from being promoted to section
    # title, and thus second level headings from being promoted to top
    # level.
    "sectsubtitle_xform": True,

However, assuming the comments accurately reflect what you the developers intend for the code to do, these two options are currently the wrong value — they should both be False!

Let's look at a specific example:

==============
Document Title
==============

Lorem ipsum yadda yadda yadda.

Section Title
=============

Section Subtitle
----------------

Here be goblins.

Converting this to HTML with python3 -m readme_renderer (Python v3.8.3, readme_renderer v26.0) yields:

<p>Lorem ipsum yadda yadda yadda.</p>
<div id="section-title">
<h2>Section Title</h2>
<h2 id="section-subtitle"><span class="section-subtitle">Section Subtitle</span></h2>
<p>Here be goblins.</p>
</div>

Note that, because doctitle_xform is True, Docutils has promoted the top section title ("Document Title") to the document title, which is not included in parts["fragment"] (cf. Docutils publisher docs) and thus not included in the output from readme_renderer. I highly doubt that deleting lone top-level section titles from READMEs is what you want to do.

As for sectsubtitle_xform, the commit introducing it says that it is "like GitHub". I don't recall how GitHub was handling such things in early 2015, but if you publish the above rST document to GitHub now (say, in a Gist with a .rst file extension) and look at the source for the rendered page, you'll see that "Section Title" and "Section Subtitle" are at different H levels (here <h2> and <h3>, respectively) with no <span class="section-subtitle"> or equivalent. Setting the option to False will make it match up with GitHub again.

di commented 4 years ago

I highly doubt that deleting lone top-level section titles from READMEs is what you want to do.

I didn't write this part of readme_renderer, but I believe this is exactly what the goal is. We don't want to have PyPI's header for the description immediately followed by another header.

For example, compare pip's PyPI description with it's corresponding README on GitHub. What we don't want is PyPI's "Description" header followed immediately by "pip - The Python Package Installer".

miketheman commented 2 years ago

Circling back, after reviewing open issues.

The behavior of the rst renderer is indeed intentional, as noted in https://github.com/pypa/readme_renderer/issues/149#issuecomment-1066119816

As the primary use case for this library is the PyPI warehouse, it currently works as designed.

I highly doubt that deleting lone top-level section titles from READMEs is what you want

We now emit a warning when there's only a single top-level section - see #231

Going to close this one out - but if you see value in having the library change behavior, a pull request to change the behavior dynamically on input (preserving the current default) could be an interesting!