pypa / readme_renderer

Safely render long_description/README files in Warehouse
Apache License 2.0
158 stars 88 forks source link

preserve lang attribute in <pre> codeblocks #200

Closed nschloe closed 2 years ago

nschloe commented 3 years ago

On PyPI, fenced code blocks from markdown are translated to HTML as

<pre>
...
</pre>

On GitHub, the HTML preserves the language attribute (if given),

<pre lang="python">
...
</pre>

This helps third-party applications, like my Purple Pi, to do fancy stuff. (In the case of Purple Pi to render LaTeX math.) I'd love to see this on PyPI, too.

di commented 3 years ago

Hi! This probably belongs at https://github.com/theacodes/cmarkgfm which we use for rendering markdown to HTML instead.

nschloe commented 3 years ago

Yes, I'll move it there. Thanks for the swift reply!

Edit: https://github.com/theacodes/cmarkgfm/issues/37

nschloe commented 3 years ago

Reopening. cmarkgfm already has the capability of enabling land in <pre> blocks, but it must be enabled with

html = cmarkgfm.markdown_to_html(markdown_text, options=cmarkgfmOptions.CMARK_OPT_GITHUB_PRE_LANG)

(See the linked issue.) Any chance this could make it into warehouse?

di commented 3 years ago

Transferred this to https://github.com/pypa/readme_renderer/

di commented 3 years ago

This would be updated here: https://github.com/pypa/readme_renderer/blob/6a9b282823e033d4344ca519452c49def6d259ec/readme_renderer/markdown.py#L36-L38

Although I'm interested in the outcome of https://github.com/theacodes/cmarkgfm/issues/37, and also confirming that this is the current behavior of GFM.

mbarkhau commented 3 years ago

HI @di,

have you had time to look at the fix of https://github.com/theacodes/cmarkgfm/issues/39 (currently in master)?

https://github.com/theacodes/cmarkgfm/commit/7506af3e38379e5424707f7b26c5c8889b27908d

-        <pre><code class="language-python">print('hello')
+        <pre lang="python"><code>print('hello'
         </code></pre>

I'm concerned this might break lots of existing styles.

di commented 3 years ago

I haven't taken a look, but if it's bringing cmarkgfm more inline with what GFM is actually doing, that sounds good to me.

If it changes the styles significantly, we'll make the necessary changes here (and downstream on PyPI) once we update the pin.

miketheman commented 2 years ago

Since this was opened, #209 was added and released.

The language support appears to continue to be correct, as we also pass the detected language to pygments for highlighting, so the final output doesn't contain the pre block at all.

Recommend closure.

di commented 2 years ago

Is this just happening by default in newer versions of cmarkgfm, and we don't need to pass the cmarkgfmOptions.CMARK_OPT_GITHUB_PRE_LANG option?

I think a test that captures this explicitly would be helpful to ensure we don't regress on this for some reason.

miketheman commented 2 years ago

We some existing test cases that validate the behavior of "markdown source => GFM code fences => pygments => html" today:

I haven't tried removing passing the argument.

di commented 2 years ago

But these tests don't seem to preserve the lang="python" attribute in the pre block (or am I missing something?).

miketheman commented 2 years ago

We expressly don't preserve the lang attribute, since we convert known languages to pygments-highlighted blocks, so the lang attribute always goes away.

miketheman commented 2 years ago

What I'm getting at is if we added the lang back in after pygments has rendered, we'd provide an incorrect result - since the contents of the html are no longer code, rather rendered spans

miketheman commented 2 years ago

and we don't need to pass the cmarkgfmOptions.CMARK_OPT_GITHUB_PRE_LANG option?

We don't pass this today - and git history tells me we never did in this codebase - am I missing something? 😄

di commented 2 years ago

I think that's what OP is requesting and what GitHub does for their pre blocks. At the time this issue was created, cmarkgfm didn't support it, but now it does if we set the option. I think we should as it would allow for OP's use case as well as more closely align us with GitHub.

miketheman commented 2 years ago

@di fair enough - I tried to validate the request that OP had - and confirmed that GitHub's own rendering doesn't preserve a lang attribute either.

Here's a Gist using Markdown, and I've shown the code that goes into it, the rendered result and the HTML I viewed on a Chrome browser. https://gist.github.com/miketheman/ea99f1ef40073220648d0a75cbdd787a

The two language-provided options never emit a lang attribute to the end user.

di commented 2 years ago

Looks like you're right -- seems like this has changed on GitHub's end since the issue was created. Let's wait to get some more details from @nschloe, but given that I think it's likely that we should close this.

@nschloe, can you chime in here?

nschloe commented 2 years ago

fair enough - I tried to validate the request that OP had - and confirmed that GitHub's own rendering doesn't preserve a lang attribute either.

I tried again and found, contrary to your results, that GitHub does preserve the lang attribute. For example, the README.md

```abc
lorem

produces `lang="abc"`.

![screenshot](https://user-images.githubusercontent.com/181628/178622614-35c8a59a-afa8-436d-aa8b-321567a8c58a.png)

See [here](https://github.com/nschloe/pre-lang-test) for a test repo.

(I might be slow to reply since I'm holidays right now.)