Closed by simonw 1 year ago
Here's the YAML:
https://github.com/simonw/datasette.io/blob/b821fb19eda08b4942183db507bb6f986f8134bf/news.yaml#L3
The markdown is stored in the DB:
https://datasette.io/content/news?_sort=rowid&date__exact=2023-01-13
And rendered here:
So it looks like datasette-render-markdown is the thing that renders & as &amp; in this context.
This tool compares different markdown implementations:
It suggests that python-markdown renders this just fine:
I think this is likely a bug in the interaction between the markdown rendering and Bleach in this plugin:
I can recreate it locally like this:
>>> from datasette_render_markdown import render_markdown
>>> render_markdown('[this & that](https://www.example.com/)')
Markup('<div style="white-space: normal"><p><a href="https://www.example.com/" rel="nofollow">this &amp;amp; that</a></p></div>')
Note the double-escaped this &amp;amp; that in the output.
Confirmed: I removed the calls to bleach and got this:
<div style="white-space: normal"><p><a href="https://www.example.com/">this &amp; that</a></p></div>
After more exploration, it turns out it's the call to bleach.linkify(...) that causes the double escaping of the ampersand.
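For anyone unfamiliar with the term, here's a minimal stdlib-only sketch of what "double escaping" means. It doesn't use bleach at all: html.escape is just a stand-in (my assumption, for illustration) for any layer that escapes text which was already escaped.

```python
import html

text = "this & that"

# First pass: the correct escaping for HTML output.
once = html.escape(text)

# Second pass over already-escaped text: the "&" in "&amp;" gets
# escaped again, so a browser displays a literal "&amp;".
twice = html.escape(once)

print(once)   # this &amp; that
print(twice)  # this &amp;amp; that
```

A browser decodes only one level of entities, which is why the second form shows up as visible &amp; text on the page.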
https://bleach.readthedocs.io/en/latest/linkify.html says:
If you plan to sanitize/clean the text and linkify it, you should do that in a single pass using LinkifyFilter. This is faster and it'll use the list of allowed tags from clean.
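Here's a rough sketch of the single-pass approach the docs describe, using bleach's Cleaner with LinkifyFilter. The allowed tags and attributes and the sample HTML below are assumptions for illustration, not necessarily what the plugin uses.

```python
from functools import partial

from bleach.linkifier import LinkifyFilter
from bleach.sanitizer import Cleaner

# Hypothetical allow-lists, chosen only for this example.
ALLOWED_TAGS = {"a", "p", "pre"}
ALLOWED_ATTRIBUTES = {"a": ["href", "rel"]}

# Sanitize and linkify in one pass: LinkifyFilter runs as a filter
# inside the Cleaner instead of as a separate bleach.linkify() call.
cleaner = Cleaner(
    tags=ALLOWED_TAGS,
    attributes=ALLOWED_ATTRIBUTES,
    filters=[partial(LinkifyFilter, skip_tags=["pre"])],
)

# Sample input resembling what python-markdown produces for the link.
rendered = '<p><a href="https://www.example.com/">this &amp; that</a></p>'
result = cleaner.clean(rendered)
print(result)
```

The point of doing it this way is that the HTML is parsed and serialized once, so the ampersand should only be escaped once.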
Deployed that fix to https://datasette.io/