Open scottbarnes opened 7 months ago
After a little bit of looking into this, I think I might know exactly what is causing this. Could you assign me?
Yeah, I have the exact cause of this right here.
This is how the patterns we use for markdown are defined in our code. On line 6679, changing the regex to `EMPHASIS_RE = r'\*([^\*]+)\*' # *emphasis*` solves the issue on my local version. It seems that this issue occurs whenever the pattern matches with no intermediary characters between the opening and closing delimiters.
If four asterisks are inserted (the opening and closing delimiters for bold), this occurs:
However, changing the quantifiers involved to `+`, which requires one or more characters between the opening and closing delimiters, solves the problem.
Case in point:
This seems to be the case for all patterns of this sort.
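To make the difference concrete, here is a minimal sketch of the two quantifiers side by side. It assumes the existing pattern uses `*` inside the group (the thread only says the quantifier was changed to `+`), and `render_emphasis` is a hypothetical stand-in for the real renderer, not the actual code in `markdown.py`.

```python
import re

# Assumption: the existing pattern uses `*`, so the group may match zero characters.
OLD_EMPHASIS_RE = re.compile(r'\*([^\*]*)\*')
# Pattern proposed above: `+` requires at least one non-asterisk character.
NEW_EMPHASIS_RE = re.compile(r'\*([^\*]+)\*')


def render_emphasis(pattern, text):
    """Hypothetical helper: wrap each match in <em>...</em>."""
    return pattern.sub(r'<em>\1</em>', text)


lone_pair = "before ** after"

# With `*`, the lone pair matches with an empty group and yields an empty <em> element.
print(render_emphasis(OLD_EMPHASIS_RE, lone_pair))   # before <em></em> after

# With `+`, the lone pair no longer matches and is left untouched.
print(render_emphasis(NEW_EMPHASIS_RE, lone_pair))   # before ** after

# Ordinary emphasis still works with the stricter pattern.
print(render_emphasis(NEW_EMPHASIS_RE, "some *emphasis* here"))  # some <em>emphasis</em> here
```

Under these assumptions, the stricter pattern leaves `**` and `****` as literal text, which lines up with what GitHub does.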
Noting that this is the file referenced in this comment, and that this code seems to be a much older version of this library. Also noting that the current version of `markdown` has updated regular expressions for markdown tokens, and the change is roughly equivalent to yours, @benbdeitch (it checks for at least one non-asterisk character between the `**`).
We have a few other server-side markdown issues that deal with markdown not being parsed as expected. I'm wondering if updating Infogami to use the current `markdown` library would close these issues? I noticed that `markdown` is included in Infogami's `requirements.txt`, and that this library was briefly used in `macro.py`, but that change was rolled back.
I'm marking this as https://github.com/internetarchive/openlibrary/labels/State%3A%20Blocked until I can figure out the following:
- Can we replace `markdown.py` with the current `markdown` library?
- Is there anything in `markdown.py` that would prevent us from using the current version of `markdown`?
- Would the current `markdown` library cause existing markdown to render in unexpected and undesired ways?
Problem
Having a lone pair of asterisks (`**`) on a line in an Open Library field that supports markdown (such as the `description` field of an `Edition` or `Work`) causes the following text for the rest of the page to be in italics.
Note: this does not appear to happen with lone asterisks (`*`). Instead, the Open Library implementation seems to accurately follow GitHub, insofar as nothing happens and there are no italics.
Evidence / Screenshot
Relevant URL(s)
Reproducing the bug
1. Enter `**` somewhere on a line and save.
Expected: the text after `**` does not become italics (nor does anything become bold, following GitHub's lead).
Actual: the text after `**` does become italics.
Context
Notes from this Issue's Lead
Proposal & constraints
We appear to follow GitHub Flavored markdown, which doesn't appear to support italics or bold spanning lines. The latter case is one likely way people will run into the bug.
However, having unmatched `**` on a line in GitHub (e.g. here in this form) doesn't seem to cause an italicsfest. Instead, it merely does nothing. We should consider following GitHub's behavior.
Related files
Stakeholders