trentm / python-markdown2

markdown2: A fast and complete implementation of Markdown in Python

AssertionError: `cuddled_list.startswith("<ul>") or cuddled_list.startswith("<ol>")` #471

Closed mhils closed 2 years ago

mhils commented 2 years ago

The following code errors on master:

import markdown2

text = """
This command performs an entire mail transaction.

The arguments are:
    - from_addr    : The address sending this mail.
    - to_addrs     : A list of addresses to send this mail to.  A bare
                     string will be treated as a list with 1 address.
    - msg          : The message to send.
    - mail_options : List of ESMTP options (such as 8bitmime) for the
                     mail command.
    - rcpt_options : List of ESMTP options (such as DSN commands) for
                     all the rcpt commands.

msg may be a string containing characters in the ASCII range, or a byte
string.  A string is encoded to bytes using the ascii codec, and lone
\r and \n characters are converted to \r\n characters.

If there has been no previous EHLO or HELO command this session, this
method tries ESMTP EHLO first.  If the server does ESMTP, message size
and each of the specified options will be passed to it.  If EHLO
fails, HELO will be tried and ESMTP options suppressed.

This method will return normally if the mail is accepted for at least
one recipient.  It returns a dictionary, with one entry for each
recipient that was refused.  Each entry contains a tuple of the SMTP
error code and the accompanying error message sent by the server.

This method may raise the following exceptions:

 SMTPHeloError          The server didn't reply properly to
                        the helo greeting.
 SMTPRecipientsRefused  The server rejected ALL recipients
                        (no mail was sent).
 SMTPSenderRefused      The server didn't accept the from_addr.
 SMTPDataError          The server replied with an unexpected
                        error code (other than a refusal of
                        a recipient).
 SMTPNotSupportedError  The mail_options parameter includes 'SMTPUTF8'
                        but the SMTPUTF8 extension is not supported by
                        the server.

Note: the connection will be open even after an exception is raised.

Example:

 >>> import smtplib
 >>> s=smtplib.SMTP("localhost")
 >>> tolist=["one@one.org","two@two.org","three@three.org","four@four.org"]
 >>> msg = '''\
 ... From: Me@my.org
 ... Subject: testin'...
 ...
 ... This is a test '''
 >>> s.sendmail("me@my.org",tolist,msg)
 { "three@three.org" : ( 550 ,"User unknown" ) }
 >>> s.quit()

In the above example, the message was accepted for delivery to three
of the four addresses, and one was rejected, with the error code
550.  If all addresses are accepted, then the method will return an
empty dictionary.
"""

markdown2.markdown(text, extras=["cuddled-lists"])

Traceback (most recent call last):
  File "/mnt/c/Users/user/git/python-markdown2/repro.py", line 66, in <module>
    markdown2.markdown(text, extras=["cuddled-lists"])
  File "/mnt/c/Users/user/git/python-markdown2/lib/markdown2.py", line 169, in markdown
    use_file_vars=use_file_vars, cli=cli).convert(text)
  File "/mnt/c/Users/user/git/python-markdown2/lib/markdown2.py", line 378, in convert
    text = self._run_block_gamut(text)
  File "/mnt/c/Users/user/git/python-markdown2/lib/markdown2.py", line 1046, in _run_block_gamut
    text = self._form_paragraphs(text)
  File "/mnt/c/Users/user/git/python-markdown2/lib/markdown2.py", line 2275, in _form_paragraphs
    assert cuddled_list.startswith("<ul>") or cuddled_list.startswith("<ol>")
AssertionError

git bisect identifies c3d4e41620ca6cacb98e56ee87e31df02912be57 as the first bad commit (/cc @Crozzers). I'm unable to look into this more closely at the moment, but I figured I'd quickly lodge it. Thank you all for pushing markdown2 forward! :)

Crozzers commented 2 years ago

The AssertionError makes sense. Commit c3d4e41 added support for ordered lists that don't start at 1, so ol tags can now end up like <ol start="[number]">, which breaks the assertion that the list starts with exactly <ol>. Perhaps the assertion could be replaced with something like:

assert re.match(r'^<(?:ul|ol).*?>', cuddled_list)

However, I am struggling to get your example to parse correctly. I assume the list of arguments is indeed meant to be a list, but I cannot get markdown2 to produce a list with or without the cuddled-lists extra. I even went back to commit ac5e7b9 (before c3d4e41 was merged) and had no luck there either.
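
To illustrate the assertion change suggested above, here is a minimal sketch comparing the old prefix check with the regex. The sample HTML strings and the helper names (`old_check`, `new_check`, `strict_check`) are made up for illustration and are not taken from markdown2; the stricter pattern is an additional assumption, not part of the proposal.

```python
import re

# Hypothetical list fragments; the last one mirrors the
# <ol start="[number]"> form described in the comment above.
SAMPLES = [
    "<ul>\n<li>a</li>\n</ul>",
    "<ol>\n<li>a</li>\n</ol>",
    '<ol start="3">\n<li>a</li>\n</ol>',
]


def old_check(html):
    """The condition used by the failing assertion."""
    return html.startswith("<ul>") or html.startswith("<ol>")


def new_check(html):
    """The regex proposed in the comment above."""
    return re.match(r'^<(?:ul|ol).*?>', html) is not None


def strict_check(html):
    """Assumed stricter variant: attributes must follow whitespace,
    so a tag such as <olx> is rejected."""
    return re.match(r'<(?:ul|ol)(?:\s[^>]*)?>', html) is not None


for sample in SAMPLES:
    first_line = sample.splitlines()[0]
    print(first_line, old_check(sample), new_check(sample), strict_check(sample))
```

The old check rejects the `<ol start="3">` fragment while both regex checks accept all three samples. Note that the proposed pattern would also accept an opening tag like `<olx>`, which is why the stricter variant may be worth considering.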

mhils commented 2 years ago

This could very well be true. This issue turned up as part of our pdoc smoke tests, where we process Python's stdlib and only make sure that it doesn't crash. I don't think it's meant to be an example of correct markdown; it's just something we observed in the wild.
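
A rough sketch of what such a crash-only smoke test could look like is below. This is not pdoc's actual test harness; `iter_docstrings`, `smoke_test`, and the `render` callable are hypothetical names, and the walk only covers methods of top-level classes.

```python
import importlib
import inspect


def iter_docstrings(module_names):
    """Yield (qualified name, docstring) for methods of top-level classes."""
    for mod_name in module_names:
        module = importlib.import_module(mod_name)
        for cls_name, cls in inspect.getmembers(module, inspect.isclass):
            for attr_name, attr in inspect.getmembers(cls, inspect.isroutine):
                doc = inspect.getdoc(attr)
                if doc:
                    yield f"{mod_name}.{cls_name}.{attr_name}", doc


def smoke_test(render, module_names):
    """Run `render` on every docstring and collect anything that raises.

    The output is never inspected; only crashes are recorded.
    """
    failures = []
    for qualname, doc in iter_docstrings(module_names):
        try:
            render(doc)
        except Exception as exc:  # AssertionError is caught here too
            failures.append((qualname, repr(exc)))
    return failures


if __name__ == "__main__":
    # Identity renderer as a stand-in; swap in a real Markdown renderer.
    print(smoke_test(lambda doc: doc, ["smtplib"]))
```

With markdown2 installed, the renderer could be `lambda doc: markdown2.markdown(doc, extras=["cuddled-lists"])`, which is the call used in the report above.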

Crozzers commented 2 years ago

Righto, that makes a lot more sense.