python / cpython

The Python programming language
https://www.python.org

_markupbase.py fails with TypeError on invalid keyword in marked section #81928

Open 2b51acbb-0c44-4aa9-aaab-47e6b26016ea opened 5 years ago

2b51acbb-0c44-4aa9-aaab-47e6b26016ea commented 5 years ago
BPO 37747
Nosy @ezio-melotti, @berkerpeksag, @bp256r1, @leonardr
Files
  • test_issue37747.py: Reproduce issue 37747 without using external packages
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['3.8', '3.7', 'library']
    title = '_markupbase.py fails with TypeError on invalid keyword in marked section'
    updated_at =
    user = 'https://github.com/bp256r1'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'leonardr'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'bp256r1'
    dependencies = []
    files = ['49226']
    hgrepos = []
    issue_num = 37747
    keywords = []
    message_count = 2.0
    messages = ['348910', '371323']
    nosy_count = 5.0
    nosy_names = ['ezio.melotti', 'berker.peksag', 'kodial', 'bp256r1', 'leonardr']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue37747'
    versions = ['Python 3.7', 'Python 3.8']
    ```

    2b51acbb-0c44-4aa9-aaab-47e6b26016ea commented 5 years ago

    Hello,

    I'm not sure if this is a bug, but I noticed that a TypeError is raised by the parse_marked_section function of the _markupbase module in Python 3.7.4 when attempting to parse a name token from '<![\r�N&=\x00%\x1a\x1e��;u�dWf'.

    See:

    Steps to reproduce:

    $ pip3 freeze | grep beautifulsoup4
    beautifulsoup4==4.6.3
    $ python3
    >>> a='<![\r�N&=\x00%\x1a\x1e��;u�dWf'
    >>> from bs4 import BeautifulSoup
    >>> BeautifulSoup(a, 'html.parser')
    /usr/local/lib/python3.7/site-packages/bs4/builder/_htmlparser.py:78: UserWarning: expected name token at '<![\r�N&=\x00%\x1a\x1e��;u�dWf'
      warnings.warn(msg)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.7/site-packages/bs4/__init__.py", line 303, in __init__
        self._feed()
      File "/usr/local/lib/python3.7/site-packages/bs4/__init__.py", line 364, in _feed
        self.builder.feed(self.markup)
      File "/usr/local/lib/python3.7/site-packages/bs4/builder/_htmlparser.py", line 250, in feed
        parser.feed(markup)
      File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 111, in feed
        self.goahead(0)
      File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 179, in goahead
        k = self.parse_html_declaration(i)
      File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 264, in parse_html_declaration
        return self.parse_marked_section(i)
      File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_markupbase.py", line 149, in parse_marked_section
        sectName, j = self._scan_name( i+3, i )
    TypeError: cannot unpack non-iterable NoneType object

    If it's not a bug, sorry, not sure.
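
For readers following the traceback: the UserWarning shows that the error was reported through a handler that warns instead of raising, and the TypeError then comes from unpacking a None return value. The sketch below imitates that pattern with hypothetical stand-ins (`scan_name`, `report_error`, and the name regex are illustrative assumptions, not the actual `_markupbase` source).

```python
import re
import warnings

# Hypothetical stand-in for a declaration-name pattern (an assumption,
# not copied from _markupbase).
_name_match = re.compile(r"[a-zA-Z][-_.a-zA-Z0-9]*\s*").match


def report_error(message):
    """Stand-in for an error() override that warns instead of raising."""
    warnings.warn(message)


def scan_name(rawdata, i):
    """Return (name, end) on a match; otherwise report and fall through."""
    m = _name_match(rawdata, i)
    if m:
        return m.group().strip().lower(), m.end()
    report_error("expected name token at %r" % rawdata[:20])
    # No explicit return on this path, so the caller receives None.


def parse_marked_section(rawdata, i=0):
    """Unpack the scan result; this fails when scan_name returned None."""
    name, j = scan_name(rawdata, i + 3)
    return name, j
```

With a well-formed marker such as `<![CDATA[x]]>` the tuple unpacks normally. With input where no name follows `<![`, the handler only warns, `scan_name` returns None, and the unpacking raises `TypeError: cannot unpack non-iterable NoneType object`, matching the last line of the traceback.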

    78ccdfc6-b825-418a-bb62-83eef5c71397 commented 4 years ago

    This was also recently filed as a bug against Beautiful Soup, a package I maintain, using Python 3.8. (https://bugs.launchpad.net/beautifulsoup/+bug/1883104)

    The attached script reproduces the problem without using external packages.
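
The attached file is not shown in this thread. As a rough idea of what a standard-library-only reproduction could look like, here is a minimal sketch. It is not the attached test_issue37747.py: `LenientParser` and `try_parse` are hypothetical names, and the `error()` override that warns instead of raising is an assumption modeled on the UserWarning visible in the traceback above.

```python
import warnings
from html.parser import HTMLParser

# Input taken from the original report.
MARKUP = '<![\r�N&=\x00%\x1a\x1e��;u�dWf'


class LenientParser(HTMLParser):
    """Hypothetical parser whose error() warns instead of raising."""

    def error(self, message):
        warnings.warn(message)


def try_parse(markup):
    """Feed markup to the parser; return the exception class name, or None."""
    parser = LenientParser()
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            parser.feed(markup)
            parser.close()
    except Exception as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    print(try_parse(MARKUP))
```

On the interpreters named in this report (3.7 and 3.8), the expectation is that this prints `TypeError`. Other Python versions may behave differently, so the sketch reports whatever it observes instead of asserting a particular outcome.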