Open 2b51acbb-0c44-4aa9-aaab-47e6b26016ea opened 5 years ago
Hello,
I'm not sure if this is a bug, but I noticed that a TypeError is raised by the parse_marked_section function of the _markupbase module in Python 3.7.4 when attempting to parse a name token of '<![\r�N&=\x00%\x1a\x1e��;u�dWf'.
See:
Steps to reproduce:
$ pip3 freeze | grep beautifulsoup4
beautifulsoup4==4.6.3
$ python3
>>> a='<![\r�N&=\x00%\x1a\x1e��;u�dWf'
>>> from bs4 import BeautifulSoup
>>> BeautifulSoup(a, 'html.parser')
/usr/local/lib/python3.7/site-packages/bs4/builder/_htmlparser.py:78: UserWarning: expected name token at '<![\r�N&=\x00%\x1a\x1e��;u�dWf'
  warnings.warn(msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/bs4/__init__.py", line 303, in __init__
    self._feed()
  File "/usr/local/lib/python3.7/site-packages/bs4/__init__.py", line 364, in _feed
    self.builder.feed(self.markup)
  File "/usr/local/lib/python3.7/site-packages/bs4/builder/_htmlparser.py", line 250, in feed
    parser.feed(markup)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 179, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/html/parser.py", line 264, in parse_html_declaration
    return self.parse_marked_section(i)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_markupbase.py", line 149, in parse_marked_section
    sectName, j = self._scan_name( i+3, i )
TypeError: cannot unpack non-iterable NoneType object
If it's not a bug, sorry, not sure.
This was also recently filed as a bug against Beautiful Soup, a package I maintain, using Python 3.8. (https://bugs.launchpad.net/beautifulsoup/+bug/1883104)
The attached script reproduces the problem without using external packages.
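For readers without access to the attachment, here is a minimal standard-library sketch of such a reproduction. It is not the attached script. It assumes, based on the UserWarning in the traceback above, that Beautiful Soup's parser subclass overrides `error()` to warn instead of raising, which would leave `_scan_name` returning None. The names `LenientParser`, `try_parse`, and `MARKUP` are made up for illustration, and the � characters from the report are written as `\ufffd`. The outcome is version-dependent: the TypeError was reported on 3.7 and 3.8, and other Python versions may raise a different exception or none at all.

```python
import warnings
from html.parser import HTMLParser

# Input from the report; the U+FFFD replacement characters are written as escapes.
MARKUP = "<![\r\ufffdN&=\x00%\x1a\x1e\ufffd\ufffd;u\ufffddWf"


class LenientParser(HTMLParser):
    """Hypothetical stand-in for a subclass that warns instead of raising."""

    def error(self, message):
        # Returning normally here (instead of raising) is the assumed trigger:
        # on affected versions _scan_name() then falls through and returns None.
        warnings.warn(message)


def try_parse(markup):
    """Feed markup to the parser; return the exception class name, or None."""
    parser = LenientParser()
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # hide "expected name token"
            parser.feed(markup)
            parser.close()
    except Exception as exc:
        return type(exc).__name__
    return None


print(try_parse(MARKUP))
```

The function returns the exception name rather than letting it propagate, so the same script can be run unchanged across interpreter versions to compare behavior.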
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['3.8', '3.7', 'library']
title = '_markupbase.py fails with TypeError on invalid keyword in marked section'
updated_at =
user = 'https://github.com/bp256r1'
```
bugs.python.org fields:
```python
activity =
actor = 'leonardr'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'bp256r1'
dependencies = []
files = ['49226']
hgrepos = []
issue_num = 37747
keywords = []
message_count = 2.0
messages = ['348910', '371323']
nosy_count = 5.0
nosy_names = ['ezio.melotti', 'berker.peksag', 'kodial', 'bp256r1', 'leonardr']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue37747'
versions = ['Python 3.7', 'Python 3.8']
```