kovidgoyal / html5-parser

Fast C based HTML 5 parsing for python
Apache License 2.0

bs4 4.8.0 causes html5-parser to break #20

Closed: eli-schwartz closed this issue 5 years ago

eli-schwartz commented 5 years ago

(Discovered because calibre's test suite started bombing out on test_comments_to_html, which exercises html5-parser.)

==> Starting check()...
running test
running build_py
running build_ext

running tests...
..E................EEEEEs
======================================================================
ERROR: test_soup (test.adapt.AdaptTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/python-html5-parser/src/html5-parser-0.4.7/test/adapt.py", line 88, in test_soup
    self.do_soup_test(soup_name)
  File "/build/python-html5-parser/src/html5-parser-0.4.7/test/adapt.py", line 92, in do_soup_test
    root = parse(HTML, treebuilder='soup')
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/__init__.py", line 190, in parse
    data, return_root=return_root, keep_doctype=keep_doctype, stack_size=stack_size)
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 128, in parse
    bs, soup, new_tag, Comment, append, NavigableString = init_soup()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 122, in init_soup
    init_bs4_cdata_list_attributes()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 18, in init_bs4_cdata_list_attributes
    k: frozenset(v) for k, v in HTMLTreeBuilder.cdata_list_attributes.items()
AttributeError: type object 'HTMLTreeBuilder' has no attribute 'cdata_list_attributes'

======================================================================
ERROR: test_attr_soup (test.soup.SoupTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/python-html5-parser/src/html5-parser-0.4.7/test/soup.py", line 35, in test_attr_soup
    root = parse('<p a=1 b=2 ID=3><a a=a>')
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 128, in parse
    bs, soup, new_tag, Comment, append, NavigableString = init_soup()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 122, in init_soup
    init_bs4_cdata_list_attributes()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 18, in init_bs4_cdata_list_attributes
    k: frozenset(v) for k, v in HTMLTreeBuilder.cdata_list_attributes.items()
AttributeError: type object 'HTMLTreeBuilder' has no attribute 'cdata_list_attributes'

======================================================================
ERROR: test_doctype_stays_intact (test.soup.SoupTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/python-html5-parser/src/html5-parser-0.4.7/test/soup.py", line 80, in test_doctype_stays_intact
    soup = parse(dt + base, return_root=False, keep_doctype=True)
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 128, in parse
    bs, soup, new_tag, Comment, append, NavigableString = init_soup()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 122, in init_soup
    init_bs4_cdata_list_attributes()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 18, in init_bs4_cdata_list_attributes
    k: frozenset(v) for k, v in HTMLTreeBuilder.cdata_list_attributes.items()
AttributeError: type object 'HTMLTreeBuilder' has no attribute 'cdata_list_attributes'

======================================================================
ERROR: test_simple_soup (test.soup.SoupTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/python-html5-parser/src/html5-parser-0.4.7/test/soup.py", line 22, in test_simple_soup
    root = parse('<p>\n<a>y</a>z<x:x>1</x:x>')
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 128, in parse
    bs, soup, new_tag, Comment, append, NavigableString = init_soup()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 122, in init_soup
    init_bs4_cdata_list_attributes()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 18, in init_bs4_cdata_list_attributes
    k: frozenset(v) for k, v in HTMLTreeBuilder.cdata_list_attributes.items()
AttributeError: type object 'HTMLTreeBuilder' has no attribute 'cdata_list_attributes'

======================================================================
ERROR: test_soup_leak (test.soup.SoupTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/python-html5-parser/src/html5-parser-0.4.7/test/soup.py", line 59, in test_soup_leak
    parse(HTML)  # So that BS and html_parser set up any internal objects
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 128, in parse
    bs, soup, new_tag, Comment, append, NavigableString = init_soup()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 122, in init_soup
    init_bs4_cdata_list_attributes()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 18, in init_bs4_cdata_list_attributes
    k: frozenset(v) for k, v in HTMLTreeBuilder.cdata_list_attributes.items()
AttributeError: type object 'HTMLTreeBuilder' has no attribute 'cdata_list_attributes'

======================================================================
ERROR: test_soup_list_attrs (test.soup.SoupTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/build/python-html5-parser/src/html5-parser-0.4.7/test/soup.py", line 54, in test_soup_list_attrs
    root = parse('<a class="a b" rel="x y">')
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 128, in parse
    bs, soup, new_tag, Comment, append, NavigableString = init_soup()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 122, in init_soup
    init_bs4_cdata_list_attributes()
  File "/build/python-html5-parser/src/html5-parser-0.4.7/build/lib.linux-x86_64-3.7/html5_parser/soup.py", line 18, in init_bs4_cdata_list_attributes
    k: frozenset(v) for k, v in HTMLTreeBuilder.cdata_list_attributes.items()
AttributeError: type object 'HTMLTreeBuilder' has no attribute 'cdata_list_attributes'

----------------------------------------------------------------------
Ran 25 tests in 0.147s

FAILED (errors=6, skipped=1)
==> ERROR: A failure occurred in check().

This was broken by the following Beautiful Soup revision: https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/revision/502#bs4/builder/__init__.py

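The class attribute cdata_list_attributes has been renamed to DEFAULT_CDATA_LIST_ATTRIBUTES, and self.cdata_list_attributes is now set per instance during __init__. That is why the class-level lookup HTMLTreeBuilder.cdata_list_attributes in soup.py raises AttributeError.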
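For illustration, here is a minimal sketch of one way a caller could tolerate both attribute layouts. It is not necessarily the fix that was applied in html5-parser. OldStyleBuilder, NewStyleBuilder and frozen_cdata_list_attributes are hypothetical stand-ins made up for this example so that it runs without bs4 installed; only the two attribute names come from the traceback and the revision above.

```python
class OldStyleBuilder:
    """Hypothetical stand-in for the older layout: the mapping is a class attribute."""
    cdata_list_attributes = {"*": ["class"], "a": ["rel"]}


class NewStyleBuilder:
    """Hypothetical stand-in for the newer layout: a class-level default,
    with the instance attribute only assigned in __init__."""
    DEFAULT_CDATA_LIST_ATTRIBUTES = {"*": ["class"], "a": ["rel"]}

    def __init__(self):
        self.cdata_list_attributes = self.DEFAULT_CDATA_LIST_ATTRIBUTES


def frozen_cdata_list_attributes(builder_cls):
    """Return {tag: frozenset(attribute names)} for either layout."""
    try:
        # Newer name, available on the class itself.
        attrs = builder_cls.DEFAULT_CDATA_LIST_ATTRIBUTES
    except AttributeError:
        # Older name, also a class attribute.
        attrs = builder_cls.cdata_list_attributes
    return {k: frozenset(v) for k, v in attrs.items()}


old = frozen_cdata_list_attributes(OldStyleBuilder)
new = frozen_cdata_list_attributes(NewStyleBuilder)
```

With the real library, the same helper would presumably be called with bs4.builder.HTMLTreeBuilder; trying the new name first means the fallback is only exercised on older bs4 releases.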

kovidgoyal commented 5 years ago

This is already fixed in master.

eli-schwartz commented 5 years ago

Hmm, sneaky. :D

"committed 1 hour ago"

It wasn't there when I first checked. :(

kovidgoyal commented 5 years ago

I had just forgotten to push it. Anyway, I have released v0.4.8 with the fix.

eli-schwartz commented 5 years ago

Thanks, I'm going to quickly build this and release it to Arch Linux before I go to bed. :)

EDIT: done