kovidgoyal / html5-parser

Fast C based HTML 5 parsing for python
Apache License 2.0

Clean up doctype handling in Beautiful Soup #19

Closed: Mr0grog closed this 5 years ago

Mr0grog commented 5 years ago

When parsing with treebuilder='soup', we were passing incorrect arguments to the Beautiful Soup Doctype constructor, which produced malformed output if you later tried to serialize the resulting soup. You'd wind up with the original doctype declaration nested inside another declaration, like so:

<!DOCTYPE !<DOCTYPE>>

This adds some tests around the issue and slightly modifies the code so the right arguments are used. Fixes #18.
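The thread doesn't show the adapter code itself, so here is a minimal, self-contained sketch of the failure mode under stated assumptions. `MiniDoctype` is a hypothetical stand-in for a Beautiful Soup-style doctype node. It stores only the *inside* of the declaration and adds the `<!DOCTYPE ` prefix and `>` suffix when serialized. Passing an already-complete declaration as the value therefore wraps it twice. The `for_name_and_ids` helper builds only the inner value from the parsed name, public id, and system id. To our understanding this mirrors what Beautiful Soup's own `Doctype.for_name_and_ids` classmethod does, but the model below is illustrative, not Beautiful Soup's actual code. For example, the real class also appends a trailing newline.

```python
# Hypothetical, simplified model of a Beautiful Soup-style Doctype node.
# Not Beautiful Soup's real class; it only illustrates the prefix/suffix wrapping.

class MiniDoctype(str):
    PREFIX = "<!DOCTYPE "
    SUFFIX = ">"

    def output(self) -> str:
        # The node stores only the inner value; the surrounding
        # "<!DOCTYPE " and ">" are added at serialization time.
        return self.PREFIX + self + self.SUFFIX

    @classmethod
    def for_name_and_ids(cls, name, public_id, system_id):
        # Build just the inner value from the parsed doctype parts.
        value = name or ""
        if public_id is not None:
            value += ' PUBLIC "%s"' % public_id
            if system_id is not None:
                value += ' "%s"' % system_id
        elif system_id is not None:
            value += ' SYSTEM "%s"' % system_id
        return cls(value)


# Buggy pattern: passing an already-complete declaration as the value.
buggy = MiniDoctype("<!DOCTYPE html>")
print(buggy.output())  # <!DOCTYPE <!DOCTYPE html>>

# Fixed pattern: pass only the parsed parts and let the node add the wrapper.
fixed = MiniDoctype.for_name_and_ids("html", None, None)
print(fixed.output())  # <!DOCTYPE html>
```

The exact malformed string in a real report depends on what value was passed in, but the shape is the same. The wrapper is applied once by the constructor's caller and again by the node's serializer. Building the value from its parts keeps the wrapper in exactly one place.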

Mr0grog commented 5 years ago

FYI, looks like the Travis error was a networking problem, not an actual test failure.

kovidgoyal commented 5 years ago

Since the static method has been around forever, I think we should just use it. I guess I missed it while implementing the soup adapter.

Mr0grog commented 5 years ago

Done!

kovidgoyal commented 5 years ago

Thanks, merged.

Mr0grog commented 5 years ago

@kovidgoyal are you planning on publishing a release with this anytime soon? It would be super helpful if I could depend on the package from PyPI instead of pointing to a GitHub commit URL :)

kovidgoyal commented 5 years ago

done

Mr0grog commented 5 years ago

🎉 Thank you!