mcs07 / ChemDataExtractor

Automatically extract chemical information from scientific documents
http://chemdataextractor.org
MIT License

Py2vs3 HTMLParser html.parser #4

Closed: pbulsink closed 7 years ago

pbulsink commented 7 years ago

This imports the right module for each Python version: HTMLParser on 2.x, html.parser on 3.x. On 3.x, html.parser is imported under the name HTMLParser, so the rest of the code stays unchanged and we avoid further incompatibilities. See https://docs.python.org/2/library/htmlparser.html.
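For readers following along, the import pattern described above can be sketched roughly as below. This is not the actual diff from this PR; `TextCollector` and `extract_text` are hypothetical names added only to show that the aliased module behaves the same on both versions.

```python
# Minimal sketch of a Python 2/3 compatible import (assumed, not the PR's diff).
try:
    import HTMLParser  # Python 2 module name
except ImportError:
    import html.parser as HTMLParser  # Python 3: alias to the old name


class TextCollector(HTMLParser.HTMLParser):
    """Hypothetical helper that gathers the text nodes of a document."""

    def __init__(self):
        HTMLParser.HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        # Called by the parser for each run of text between tags.
        self.chunks.append(data)


def extract_text(markup):
    """Return the concatenated text content of an HTML fragment."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)
```

Because the alias keeps the Python 2 name, existing references such as `HTMLParser.HTMLParser` work unchanged on either version.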

mcs07 commented 7 years ago

Nice catch, thanks. Elsewhere I've used six for handling Python 2/3 compatibility, so for consistency I've used six.moves.html_parser instead of your manual version check.

I also just discovered it's a bit more complicated: we only use HTMLParser for its unescape method, which is undocumented and apparently deprecated. So for Python 3.4+ we should actually use html.unescape instead.
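A rough sketch of what that fallback could look like, assuming only the standard library (this is not necessarily the code that was committed, and it skips six to stay self-contained):

```python
# Prefer the documented html.unescape (Python 3.4+); fall back to the
# undocumented HTMLParser.unescape method on older interpreters.
try:
    from html import unescape  # Python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # Python 3.0-3.3
    except ImportError:
        from HTMLParser import HTMLParser  # Python 2
    unescape = HTMLParser().unescape

# Named and numeric character references are both decoded.
decoded = unescape("&lt;b&gt; &amp; &#169;")
```

Trying html.unescape first matters because the HTMLParser.unescape method was later removed from Python 3 altogether, so the fallback branch is only safe on old interpreters.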