trailofbits / graphtage

A semantic diff utility and library for tree-like files such as JSON, JSON5, XML, HTML, YAML, and CSV.
GNU Lesser General Public License v3.0

Expects quoted attributes in HTML unnecessarily, preventing diffing #25

Open technopagan opened 4 years ago

technopagan commented 4 years ago

When trying to diff an HTML file, Graphtage throws an error when it encounters an unquoted attribute value, e.g.:

<meta name=foo

because it expects quoted attributes like so:

<meta name="foo"

However, the first variant without quotes is perfectly valid HTML and should not prevent Graphtage from diffing the file.

ESultanik commented 4 years ago

This happens because Graphtage currently (and incorrectly) uses Python's XML parser to parse HTML. This is a bug: we need to switch to an actual HTML parser. I will use this issue to track that development.
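
For anyone who wants to reproduce the difference outside Graphtage, here is a minimal sketch using only the Python standard library. It is not Graphtage's code. The helper names (`parses_as_xml`, `AttrCollector`, `html_attributes`) are made up for illustration, and `html.parser` is only a stand-in for "an actual HTML parser", since this thread does not say which parser Graphtage will adopt.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A small document using the unquoted attribute from the report.
SNIPPET = "<html><head><meta name=foo></head></html>"


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # XML requires quoted attribute values, so `name=foo` is rejected.
        return False


class AttrCollector(HTMLParser):
    """Collect (tag, attribute, value) triples from start tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            self.attrs.append((tag, name, value))


def html_attributes(text):
    """Parse the text with the lenient stdlib HTML parser and return its attributes."""
    collector = AttrCollector()
    collector.feed(text)
    collector.close()
    return collector.attrs


if __name__ == "__main__":
    print("accepted by XML parser:", parses_as_xml(SNIPPET))
    print("attributes seen by HTML parser:", html_attributes(SNIPPET))
```

The XML parser rejects the unquoted form, while the HTML parser reads `name=foo` as an ordinary attribute, which matches the behavior described in the report.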