html5lib / html5lib-python

Standards-compliant library for parsing and serializing HTML documents and fragments in Python
MIT License

how to use with defusedxml builders #266

Open graingert opened 8 years ago

graingert commented 8 years ago

defusedxml is a library that provides safe defaults for configuring Python's XML libraries.

It would be useful to include documentation on how to use html5lib with a parser that has been configured with defusedxml.

Also, is this even necessary? Does html5lib already take the precautions listed in defusedxml?

see also https://github.com/mozilla/bleach/issues/158

gsnedders commented 8 years ago

> Also is this even necessary? Does html5lib already take the precautions listed in defusedxml?

They simply aren't applicable to HTML as a language.

The vectors defusedxml deals with:

So none of those apply.
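As a rough illustration of why the XML entity vectors do not carry over, the sketch below feeds html5lib a document that declares an entity in an internal DTD subset and then references it. The sample document, the entity name `lol`, and the variable names are made up for this example; the only library call assumed is `html5lib.parse` with its default etree tree builder. The HTML parsing algorithm has no notion of an internal subset, so the declaration is swallowed as part of a malformed doctype and the reference is left as literal text.

```python
import html5lib

# Hypothetical input: an XML-style internal subset declaring an entity,
# followed by a reference to that entity in the body.
document = '<!DOCTYPE html [<!ENTITY lol "EXPANDED">]><p>&lol;</p>'

# Default tree builder is xml.etree.ElementTree; the result is the <html> element.
root = html5lib.parse(document)

# Collect all text nodes in document order.
text = "".join(root.itertext())

# The declared replacement text never shows up, because the entity
# declaration is not interpreted; the reference stays as-is.
print("EXPANDED" in text)
print("&lol;" in text)
```

The same reasoning is why no defusedxml-configured builder is needed here: html5lib tokenizes the input itself and only uses the etree classes to build the resulting tree.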

The "Other things to consider" include:

So essentially the only things that can affect HTML are hash collision attacks (which need to be defeated by the Python implementation, as they are in CPython), decompression bombs (because they can affect anything using compression), and XPath and XSLT, which really come down to the algorithmic complexity you need to worry about with any code processing an arbitrary tree.
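Since decompression bombs are independent of the markup language, one way to guard against them is to cap the decompressed size before the bytes ever reach the parser. This is a minimal sketch using only the standard library's `zlib`; the function name `bounded_decompress` and the limit values are assumptions for illustration, not part of html5lib or defusedxml.

```python
import zlib


def bounded_decompress(data: bytes, limit: int) -> bytes:
    """Decompress zlib data, refusing to produce more than `limit` bytes."""
    decompressor = zlib.decompressobj()
    # Ask for at most limit + 1 bytes: getting that extra byte back
    # means the payload is larger than we are willing to accept.
    output = decompressor.decompress(data, limit + 1)
    if len(output) > limit:
        raise ValueError("decompressed data exceeds %d bytes" % limit)
    return output


# A small payload passes through unchanged.
small = bounded_decompress(zlib.compress(b"<p>hello</p>"), limit=1024)

# A highly compressible 10 MB payload is rejected at a 1 MB cap.
bomb = zlib.compress(b"\0" * 10_000_000)
try:
    bounded_decompress(bomb, limit=1_000_000)
    rejected = False
except ValueError:
    rejected = True
```

The decompressed bytes can then be handed to the HTML parser as usual; the cap only bounds how much input it will ever see.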

gsnedders commented 8 years ago

So no, there's nothing to do here. Maybe we should document this somewhere.