martinblech / xmltodict

Python module that makes working with XML feel like you are working with JSON
MIT License
5.49k stars 462 forks source link

Improve security + better support for newer Python versions #321

Closed litios closed 2 weeks ago

litios commented 1 year ago

Hello,

Ubuntu is working to include xmltodict in our next release. As part of the "main inclusion review" (MIR) process to add software to our main package repository, we perform security audits. You can find the full xmltodict MIR bug report here

While auditing it, we found that the library is vulnerable to typical XML attacks due to relying on expat as defusedexpat doesn't have support since 2016 and only up to Python 3.4.

In newer versions of expat these issues seem to be mitigated:

Expat 2.4.1 and newer is not vulnerable to the “billion laughs” and “quadratic blowup” vulnerabilities. Items still listed as vulnerable due to potential reliance on system-provided libraries. Check pyexpat.EXPAT_VERSION.

but it still affects not so older python versions like 3.8

As recommended by the documentation, there is currently a supported alternative called defusedxml owned by the same maintainer that owned defusedexpat.

It would be very beneficial to replace the old defusedexpat with defusedxml. Does that sound like something that could be valuable to the library and the users?

Thanks in advance!

Murplugg commented 1 year ago

Would love to see that.

martinblech commented 1 year ago

@litios this sounds indeed like a valuable contribution for the library and the users. I would be more than happy to review and approve such PR.

hannob commented 1 year ago

I would prefer some clarification here, as I want to use xmltodict for a project where security against malicious input is relevant. I found this bug, as well as another bugs indicating unspecified security concerns #279.

However the report above is unclear what exact security issues it is talking about.

To my knowledge there are generally two classes of XML-specific vulnerabilities to be concerned about: DoS issues and include issues.

As for DoS issues, there are the entity expansion issues that create a lot of data, which includes issues like billion laughs or quadratic blowup, as listed in Python's list. However this is mitigated in expat since a while. The somewhat different hashdos issue is also mitigated in expat via the use of siphash and a random salt.

Then there are the include issues. Via DTD, entity expansion or the xinclude feature one may access local or external files, which could be used for a variety of attacks like data exfiltration or SSRF. However all three features are not enabled by default in the etree API, and in a test of mine none of these issues was exploitable in xmltodict.

According to Python's doc external entity expansion was disabled in various APIs in Python 3.7.1. Python 3.6 is already out of support and python 3.7 will be out of support soon. It is unclear to me where the claim that "not so older python versions like 3.8" are affected by any of this comes from.

In summary it is not clear to me which security issues exist, except maybe with extremely outdated python or expat versions with known problems.

It would probably be good to remove the use of the obsolete defusedexpat import, but it is not clear to me whether defusedxml is still needed.

litios commented 1 year ago

The main idea was to point out that, right now, vulnerable Python versions won't have a safe alternative for untrusted input to known attacks. If running modern patched Python versions, you should be safe with expat.

Python 3.8, for example, is no longer affected since v3.8.12 but older 3.8 versions will be affected. Same for Python 3.7. And Python3.8 won't be out of support until the end of 2024.

And I agree that you will still need to enable allowing entities for attacks like billion laughs to work. I simply wanted to point out that defusedxml could be a good alternative to the old defusexpat meanwhile there are vulnerable Python expat versions out there.

litios commented 1 year ago

@martinblech I don't have the bandwidth right now to provide a PR for this. But I still wanted to raise the issue, just in case someone else wants to implement it in the meantime.

hartwork commented 1 year ago

Python 3.8, for example, is no longer affected since v3.8.12 but older 3.8 versions will be affected. Same for Python 3.7. And Python3.8 won't be out of support until the end of 2024.

@litios I think there may be a misunderstanding here, let's find out and be sure. To my understanding Python 3.8 being supported means that that the CPython project continues to do new 3.8.x releases with security fixes and that users can expected to be safe and receive updates if they run the latest 3.8.x version at any point in time, e.g. version 3.8.16 as of today. If someone is running say 3.8.8 today, they are using a supported track of the language but cannot expect to be secure. As a result, users running vanilla 3.8.x <=3.8.15 (meaning without backports) need to update their CPython and do not need any particular concern here. Does that sound correct?

litios commented 1 year ago

@hartwork yes, I totally agree. I'm sorry if what I wrote was unclear. Thanks for clarifying it!

hannob commented 1 year ago

Just to summarize this issue, as I think this can confuse others who might find this report: There is no security problem with currently supported versions of expat and python.

The security issues that were mitigated by defusedexpat are issues in older, unsupported versions of expat and python. I have therefore created a PR to remove the obsolete use of defusedexpat.

hannob commented 2 weeks ago

I think this can be closed now.