unitedstates / congress-legislators

Members of the United States Congress, 1789-Present, in YAML/JSON/CSV, as well as committees, presidents, and vice presidents.
Creative Commons Zero v1.0 Universal
2.03k stars 502 forks source link

Use Safe Parsers in `lxml` Parsing Functions #921

Open pixeeai opened 4 months ago

pixeeai commented 4 months ago

This codemod sets the parser parameter in calls to lxml.etree.parse and lxml.etree.fromstring if omitted or set to None (the default value). Unfortunately, the default parser=None means lxml will rely on an unsafe parser, making your code potentially vulnerable to entity expansion attacks and external entity (XXE) attacks.

The changes look as follows:

  import lxml.etree
- lxml.etree.parse("path_to_file")
- lxml.etree.fromstring("xml_str")
+ lxml.etree.parse("path_to_file", parser=lxml.etree.XMLParser(resolve_entities=False))
+ lxml.etree.fromstring("xml_str", parser=lxml.etree.XMLParser(resolve_entities=False))
More reading * [https://lxml.de/apidoc/lxml.etree.html#lxml.etree.XMLParser](https://lxml.de/apidoc/lxml.etree.html#lxml.etree.XMLParser) * [https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing](https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing) * [https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html](https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html)

Powered by: pixeebot (codemod ID: pixee:python/safe-lxml-parsing)

pixeeai commented 4 months ago

FYI - This change was autogenerated from a GitHub app - called Pixeebot. A code-quality GitHub App; like Dependabot, but for source code. Feel free to check it our for more details for how you can install it onto your project's repo for continued code hardening and code security recommendations.