nasa-gcn / pygcn

Python package for processing Gamma-ray Coordinates Network (GCN) notices
https://pypi.org/project/pygcn/
GNU General Public License v3.0
46 stars 23 forks source link

XML security #22

Open manning-ncsa opened 3 years ago

manning-ncsa commented 3 years ago

First of all, thanks for creating and sharing this Python package. While exploring the code and beginning to write our own handler function, I noticed that according to the Python docs, there are XML vulnerabilities that should probably be secured using something like defusedxml. Have you considered using this package to parse the incoming packets?

lpsinger commented 3 years ago

We use lxml, not the Python standard library's parser. But I do see in the defusedxml README that it has some vulnerabilities. We do pass the lxml.etree document root to the user's callback function, so switching to a different parser would be an API change.

Which of the mitigations in defusedxml can we apply without switching parsers? Which is lxml permanently vulnerable to?

manning-ncsa commented 3 years ago

Unfortunately I do not have experience nor expertise in XML parsing and vulnerability mitigation. I do see that the defusedxml.lxml module is deprecated and is intended as an example of how you can mitigate certain vulnerabilities:

DEPRECATED The module is deprecated and will be removed in a future release.

The module acts as an example how you could protect code that uses lxml.etree. It implements a custom Element class that filters out Entity instances, a custom parser factory and a thread local storage for parser instances. It also has a check_docinfo() function which inspects a tree for internal or external DTDs and entity declarations. In order to check for entities lxml > 3.0 is required.

parse(), fromstring() RestrictedElement, GlobalParserTLS, getDefaultParser(), check_docinfo()

The relevant code seems to revolve around this function: check_docinfo()

I assume the actual risk is low, considering that the organizations generating the VOEvent notices are unlikely to send malicious XML documents, but perhaps someone that has the time and energy to tackle this will see this issue and make it more secure.

lpsinger commented 3 years ago

I assume the actual risk is low, considering that the organizations generating the VOEvent notices are unlikely to send malicious XML documents, but perhaps someone that has the time and energy to tackle this will see this issue and make it more secure.

The problem is that GCN has no protection against MITM attacks, because it does not have any message authentication. The network traffic is sent in the clear and without signatures of any kind. An attacker could inject maliciously crafted VOEvent packets.

lpsinger commented 3 years ago

@manning-ncsa, would you like to submit a PR to implement the mitigation described in that check_docinfo() function?

manning-ncsa commented 3 years ago

The problem is that GCN has no protection against MITM attacks, because it does not have any message authentication. The network traffic is sent in the clear and without signatures of any kind. An attacker could inject maliciously crafted VOEvent packets.

:astonished:

@manning-ncsa, would you like to submit a PR to implement the mitigation described in that check_docinfo() function?

I suppose in light of your previous comment I have more of a self-interest in doing so. If I do implement anything like that, I will definitely submit a PR.