kurtmckee / feedparser

Parse feeds in Python
https://feedparser.readthedocs.io

opening file mentioned in feed doctype #107

Open johniez opened 7 years ago

johniez commented 7 years ago

I am currently getting an "Unknown IO error" printed to stderr while calling feedparser.parse('http://feeds.feedburner.com/news_trailbusterscom?format=xml'). The feed defines this header:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?>
<?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

I used strace to see where this happens and saw a stat('http://my.netscape.com/publish/formats/rss-0.91.dtd') call for the DOCTYPE URL from the XML header. I then parsed a feed with the DOCTYPE changed to file://etc/hosts, and strace showed a successful stat() and open() of the file named in the DOCTYPE URL.
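For illustration, a minimal stdlib sketch of how a SAX-based parser can be told not to follow the DOCTYPE's system identifier. It assumes Python's bundled expat SAX driver and is not feedparser's own code path (feedparser may pick a different driver). `RefusingResolver`, `TitleCollector` and `parse_without_external_entities` are hypothetical names; the resolver only records lookups so you can check that none happened.

```python
import io
import xml.sax
import xml.sax.handler


class RefusingResolver(xml.sax.handler.EntityResolver):
    """Hypothetical resolver: records each external lookup and refuses it."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        # Never open the URL or path; just note that it was asked for.
        self.requested.append(systemId)
        raise xml.sax.SAXException(f"external entity refused: {systemId}")


class TitleCollector(xml.sax.handler.ContentHandler):
    """Collects the text of <title> elements (namespaces disabled)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def startElement(self, name, attrs):
        self._in_title = name == "title"

    def endElement(self, name):
        self._in_title = False

    def characters(self, content):
        if self._in_title:
            self.titles.append(content)


def parse_without_external_entities(data: bytes):
    """Parse feed bytes without loading the DOCTYPE URL or external entities."""
    parser = xml.sax.make_parser()
    # With this feature off, the expat driver skips the external DTD subset
    # and external general entities instead of resolving their system IDs.
    parser.setFeature(xml.sax.handler.feature_external_ges, False)
    resolver = RefusingResolver()
    parser.setEntityResolver(resolver)
    handler = TitleCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    # Concatenated title text, plus every system ID the parser asked for.
    return "".join(handler.titles), resolver.requested
```

Feeding it the header from this report (or one pointing at file:///etc/hosts) should leave `requested` empty, i.e. nothing is stat()ed or opened. Whether feedparser's real parser setup behaves the same depends on which SAX driver it selects.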

This behaviour seems suspicious to me: allowing untrusted input to open files on the local system is not pretty. Is this intended?

twm commented 6 years ago

No, this behavior is not okay, and it is actually pretty serious. Perhaps feedparser should use defusedxml, which wraps a number of Python XML libraries to prevent these attacks and has good explanations of the underlying vulnerabilities:
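defusedxml is a third-party package; the sketch below does not use its API. It is a stdlib-only approximation of one of the protections it offers (refusing entity declarations), written directly against `xml.parsers.expat`. `EntitiesForbidden` and `reject_entity_declarations` are hypothetical names chosen for this example.

```python
import xml.parsers.expat


class EntitiesForbidden(ValueError):
    """Hypothetical error type, loosely modelled on defusedxml's behaviour."""


def reject_entity_declarations(data: bytes) -> None:
    """Parse data with pyexpat, refusing any <!ENTITY ...> declaration."""
    parser = xml.parsers.expat.ParserCreate()
    # Never process parameter entities or the external DTD subset.
    parser.SetParamEntityParsing(xml.parsers.expat.XML_PARAM_ENTITY_PARSING_NEVER)

    def on_entity_decl(entityName, is_parameter_entity, value, base,
                       systemId, publicId, notationName):
        # Internal ("billion laughs") and external (file://, http://)
        # entity declarations are both refused before they can be used.
        raise EntitiesForbidden(
            f"entity {entityName!r} declared (systemId={systemId!r})")

    parser.EntityDeclHandler = on_entity_decl
    # No ExternalEntityRefHandler is installed, so pyexpat itself never
    # opens a URL or file; a plain DOCTYPE with a system ID is tolerated.
    parser.Parse(data, True)
```

A feed like the one in this report (a DOCTYPE with a PUBLIC/SYSTEM ID but no entity declarations) passes, while a document declaring `<!ENTITY x SYSTEM "file:///etc/hosts">` is rejected. Refusing all declarations is a policy choice; a real fix would need to weigh it against legitimate feeds.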

kurtmckee commented 6 years ago

@johniez thanks for reporting this!

@twm, great suggestion! I'd like feedparser to be far more stable and secure than it is, so this may be a necessary change to protect users! I'll look into it as soon as I can!

dmoklaf commented 5 months ago

Any update? This looks like a serious vulnerability.