jsumners / feedparser

Automatically exported from code.google.com/p/feedparser

icerocket.com is blocking non-browser User-Agents #391

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. import feedparser
2. d = feedparser.parse('http://www.icerocket.com/search?tab=twitter&q=chris&rss=1.xml')
3. d

What is the expected output? What do you see instead?
I expected to see the parsed contents of the .xml feed.

I see:
{'entries': [],
 'bozo_exception': SAXParseException('mismatched tag',),
 'headers': {'Content-Location': 'search.php',
             'Content-Type': 'text/html; charset=UTF-8',
             'Set-Cookie': 'irpref=nw%3A0%3Bth%3A1%3Bop%3A10%3Bsum%3A1%3Bqv%3A1%3Bu%3A1%3Baf%3A1%3Ber%3A0%3B; expires=Mon, 19-Aug-2013 20:47:50 GMT; path=/; domain=.icerocket.com',
             'Date': 'Wed, 20 Feb 2013 20:47:50 GMT',
             'Content-Length': '411',
             'Vary': 'negotiate',
             'Server': 'Apache/2.2.22 (Fedora)',
             'Connection': 'close',
             'TCN': 'choice'},
 'bozo': 1,
 'status': 403,
 'feed': {},
 'namespaces': {},
 'href': 'http://www.icerocket.com/search?tab=twitter&q=chris&rss=1.xml',
 'version': '',
 'encoding': 'UTF-8'}

What version of the product are you using? On what operating system?
Python 3.3
feedparser 5.1.3
Windows 7

Please provide any additional information below.

Original issue reported on code.google.com by rwols...@googlemail.com on 20 Feb 2013 at 8:53

GoogleCodeExporter commented 9 years ago
icerocket.com is blocking requests based on the HTTP User-Agent header. I 
recommend contacting icerocket.com and asking them to fix the issue on their 
server.

To work around this you can set feedparser.USER_AGENT to something like:

Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:20.0) Gecko/20100101 Firefox/20.0

but please be aware that icerocket.com, whether deliberately or accidentally, 
is blocking clients that aren't browsers and may truly be discouraging people 
from using their feed.
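
A minimal sketch of the workaround described above, assuming feedparser is installed. The helper names `parse_with_browser_agent` and `looks_blocked` are hypothetical and not part of feedparser; only `feedparser.USER_AGENT` and `feedparser.parse()` come from the library. The 403 check mirrors the 'status': 403 shown in the report and is only a rough heuristic.

```python
# Browser-like User-Agent string taken from the comment above.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:20.0) "
    "Gecko/20100101 Firefox/20.0"
)


def parse_with_browser_agent(url):
    """Hypothetical helper: fetch and parse `url` with a browser-like User-Agent."""
    import feedparser  # third-party; imported here so the module loads without it

    # Override the module-wide default User-Agent that feedparser sends.
    feedparser.USER_AGENT = BROWSER_USER_AGENT
    return feedparser.parse(url)


def looks_blocked(result):
    """Hypothetical heuristic: HTTP 403 and no entries, as in the report above."""
    return result.get("status") == 403 and not result.get("entries")
```

Calling `parse_with_browser_agent()` on the URL from the report and passing the result to `looks_blocked()` would show whether the server still refuses the request. Setting `feedparser.USER_AGENT` affects every later `parse()` call in the process; `feedparser.parse()` also accepts an `agent` argument if the override should apply to a single call only.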

Original comment by kurtmckee on 27 Apr 2013 at 6:37