scsibug / feedparser-clj

Atom/RSS Feed Parsing for Clojure

Ability to control user-agent would be just swell #13

Open daveliepmann opened 9 years ago

daveliepmann commented 9 years ago

Some RSS feed providers refuse connections based on the user-agent string (one, two).

It sure would be real nice to be able to pass in a custom user-agent string so that I'm able to use feedparser-clj to consume feeds that I'm already able to consume from the browser or curl.

daveliepmann commented 9 years ago

Something like the following would do the trick, but any change to the arity of parse-feed would be a breaking change that I'd rather leave to someone else:

First, add (java.net HttpURLConnection) to the namespace's (:import) clause (Java classes are imported, not required).

Then, modify parse-feed so that it handles the following arity (which conflicts with the existing content-type arity):

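([feedsource ua]
 ;; For a URL string, open the connection ourselves so the User-Agent
 ;; header can be set; any other feedsource is passed through unchanged.
 (parse-internal
  (new XmlReader (if (string? feedsource)
                   (doto (cast HttpURLConnection
                               (.openConnection (URL. feedsource)))
                     (.setRequestProperty "User-Agent" ua))
                   feedsource))))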

Currently it's possible to do this by passing in the HttpURLConnection, for example:

(rss/parse-feed
 (doto (cast HttpURLConnection
             (.openConnection (URL. "http://www.whatever/feed.xml")))
   (.setRequestProperty "User-Agent"
                        "Mozilla /5.0 (Compatible MSIE 9.0;Windows NT 6.1;WOW64; Trident/5.0)")))

...but I think this is functionality that belongs inside feedparser-clj.
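
Until something like that lands, the workaround can be wrapped in a small helper so the interop noise lives in one place. This is only a sketch: connection-with-user-agent is a hypothetical name, not part of feedparser-clj, and it assumes nothing beyond java.net. It opens the connection and sets the header; no request is sent until something reads from the connection.

```clojure
(import '(java.net URL URLConnection))

(defn connection-with-user-agent
  "Hypothetical helper (not part of feedparser-clj): opens a connection to
  `url` and sets its User-Agent request header. Nothing is sent over the
  network until the connection is actually read."
  ^URLConnection [^String url ^String ua]
  (doto (.openConnection (URL. url))
    (.setRequestProperty "User-Agent" ua)))

;; Usage, relying on parse-feed accepting a connection as described above:
;; (rss/parse-feed (connection-with-user-agent "http://www.whatever/feed.xml"
;;                                             "my-reader/1.0"))
```

Because it returns a plain URLConnection, it leaves the existing parse-feed arities untouched, which sidesteps the conflict with the content-type arity.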