mmcdole / gofeed

Parse RSS, Atom and JSON feeds in Go
MIT License

Add support for user defined user-agent string #74

Open yamamushi opened 7 years ago

yamamushi commented 7 years ago

Expected behavior

Parsing https://www.reddit.com/r/games/.rss should work with an appropriate delay in making requests (Reddit asks for 2 seconds between bot requests).

More specifically, this could be resolved if we had the option of defining our own User-Agent string (or any headers, for that matter) when calling gofeed.ParseURL(url string) or when constructing our parser with gofeed.NewParser().

Actual behavior

Returns 429 Too Many Requests, as Reddit filters requests that do not have user-agent strings.

The first request will work, after which Reddit will block all new requests for a period of time.

Steps to reproduce the behavior

package main

import (
    "fmt"
    "time"

    "github.com/mmcdole/gofeed"
)

func main() {
    fp := gofeed.NewParser()
    feed, err := fp.ParseURL("https://www.reddit.com/r/games/.rss")
    if err != nil {
        fmt.Println(err.Error())
        return
    }
    // This first request will work
    fmt.Println(feed.Title)

    time.Sleep(5 * time.Second)

    // This second request will fail with 429 because no user-agent string is defined for the request
    secondfeed, err := fp.ParseURL("https://www.reddit.com/r/games/.rss")
    if err != nil {
        fmt.Println(err.Error())
        return
    }
    fmt.Println(secondfeed.Title)
}


bogatuadrian commented 7 years ago

As a workaround you could use your own transport by implementing the RoundTripper interface to set the User-Agent header, like:

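package main

import (
    "net/http"

    "github.com/mmcdole/gofeed"
)

// UserAgentTransport wraps another RoundTripper and sets the User-Agent header.
type UserAgentTransport struct {
    http.RoundTripper
}

func (c *UserAgentTransport) RoundTrip(r *http.Request) (*http.Response, error) {
    // Work on a clone: a RoundTripper should not modify the caller's request.
    r = r.Clone(r.Context())
    r.Header.Set("User-Agent", "<platform>:<app ID>:<version string> (by /u/<reddit username>)")
    return c.RoundTripper.RoundTrip(r)
}

func main() {
    fp := gofeed.NewParser()
    fp.Client = &http.Client{
        Transport: &UserAgentTransport{http.DefaultTransport},
    }
    fp.ParseURL("https://www.reddit.com/r/games/.rss")
}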

The format <platform>:<app ID>:<version string> (by /u/<reddit username>) is the one suggested by the Reddit API documentation.

carthics commented 6 years ago

@bogatuadrian Thank you very much. This was really useful!

GaruGaru commented 6 years ago

#108 should resolve this.