mmcdole / gofeed

Parse RSS, Atom and JSON feeds in Go
MIT License
2.6k stars 209 forks source link

Add support for Etag, Last-Modified, If-None-Match and If-Modified-Since http headers on request and response #111

Open requaos opened 5 years ago

requaos commented 5 years ago

Expected behavior

Getting the Etag and Last-Modified header values on the Feed object

Actual behavior

These values are not inspected for or provided

mmcdole commented 5 years ago

I agree, this should be something we support.

I honestly never expected people to use ParseURL() very much, and instead expected them to make their own HTTP requests and feed the reader to gofeed.

But it seems people do want to use it, so I should make this interface more robust.

requaos commented 5 years ago

I'll just leave this here:



import (
    "fmt"
    "net/http"
    "time"

    "github.com/pkg/errors"

    "github.com/mmcdole/gofeed"
)

var gmtTimeZoneLocation *time.Location

func init() {
    loc, err := time.LoadLocation("GMT")
    if err != nil {
        panic(err)
    }
    gmtTimeZoneLocation = loc
}

var ErrNotModified = errors.New("not modified")

type Reader interface {
    ReadFeed(url string, etag string, lastModified time.Time) (*Feed, error)
}

func New(client *http.Client) Reader {
    return &reader{
        feedReader: gofeed.NewParser(),
        client:     client,
    }
}

type reader struct {
    feedReader *gofeed.Parser
    client     *http.Client
}

type Feed struct {
    *gofeed.Feed

    ETag         string
    LastModified time.Time
}

func (r *reader) ReadFeed(url string, etag string, lastModified time.Time) (*Feed, error) {
    req, err := http.NewRequest(http.MethodGet, url, nil)
    if err != nil {
        return nil, err
    }
    req.Header.Set("User-Agent", "Gofeed/1.0")

    if etag != "" {
        req.Header.Set("If-None-Match", fmt.Sprintf(`"%s"`, etag))
    }

    req.Header.Set("If-Modified-Since", lastModified.In(gmtTimeZoneLocation).Format(time.RFC1123))

    resp, err := r.client.Do(req)

    if err != nil {
        return nil, err
    }

    if resp != nil {
        defer func() {
            ce := resp.Body.Close()
            if ce != nil {
                err = ce
            }
        }()
    }

    if resp.StatusCode == http.StatusNotModified {
        return nil, ErrNotModified
    }

    if resp.StatusCode < 200 || resp.StatusCode >= 300 {
        return nil, gofeed.HTTPError{
            StatusCode: resp.StatusCode,
            Status:     resp.Status,
        }
    }

    feed := &Feed{}

    feedBody, err := r.feedReader.Parse(resp.Body)
    if err != nil {
        return nil, err
    }
    feed.Feed = feedBody

    if eTag := resp.Header.Get("Etag"); eTag != "" {
        feed.ETag = eTag
    }

    if lastModified := resp.Header.Get("Last-Modified"); lastModified != "" {
        parsed, err := time.ParseInLocation(time.RFC1123, lastModified, gmtTimeZoneLocation)
        if err == nil {
            feed.LastModified = parsed
        }
    }

    return feed, nil
}
requaos commented 5 years ago

@mmcdole I only needed to extend the struct to add Etag and LastModified to provide this functionality

requaos commented 5 years ago

I could make a PR that modifies the ParseURL function like this: ParseURL(url string, opts ...Options) And define an etag option and a lastmodified option. By changing the signature to accept a variadic second parameter we would be able to add this functionality without introducing a breaking change to the API.

mmcdole commented 5 years ago

@requaos I like the idea of the variadic parameter. I think that would be a good idea.

Some items in Options that I know people have requested:

aliml92 commented 1 year ago

Are there any updates on this issue yet?

infogulch commented 1 year ago

By changing the signature to accept a variadic second parameter we would be able to add this functionality without introducing a breaking change to the API.

I like this idea, but how would the operator retrieve the Etag/Last-Modified header values from the ParseURL api? There's nowhere to output it, so I guess you'd have to modify the Feed struct... but then that doesn't work for cases like Parse which doesn't get the whole request just the body.

I think the only option may be a new api like the idea by requaos above. https://github.com/mmcdole/gofeed/issues/111#issuecomment-450956821

DamonHD commented 5 months ago

Please also consider somehow facilitating responding well to (200 and 304) Cache-Control max-ageand (429 and 503) Retry-Afterin each case to generate a "do-not-refetch-before" time?

See: https://www.earth.org.uk/RSS-efficiency.html

Rgds

Damon