Open requaos opened 5 years ago
I agree, this should be something we support.
I honestly never expected people to use ParseURL() very much, and instead expected them to make their own HTTP requests and feed the reader to gofeed.
But it seems people do want to use it, so I should make this interface more robust.
I'll just leave this here:
import (
"fmt"
"net/http"
"time"
"github.com/pkg/errors"
"github.com/mmcdole/gofeed"
)
var gmtTimeZoneLocation *time.Location
func init() {
loc, err := time.LoadLocation("GMT")
if err != nil {
panic(err)
}
gmtTimeZoneLocation = loc
}
var ErrNotModified = errors.New("not modified")
type Reader interface {
ReadFeed(url string, etag string, lastModified time.Time) (*Feed, error)
}
func New(client *http.Client) Reader {
return &reader{
feedReader: gofeed.NewParser(),
client: client,
}
}
type reader struct {
feedReader *gofeed.Parser
client *http.Client
}
type Feed struct {
*gofeed.Feed
ETag string
LastModified time.Time
}
func (r *reader) ReadFeed(url string, etag string, lastModified time.Time) (*Feed, error) {
req, err := http.NewRequest(http.MethodGet, url, nil)
if err != nil {
return nil, err
}
req.Header.Set("User-Agent", "Gofeed/1.0")
if etag != "" {
req.Header.Set("If-None-Match", fmt.Sprintf(`"%s"`, etag))
}
req.Header.Set("If-Modified-Since", lastModified.In(gmtTimeZoneLocation).Format(time.RFC1123))
resp, err := r.client.Do(req)
if err != nil {
return nil, err
}
if resp != nil {
defer func() {
ce := resp.Body.Close()
if ce != nil {
err = ce
}
}()
}
if resp.StatusCode == http.StatusNotModified {
return nil, ErrNotModified
}
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
return nil, gofeed.HTTPError{
StatusCode: resp.StatusCode,
Status: resp.Status,
}
}
feed := &Feed{}
feedBody, err := r.feedReader.Parse(resp.Body)
if err != nil {
return nil, err
}
feed.Feed = feedBody
if eTag := resp.Header.Get("Etag"); eTag != "" {
feed.ETag = eTag
}
if lastModified := resp.Header.Get("Last-Modified"); lastModified != "" {
parsed, err := time.ParseInLocation(time.RFC1123, lastModified, gmtTimeZoneLocation)
if err == nil {
feed.LastModified = parsed
}
}
return feed, nil
}
@mmcdole I only needed to extend the struct to add Etag
and LastModified
to provide this functionality
I could make a PR that modifies the ParseURL function like this:
ParseURL(url string, opts ...Options)
And define an etag option and a lastmodified option. By changing the signature to accept a variadic second parameter we would be able to add this functionality without introducing a breaking change to the API.
@requaos I like the idea of the variadic parameter. I think that would be a good idea.
Some items in Options that I know people have requested:
ETAG
LastModified
User-Agent
Timeout
Are there any updates on this issue yet?
By changing the signature to accept a variadic second parameter we would be able to add this functionality without introducing a breaking change to the API.
I like this idea, but how would the operator retrieve the Etag/Last-Modified header values from the ParseURL api? There's nowhere to output it, so I guess you'd have to modify the Feed
struct... but then that doesn't work for cases like Parse
which doesn't get the whole request just the body.
I think the only option may be a new api like the idea by requaos above. https://github.com/mmcdole/gofeed/issues/111#issuecomment-450956821
Please also consider somehow facilitating responding well to (200 and 304) Cache-Control max-age
and (429 and 503) Retry-After
in each case to generate a "do-not-refetch-before" time?
See: https://www.earth.org.uk/RSS-efficiency.html
Rgds
Damon
Expected behavior
Getting the Etag and Last-Modified header values on the Feed object
Actual behavior
These values are not inspected for or provided