sof / feed

Haskell package for handling various feed (RSS) formats.

date parsing #6

Closed joeyh closed 11 years ago

joeyh commented 11 years ago

getItemPublishDate currently returns a String, which can be formatted in a few different ways depending on the type of feed. It would be really great if this were changed to return a parsed date. Use cases include sorting a set of feeds' items by date for display, and in my case, including the date in a filename when downloading a podcast.

sof commented 11 years ago

There's not overwhelming adherence to what the specs say on date formatting, though. Feel free to come up with reliable parsing of dates in Haskell for feeds & submit a pull request.

(I've had to spend months to get date handling reliable and right for a commercial service that's currently processing 10k feeds or so from around the globe. Not done in Haskell. People do the strangest things when it comes to dates.)

joeyh commented 11 years ago

sof wrote:

There's not overwhelming adherence to what the specs say on date formatting, though. Feel free to come up with reliable parsing of dates in Haskell for feeds & submit a pull request.

Would a Maybe Date be acceptable, one that parses dates that do conform to the standards and returns Nothing otherwise?

see shy jo

sof commented 11 years ago

I don't like that too much, you're left with nothing if it fails.

joeyh commented 11 years ago

sof wrote:

I don't like that too much, you're left with nothing if it fails.

Either String Date then?

see shy jo
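A minimal sketch of the Either idea, so that a failed parse still hands back the original text. The helper name parseDateOrRaw is hypothetical, UTCTime stands in for the Date type under discussion, and the RFC822-style format string is an assumption; it relies on parseTimeM from a recent version of the time package.

```haskell
import Data.Time (UTCTime)
import Data.Time.Format (defaultTimeLocale, parseTimeM)

-- Hypothetical helper, not part of the feed package.
-- Right holds the parsed date; Left holds the untouched input string,
-- so the caller is never "left with nothing" when parsing fails.
parseDateOrRaw :: String -> Either String UTCTime
parseDateOrRaw raw =
  maybe (Left raw) Right
    (parseTimeM True defaultTimeLocale "%a, %e %b %Y %H:%M:%S %Z" raw)

main :: IO ()
main = do
  print (parseDateOrRaw "Tue, 10 Jun 2003 04:00:00 GMT")
  print (parseDateOrRaw "yesterday-ish")
```

A caller can pattern match on the result and fall back to displaying the raw string in the Left case.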

pxqr commented 11 years ago

That would be great, since an API user will want to parse it anyway. Currently I use "%a, %e %b %Y %H:%M:%S %Z" to parse pubdate, but I'm not sure it will work everywhere.
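For reference, the format string above could be applied roughly as follows. This is a sketch that assumes a recent time package (parseTimeM and defaultTimeLocale from Data.Time.Format); the helper name parsePubDate is made up for illustration.

```haskell
import Data.Time (UTCTime)
import Data.Time.Format (defaultTimeLocale, parseTimeM)

-- Format string quoted from the comment above.
rfc822Format :: String
rfc822Format = "%a, %e %b %Y %H:%M:%S %Z"

-- Hypothetical helper: Nothing when the input does not match the format.
-- The first argument (True) tells the parser to tolerate surrounding whitespace.
parsePubDate :: String -> Maybe UTCTime
parsePubDate = parseTimeM True defaultTimeLocale rfc822Format

main :: IO ()
main = do
  print (parsePubDate "Tue, 10 Jun 2003 04:00:00 GMT")  -- RSS-style date
  print (parsePubDate "2003-06-10T04:00:00Z")           -- Atom-style date, does not match
```

The second call shows the limitation raised in the comment: a single format string only covers feeds that follow that one convention.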

sof commented 11 years ago

I don't mind putting in an RFC822-like parser like that, but it won't be sufficient in general. (And I really don't want to take on providing reliable parsing of feed dates.)

So, how about:

    getItemPublishDate :: ItemGetter (Maybe Date)
    getItemPublishDateString :: ItemGetter DateString

with the former returning Nothing if there isn't a pub date, but (Just Nothing) if it was unparseable. And documented as only supporting RFC822.
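The two-level result described here can be sketched as below. Everything in this block is an assumption made for illustration: Item is a stand-in record, ItemGetter is assumed to be a function returning Maybe, and UTCTime and String stand in for Date and DateString. It is not the code from the package.

```haskell
import Data.Time (UTCTime)
import Data.Time.Format (defaultTimeLocale, parseTimeM)

-- Hypothetical stand-in for a feed item; the real Item type is not shown in this thread.
newtype Item = Item { itemDateString :: Maybe String }

-- Assumed shape of ItemGetter, chosen to match the Nothing / Just Nothing description.
type ItemGetter a = Item -> Maybe a

-- Raw date text, or Nothing when the item carries no publish date.
getItemPublishDateString :: ItemGetter String
getItemPublishDateString = itemDateString

-- Outer Maybe: was a date present? Inner Maybe: did it parse as RFC822?
getItemPublishDate :: ItemGetter (Maybe UTCTime)
getItemPublishDate item = fmap parseRfc822 (getItemPublishDateString item)
  where
    parseRfc822 :: String -> Maybe UTCTime
    parseRfc822 = parseTimeM True defaultTimeLocale "%a, %e %b %Y %H:%M:%S %Z"

main :: IO ()
main = do
  print (getItemPublishDate (Item Nothing))                                 -- no pub date
  print (getItemPublishDate (Item (Just "not a date")))                     -- present, unparseable
  print (getItemPublishDate (Item (Just "Tue, 10 Jun 2003 04:00:00 GMT")))  -- present, RFC822
```

Keeping the string getter alongside the parsed one means a caller who sees Just Nothing can still fetch the raw text and try a parser of their own.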

sof commented 11 years ago

https://github.com/sof/feed/commit/40dfafed0fe02a6cd8832e2699ef0404e8ab49f3

joeyh commented 11 years ago

Seems to me this at least also needs to support RFC3339 dates as used in Atom.
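One way to cover both conventions is to try a list of formats in order and take the first that matches. The sketch below assumes a recent time package; the names feedDateFormats and parseFeedDate are hypothetical, and the format list is illustrative, not exhaustive. Whether the numeric-offset format accepts a colon (as in +01:00) depends on the version of the time package in use.

```haskell
import Data.Foldable (asum)
import Data.Time (UTCTime)
import Data.Time.Format (defaultTimeLocale, parseTimeM)

-- Candidate formats, tried in order.
feedDateFormats :: [String]
feedDateFormats =
  [ "%a, %e %b %Y %H:%M:%S %Z"  -- RFC822-style, as used by RSS
  , "%Y-%m-%dT%H:%M:%S%QZ"      -- RFC3339 in UTC with a literal Z suffix
  , "%Y-%m-%dT%H:%M:%S%Q%z"     -- RFC3339 with a numeric offset
  ]

-- Hypothetical helper: the first successful parse wins, Nothing if none match.
parseFeedDate :: String -> Maybe UTCTime
parseFeedDate s =
  asum [parseTimeM True defaultTimeLocale fmt s | fmt <- feedDateFormats]

main :: IO ()
main = do
  print (parseFeedDate "Tue, 10 Jun 2003 04:00:00 GMT")
  print (parseFeedDate "2003-12-13T18:30:02Z")
  print (parseFeedDate "next Tuesday")
```

Each new convention found in the wild becomes another entry in the list, which is a fair picture of how the set of supported formats tends to grow.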

sof commented 11 years ago

Slippery slope.. pull request?

sof commented 11 years ago

https://github.com/sof/feed/commit/828467176a9a5dd63f3a5c973bbb9a19335986ed