manolomartinez / greg

A command-line podcast aggregator
GNU General Public License v3.0

Dealing with slightly malformed feeds (missing timezone information) #87

Open yringot opened 6 years ago

yringot commented 6 years ago

action: add new feed (with missing timezone information) and set a download-from date two weeks ago to avoid downloading a lot of stuff

expected result: only podcasts published after that date get downloaded by greg

actual result: greg complains about being unable to parse time information and downloads all podcasts of that entire feed, disregarding the --downloadfrom parameter and apparently the firstsync = 10 parameter as well.

I subscribe to several podcast feeds via bitlove.org, a website that distributes podcasts via BitTorrent. Some of these feeds are malformed, which causes greg to trip up and ignore the --downloadfrom date that I set (to the current date) when adding them, to avoid downloading podcasts I had already listened to before setting up greg.

One of these feeds is the following: http://bitlove.org/metaebene/freakshow-mp3/downloads.rss

When I run greg sync on this feed, greg outputs: "I cannot parse the time information of this feed.I'll use your current local time instead."

When I run the W3C feed checker on that feed (https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fbitlove.org%2Fmetaebene%2Ffreakshow-mp3%2Fdownloads.rss), it highlights, among other minor mistakes, the following error:

line 11, column 35: pubDate must be an RFC-822 date-time: Thu, 15 Feb 2018 11:24:50 (25 occurrences) [help]

<pubDate>Thu, 22 Feb 2018 12:53:02 </pubDate>
^

The only thing that seems to be missing is the time zone after HH:mm:ss, yet greg just throws up its arms and says that it can't parse the date. Would it not make more sense for greg to issue a warning ("malformed date") and then just assume the current local time zone? (This is already implied by the message greg prints, but it doesn't seem to work.)

I understand that this problem is not greg's fault, but that the feed is strictly speaking malformed. However, given that the date is otherwise fine and that only the time zone is missing, I would like to ask whether adding some additional logic to greg so that it ignores this minor error would be possible.
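
To illustrate the kind of fallback being asked for, here is a minimal sketch using only the Python standard library. It is not greg's actual code: the function names parse_pubdate and published_after and the fallback_tz parameter are made up for this example. It relies on email.utils.parsedate_to_datetime returning a naive datetime when the date string carries no usable UTC offset, and in that case it warns and assumes a fallback zone (the local one by default).

```python
import warnings
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_pubdate(text, fallback_tz=None):
    """Parse an RFC-822 style pubDate, tolerating a missing time zone.

    Hypothetical helper for illustration only, not part of greg.
    """
    # Returns a naive datetime when it finds no usable UTC offset.
    parsed = parsedate_to_datetime(text.strip())
    if parsed.tzinfo is None:
        if fallback_tz is None:
            # Assumption: treat the timestamp as local time.
            fallback_tz = datetime.now().astimezone().tzinfo
        warnings.warn(f"malformed date {text!r}: no time zone, assuming {fallback_tz}")
        parsed = parsed.replace(tzinfo=fallback_tz)
    return parsed


def published_after(pubdate_text, cutoff, fallback_tz=None):
    """Return True if the entry is newer than the download-from cutoff."""
    return parse_pubdate(pubdate_text, fallback_tz) > cutoff


if __name__ == "__main__":
    cutoff = datetime(2018, 2, 8, tzinfo=timezone.utc)
    # The pubDate from the feed in question, trailing space included.
    print(published_after("Thu, 22 Feb 2018 12:53:02 ", cutoff, timezone.utc))
```

With a fallback like this, the date in the entry itself is still used for the download-from comparison, and only the missing offset is guessed.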

Here is my greg.conf: https://gist.github.com/yringot/f57d8116f4a4392a3695505c0bceeb9c

yringot commented 6 years ago

(I will also inform the producer of that podcast of the validation error.)

thomasboehm commented 4 years ago

It would also be nice to have an option to ignore the timezone altogether, so that {date} can be used as intended by the publisher of the podcast.

I subscribe to a daily podcast whose publisher is in a timezone one hour ahead of me, and they publish each episode at five past midnight. I save the episodes with the date in the filename, so the files always carry the previous day's date. Not a big issue, but it would be nice to have.
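
The off-by-one can be reproduced with a small standard-library sketch. The pubDate value, the offsets, and the function name filename_dates are invented for this example, and it makes no claim about how greg actually computes {date}. It only shows that the calendar date depends on whether the timestamp is converted to the listener's zone first.

```python
from datetime import date, timedelta, timezone
from email.utils import parsedate_to_datetime


def filename_dates(pubdate_text, listener_tz):
    """Return (date as published, date after converting to listener_tz).

    Hypothetical helper for illustration only, not part of greg.
    """
    published = parsedate_to_datetime(pubdate_text)
    as_published = published.date()  # keeps the publisher's offset
    as_converted = published.astimezone(listener_tz).date()
    return as_published, as_converted


if __name__ == "__main__":
    # Assumed example: published at 00:05 in UTC+2, listener in UTC+1.
    listener = timezone(timedelta(hours=1))
    print(filename_dates("Fri, 06 Mar 2020 00:05:00 +0200", listener))
    # prints (datetime.date(2020, 3, 6), datetime.date(2020, 3, 5))
```

An "ignore the timezone" option would amount to using the first value instead of the second.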