manolomartinez / greg

A command-line podcast aggregator
GNU General Public License v3.0

Ignore Defective Entries [Suggestion] #48

Closed ghost closed 8 years ago

ghost commented 8 years ago

Hi. The latest entries in these feeds appear to be defective:

http://podcast.c-span.org/xml/radio_feed.xml
https://thisishell.com/broadcast.xml
http://www.bbc.co.uk/programmes/p02nrsmh/episodes/downloads.rss

When greg encounters them it outputs this error and exits: ... something went wrong. Are you connected to the internet?

As a result, greg is effectively broken when it is run as a cron job, and 'greg sync' stops at the first defective feed. It would be extremely helpful if there were an argument that forced greg to move on to the next feed even when an entry is defective, or one that ignored a given feed during 'greg sync'.

manolomartinez commented 8 years ago

Hi, is there a traceback, apart from the error message, or just that?

ghost commented 8 years ago

When I have "downloadhandler = greg" set I just get the "... something went wrong" error, but when I have downloadhandler set to wget, per the config file example, I do get a traceback. For example, the https://thisishell.com/broadcast.xml feed outputs:

Downloading Episode 895: Conservatism, A History (Best of... April 9th, 2016) -- 20160409.mp3
--2016-04-12 11:29:00-- https://s3.amazonaws.com/thisishell-assets/mp3/20160409.mp3
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.9.48
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.9.48|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: unspecified
ERROR: Redirection (301) without location.
Traceback (most recent call last):
  File "/usr/bin/greg", line 9, in <module>
    load_entry_point('Greg==0.4.6', 'console_scripts', 'greg')()
  File "/usr/lib/python3.5/site-packages/greg/gregparser.py", line 118, in main
    function(vars(args))
  File "/usr/lib/python3.5/site-packages/greg/greg.py", line 776, in sync
    downloaded = feed.download_entry(entry)
  File "/usr/lib/python3.5/site-packages/greg/greg.py", line 338, in download_entry
    download_handler(self, placeholders)
  File "/usr/lib/python3.5/site-packages/greg/greg.py", line 575, in download_handler
    raise URLError
TypeError: __init__() missing 1 required positional argument: 'reason'
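
The final TypeError in that traceback can be reproduced outside greg. This is a minimal sketch, assuming the URLError in greg.py is the standard library's urllib.error.URLError: a bare "raise URLError" makes Python instantiate the class with no arguments, and its constructor requires a reason. The helper names below are hypothetical and are not taken from greg.

```python
from urllib.error import URLError


def raise_bare():
    # Raising the class itself makes Python call URLError() with no
    # arguments, which fails because 'reason' is a required parameter.
    raise URLError


def raise_with_reason():
    # Passing a reason lets the intended URLError propagate instead.
    raise URLError("download handler exited with an error")


def describe(func):
    """Call func and report which exception type actually surfaced."""
    try:
        func()
    except URLError as err:
        return "URLError: {}".format(err.reason)
    except TypeError as err:
        return "TypeError: {}".format(err)


if __name__ == "__main__":
    print(describe(raise_bare))
    print(describe(raise_with_reason))
```

The first call surfaces a TypeError rather than a URLError, so an "except URLError" handler further up the call stack would never see it. That would explain why the wget handler path ends in a raw traceback.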

FilipBB commented 8 years ago

I made a small change to make greg continue syncing when a feed fails: https://github.com/FilipBB/greg/commit/e21fa8773b0be8f9ae489b7a5d493a9751103f5f

There's probably a better way to do this, but I also use greg in a cron job and this works well enough for me.
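
For readers who do not want to open the commit, the general keep-going idea can be sketched as below. This is not the code from the linked commit, only an illustration under stated assumptions: sync_all and sync_one are hypothetical names, and the broad "except Exception" is a deliberate choice for unattended cron runs rather than something greg is known to do.

```python
import sys


def sync_all(feeds, sync_one):
    """Sync every feed in order; record failures instead of aborting.

    feeds is an iterable of feed names and sync_one is a callable that
    syncs a single feed. Both are hypothetical stand-ins, not greg's API.
    """
    failures = {}
    for name in feeds:
        try:
            sync_one(name)
        except Exception as err:  # broad on purpose: one bad feed must not stop the rest
            print("Skipping {}: {}".format(name, err), file=sys.stderr)
            failures[name] = err
    return failures


def demo():
    """Run sync_all against a fake sync function with one defective feed."""
    synced = []

    def fake_sync(name):
        if name == "broken":
            raise ValueError("defective entry")
        synced.append(name)

    failures = sync_all(["a", "broken", "b"], fake_sync)
    return synced, sorted(failures)


if __name__ == "__main__":
    print(demo())
```

Returning the failures lets a caller decide afterwards whether to exit with a non-zero status, which keeps cron's error reporting useful even though the loop itself never aborts.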

manolomartinez commented 8 years ago

I am sorry I am not being very responsive these days. The end of semester is being more hectic than usual. Hopefully, after next Thursday I will be able to look into this and push a fix, perhaps based on @FilipBB's solution.

manolomartinez commented 8 years ago

@FilipBB I actually think now your solution is better than what greg currently has. Would you mind issuing a pull request with that commit?

FilipBB commented 8 years ago

Sure.

manolomartinez commented 8 years ago

So, we are going with @FilipBB's solution for the time being. I agree that keep-going seems like a better attitude in this context. Perhaps this should be another greg.conf flag?
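
If this did become a configuration flag, reading it could look roughly like the sketch below. Everything here is an assumption made for illustration: the option name keep_going does not exist in greg, the section layout (a [DEFAULT] section plus one section per feed) is assumed, and the helper is hypothetical. Only the configparser calls are real standard-library API.

```python
import configparser

# Hypothetical configuration text; 'keep_going' is an invented option name.
SAMPLE = """
[DEFAULT]
keep_going = yes

[SomeFeed]
keep_going = no
"""


def load(text):
    """Parse configuration text into a ConfigParser instance."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return config


def keep_going(config, feed):
    """Return the flag for a feed, falling back to [DEFAULT], then True."""
    section = feed if config.has_section(feed) else "DEFAULT"
    return config.getboolean(section, "keep_going", fallback=True)


if __name__ == "__main__":
    cfg = load(SAMPLE)
    print(keep_going(cfg, "SomeFeed"))
    print(keep_going(cfg, "AnotherFeed"))
```

A per-feed override of this kind would also cover the original request to ignore failures for one specific feed during 'greg sync'.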

Anyway, closing for now, unless and until you guys tell me that this is somehow misbehaving.