manolomartinez / greg

A command-line podcast aggregator
GNU General Public License v3.0

Failure to download episodes from "Hello Internet" podcast #31

Closed githuq closed 9 years ago

githuq commented 9 years ago

I have been trying to download episodes from the "Hello Internet" podcast, but greg fails with an error message:

$ greg add HelloInternet http://www.hellointernet.fm/podcast?format=rss
$ greg sync HelloInternet
Checking HelloInternet...
Traceback (most recent call last):
  File "/usr/bin/greg", line 9, in <module>
    load_entry_point('Greg==0.4.4.3', 'console_scripts', 'greg')()
  File "/usr/lib/python3/dist-packages/greg/gregparser.py", line 112, in main
    args.func(vars(args))
  File "/usr/lib/python3/dist-packages/greg/greg.py", line 746, in sync
    feed.download_entry(entry)
  File "/usr/lib/python3/dist-packages/greg/greg.py", line 286, in download_entry
    for mimetype in self.mime]):
  File "/usr/lib/python3/dist-packages/greg/greg.py", line 286, in <listcomp>
    for mimetype in self.mime]):
  File "/usr/lib/python3/dist-packages/feedparser.py", line 383, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'type'

Other podcasts work fine. I'm running Debian Jessie with stock packages (python 3.4.2 and python3-feedparser 5.1.3).

manolomartinez commented 9 years ago

Hi, yes, that podcast is defective in that it doesn't carry information about enclosure types, which greg relies on.

In the development-main branch, I have introduced a new notype option for greg.conf. The idea would be to have a section in this file with

[HelloInternet]
notype = yes

This would instruct greg to skip checking for type and simply download all enclosures. Would you mind installing greg from the development branch and letting me know if that works for you?

Manolo
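
For readers following along, here is a minimal sketch of the idea behind a notype-style switch. It is not greg's actual code: `select_enclosures`, its parameters, and the example URLs are hypothetical, and plain dicts stand in for the enclosure objects a feed parser would return.

```python
def select_enclosures(enclosures, mime_prefixes, notype=False):
    """Return the enclosures worth downloading (hypothetical helper).

    enclosures: list of dict-like objects, as a feed parser might produce.
    mime_prefixes: substrings to look for in the MIME type, e.g. ["audio"].
    notype: when True, skip the type check and keep every enclosure.
    """
    if notype:
        return list(enclosures)
    selected = []
    for enclosure in enclosures:
        # .get() avoids the KeyError that enclosure["type"] raises
        # when the feed omits the type attribute.
        mime = enclosure.get("type", "")
        if any(prefix in mime for prefix in mime_prefixes):
            selected.append(enclosure)
    return selected


# Made-up sample data: one well-formed enclosure, one without a type.
typed = {"href": "http://example.com/ep1.mp3", "type": "audio/mpeg"}
untyped = {"href": "http://example.com/ep2.mp3"}

print(select_enclosures([typed, untyped], ["audio"]))
print(select_enclosures([typed, untyped], ["audio"], notype=True))
```

With the type check on, the untyped enclosure is silently skipped instead of crashing; with `notype=True`, both are kept.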

FilipBB commented 9 years ago

Wouldn't the ignoreenclosures flag work?

manolomartinez commented 9 years ago

Wouldn't the ignoreenclosures flag work?

Ha, I hoped it might :) In fact it's a different way for a feed to be defective. The ignoreenclosures option is there to deal with feeds with no enclosures. This new issue is about feeds which have enclosures, but no enclosure types.

I think the new notype option will help greg overcome this problem, but defective feeds are, admittedly, a bit of a wild goose chase. Perhaps at some point I will need to figure out a way for greg to expose all of the info parsed by feedparser, so that corner cases can be dealt with.
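
The two kinds of defect described above can be told apart with a small check. This is only an illustrative sketch: `diagnose_entry` and its return strings are hypothetical, and entries are modelled as plain dicts rather than real feedparser objects.

```python
def diagnose_entry(entry):
    """Classify a feed entry by the kind of enclosure defect it shows.

    Hypothetical helper; the entry is assumed to be dict-like with an
    optional "enclosures" list of dict-like enclosure records.
    """
    enclosures = entry.get("enclosures", [])
    if not enclosures:
        # The case the ignoreenclosures option is meant for.
        return "no enclosures"
    if any("type" not in enclosure for enclosure in enclosures):
        # The case the notype option is meant for.
        return "untyped enclosures"
    return "ok"


print(diagnose_entry({"title": "Episode 1"}))
print(diagnose_entry({"enclosures": [{"href": "http://example.com/a.mp3"}]}))
```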

githuq commented 9 years ago

manolomartinez notifications@github.com writes:

I think the new notype option will help greg overcome this problem, but defective feeds are, admittedly, a bit of a wild goose chase. Perhaps at some point I will need to figure out a way for greg to expose all of the info parsed by feedparser, so that corner cases can be dealt with.

The 'notype' option sounds like it would do the trick. I'll try it out and get back to you.

Maybe you could introduce an option 'workarounds' or 'handledefects' that is list-valued? The two possible entries for the list as of now would be 'ignoreenclosures' and 'notype'. More can be added in the future to follow the goose wherever it goes.

My proposal is not fundamentally different from what is in greg now, but it allows one to call these things what they are: defects.
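
As a rough illustration of the proposal, a list-valued option could be read with Python's standard `configparser`. The `workarounds` key, the comma-separated syntax, and the `read_workarounds` helper are all assumptions made for this sketch; nothing like this exists in greg.conf as far as this thread shows.

```python
import configparser

# Hypothetical config text; "workarounds" is the proposed option, not a real one.
SAMPLE = """
[HelloInternet]
workarounds = notype, ignoreenclosures

[OtherPodcast]
"""

KNOWN_WORKAROUNDS = {"notype", "ignoreenclosures"}


def read_workarounds(config, section):
    """Return the list of workaround names configured for a feed section."""
    raw = config.get(section, "workarounds", fallback="")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    unknown = set(names) - KNOWN_WORKAROUNDS
    if unknown:
        raise ValueError("unknown workarounds: " + ", ".join(sorted(unknown)))
    return names


config = configparser.ConfigParser()
config.read_string(SAMPLE)

print(read_workarounds(config, "HelloInternet"))
print(read_workarounds(config, "OtherPodcast"))
```

Rejecting unknown names early keeps a misspelled workaround from being ignored silently.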

manolomartinez commented 9 years ago

Maybe you could introduce an option 'workarounds' or 'handledefects' that is list-valued? The two possible entries for the list as of now would be 'ignoreenclosures' and 'notype'. More can be added in the future to follow the goose wherever it goes.

My proposal is not fundamentally different from what is in greg now, but it allows one to call these things what they are: defects.

:) yep, defective feeds are frustrating. I don't know about adding a workarounds option, but perhaps it would be good to clearly separate bona-fide options and workaround options in greg.conf. I'll think about that, thanks!

M

githuq commented 9 years ago

I just had a look at the 'notype' option in the development-main branch. It does the trick and I can download the episodes from Hello Internet. Thanks for the quick patch.

manolomartinez commented 9 years ago

Sorry, I somehow missed your last message. Thanks for testing it. Closing for now.

ghost commented 9 years ago

FYI, it's not just Hello Internet, but it's largely any podcast hosted on SquareSpace. They have malformed podcast feeds and fixing them doesn't seem to be of high priority. Which is a pity.

manolomartinez commented 9 years ago

Ugh. Thanks for the heads up. That reminds me that, at least, I should make greg recommend adding the notype option to the feed conf when this problem arises.
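
One way such a recommendation could look, sketched with a hypothetical `notype_hint` helper and plain dicts in place of parsed enclosures (the message wording is made up, not greg's output):

```python
def notype_hint(feed_name, enclosures):
    """Return a hint if any enclosure lacks a type, otherwise None.

    Hypothetical helper: it turns the KeyError seen in the traceback
    into a readable suggestion instead of a crash.
    """
    try:
        for enclosure in enclosures:
            enclosure["type"]  # raises KeyError when the type is missing
    except KeyError:
        return (
            "Feed '{0}' has enclosures without type information. "
            "Consider adding 'notype = yes' to the [{0}] section "
            "of greg.conf.".format(feed_name)
        )
    return None


print(notype_hint("HelloInternet", [{"href": "http://example.com/ep.mp3"}]))
```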

ghost commented 9 years ago

Sounds like a good plan.