ribbons / RadioDownloader

An easy to use application for managing podcast subscriptions and downloads.
https://nerdoftheherd.com/tools/radiodld/
GNU General Public License v3.0

A change to the enclosure URL no longer breaks the download of a podcast episode. #153

Closed ribbons closed 11 years ago

ribbons commented 11 years ago

Original report from del at 17:17:03 on 2011-08-18

Not sure if this is a duplicate of another bug report or not.

I subscribed to a BBC feed as a podcast, not via 'BBC radio'.

The URL was http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/rss.xml

I ended up with a subscription called "Early Music Show"

For each item in the list I get something similar to this:

Date: Mon 15/Aug/11 09:00
Duration: 13min
Error: Not available
This episode appears to be no longer available. You can either try again later, or cancel the download to remove it from the list and clear the error.

Which is wrong, because the files concerned do exist

e.g. the XML contains a reference to <enclosure url="http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/earlymusic_20110815-0900b.mp3" length="13450311" type="audio/mpeg"/>

This file exists: I can play it directly and I can download it, using the http URL.

I mention this because other BBC RSS feeds do work. It seems to be only the Early Music one that's failing, so I assume the downloader is tripping up over some XML component which is in an unexpected format.

Looking at page http://www.bbc.co.uk/podcasts/series/earlymusic I can see 6 previous episodes. I can right-click and save each.

I can see the same episodes listed in the downloader subscription. None of them work. I have removed all references to this subscription, and to the failed files, and have re-subscribed. Still no downloads ...

I subsequently went back to the downloader, via 'BBC radio', and drilled down into Radio 3, then "the Early Music Show", and I subscribed to that.

I can see only this weekend's episodes listed, i.e. as I'd expect from iPlayer. However these do work.

A bit long because I wanted to give you enough detail to be able to try this yourself.


Imported from Bug 540 in the NerdoftheHerd.com Bugzilla.

ribbons commented 11 years ago

Original comment from del at 02:06:07 on 2011-08-23

2 files failing to download from podcast (dump)

Prior to reading your reply I had removed then re-added the podcast. I had managed to get 4 of the 6 podcasts, but I always get errors with the 2 files in the attached image.

The situation was initially unclear because it seemed to me that an error in one podcast element propagated to others in the set, i.e. once the Caccini one failed, the others did too. By ticking just one at a time then leaving RD to get on with it I got the 4 oldest ones.

I've done as you suggested: essentially removed RD, chomped on IE - which I never use - and reinstalled, and I still get errors on these 2 files. The next step, when I have time, will be to try another machine.

D

ribbons commented 11 years ago

Original comment from Matt Robinson at 16:58:16 on 2011-08-21

I've just tried, and I can download all of the episodes of the Podcast feed that you posted without any problems.

Could you try:

ribbons commented 11 years ago

Original comment from del at 12:55:35 on 2011-08-27

Hi Matt, I adopted a 2-prong strategy.

  1. A clean install on another machine - this worked fine.
  2. I followed exactly your instructions on the original machine. In addition to that I'd made sure to remove the two failed downloads from the download list. I did that a few days back, and again this morning; sadly the situation is exactly as before. I suspect if I wipe RD and reinstall that it'll probably work - but the cost would be to wreak havoc with all the materials I have already downloaded.

Have you any other suggestions?

ribbons commented 11 years ago

Original comment from Matt Robinson at 15:32:08 on 2011-08-28

Just wanted to confirm that you also deleted radiodld-httpcache.db when following the steps I gave?

Have you got a software firewall (other than Windows Firewall)? If so, could you disable that too?

ribbons commented 11 years ago

Original comment from del at 23:19:42 on 2011-08-30

Just to confirm:

I have another computer with Radio Downloader newly installed. It has no problem in downloading the files concerned.

I was wanting to confirm I'd followed your instructions with care before assuming that there must be some sort of esoteric bug preventing RD from doing the two downloads concerned.

I've also hacked the db file, noticed it's SQLite format, and also noticed that the URL for the first broken file is correctly marked as <enclosure url="http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/earlymusic_20110815-0900b.mp3" length="13450311" type="audio/mpeg" />. Ditto for the <link> tag.

I do note that there is a tag: <guid isPermaLink="false">http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/earlymusic_20110815-0900.mp3</guid> which leads to a 404. But other clips have similar tags and Radio Downloader works fine on those.
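To make the difference between the two values concrete, here is a minimal sketch using only the Python standard library. It is not Radio Downloader's code (none of which appears in this thread); the `describe_item` helper and the item title are hypothetical, and the two URLs are the ones quoted above.

```python
import xml.etree.ElementTree as ET

# Item trimmed to the two elements under discussion. The title is a
# placeholder; the URLs are the ones quoted in this report.
ITEM_XML = """
<item>
  <title>Example episode</title>
  <enclosure url="http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/earlymusic_20110815-0900b.mp3" length="13450311" type="audio/mpeg"/>
  <guid isPermaLink="false">http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/earlymusic_20110815-0900.mp3</guid>
</item>
"""


def describe_item(item_xml):
    """Return the enclosure URL, the guid text, and whether they match."""
    item = ET.fromstring(item_xml)
    enclosure = item.find("enclosure")
    guid = item.find("guid")
    enclosure_url = enclosure.get("url") if enclosure is not None else None
    guid_text = guid.text.strip() if guid is not None and guid.text else None
    return {
        "enclosure_url": enclosure_url,
        "guid": guid_text,
        "guid_matches_enclosure": enclosure_url == guid_text,
    }


info = describe_item(ITEM_XML)
print(info)
```

The guid is marked `isPermaLink="false"`, so it is only an identifier and a client is not expected to fetch it. The enclosure URL is the one that has to resolve at download time.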

I can use the enclosure URL manually in Chrome to hear the clip. I was able to download or open it from IE as well. So wherever the problem lies, it's not directly within the URL. It may be there is some slightly non-standard text within some of the other fields, I don't know. I'm guessing the problem lies outside the URL itself. There's an STV video URL where 'Gino and Mel' is 'Gino & Mel' and their RSS bombs!

Both machines are XP Pro SP3, IE8, fully patched. Both run AVG antivirus and no firewall, either locally or on the network. No, the machines are not identical, but in the aspects I guess are important, they are.

Let me know what else I can do to help identify where the problem lies. Can I get you some debug?

Derek

ribbons commented 11 years ago

Original comment from Matt Robinson at 13:40:39 on 2011-09-02

The quickest way to isolate the problem would be to download the source code and step through the download to see exactly where the failure is occurring. For more details on getting started, please see the following page: http://www.nerdoftheherd.com/tools/radiodld/contribute/#code

Let me know if you get stuck, or if you find out where the problem seems to be occurring.

ribbons commented 11 years ago

Original comment from del at 22:22:55 on 2011-09-19

I have downloaded the source but as yet have not had time to compile in debuggable form, having been away and having other projects on hand. However I have done some further checking:

I did a clean install on a 2nd machine, and from this downloaded the two errant mp3 files successfully: I mentioned that some time back. I then carefully removed temp files; I removed the faulty subscription; I removed references to the 'error' downloads; I closed RD completely; I copied the db files from my 1st machine where RD reported errors. On next running RD on the new machine it too reported errors.

I conclude the problem lies (somehow) in the current content of a db file, and not the cache one. Looks to me like RD is finding something in store.db, despite my attempts to remove references to the two faulty downloads.
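One way to check where a value is held in an SQLite file without knowing its schema is to search every column of every table. The sketch below assumes only that the file is SQLite, as noted earlier in the thread. The `find_text` helper, the `demo_extra` table and the example URL are all hypothetical, and nothing here reflects the real layout of store.db.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(conn):
    """Return the names of all tables in the open database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def find_text(db_path, needle):
    """Return (table, column) pairs where some value contains needle."""
    hits = []
    with closing(sqlite3.connect(db_path)) as conn:
        for table in list_tables(conn):
            # Column names come from the database itself (field 1 of table_info).
            columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
            for column in columns:
                found = conn.execute(
                    f'SELECT 1 FROM "{table}" '
                    f'WHERE instr(CAST("{column}" AS TEXT), ?) > 0 LIMIT 1',
                    (needle,),
                ).fetchone()
                if found:
                    hits.append((table, column))
    return hits


# Throwaway database with a made-up table, only to show the helper running.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
with closing(sqlite3.connect(demo_path)) as conn:
    conn.execute("CREATE TABLE demo_extra (name TEXT, value TEXT)")
    conn.execute(
        "INSERT INTO demo_extra VALUES (?, ?)",
        ("EnclosureUrl", "http://example.invalid/episode-0900.mp3"),
    )
    conn.commit()

print(find_text(demo_path, "episode-0900.mp3"))
```

It is safer to run this on a copy of the database file, with the application closed.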

I mention it because there may be some esoteric bug that turns up once in a blue moon. RD reports that the files are not found - i.e. seemingly it's looking for a non-existent URL, but I know that's wrong.

I can send some db files if you want to poke their innards? Easier for you, knowing the db structures, than me.

Derek

ribbons commented 11 years ago

Original comment from Matt Robinson at 20:18:17 on 2011-10-12

Apologies, am rather snowed under with Radio Downloader and non-Radio Downloader stuff at the moment (as my slow response indicates I suppose...). Once I'm a bit less busy I should be able to take a look, but it should be easier to find the issue with Visual Studio on your machine if you get the chance.

ribbons commented 11 years ago

Original comment from del at 15:19:56 on 2012-05-15

Matt, I found another podcast that RD refuses to download. I've done a little checking before reporting this. It looks to me like RD is struggling, perhaps because it's misreading the XML in the RSS feed? I know the XML used by the BBC contains badly formatted tags and characters like & which are meant to be reserved.

The URL: http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/rss.xml
The title: EarlyMusic: The Symphonie 05 Nov 11

Behaviour:
Error: Not available
This episode appears to be no longer available. You can either try again later, or cancel the download to remove it from the list and clear the error.

I have cleared this and attempted a repeat download: same error. I have a spare machine and did a clean RD installation: same error.

I fed the RSS into Google Reader: http://www.google.com/reader/view/feed/http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/rss.xml?source=ignitionfork
I can see the Symphonie item and I can play the embedded clip.

I viewed this via the BBC's web page: http://www.bbc.co.uk/podcasts/series/earlymusic/all
This page contains a valid link to the following mp3: http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/earlymusic_20111107-0900b.mp3

My thoughts are that the URLs in the RSS feed and web pages are valid, and that therefore RD itself is struggling with this item.

(There are three such items in the earlymusic feed: Symphonie, Caccini, de Victoria, all of which play fine via the BBC web page or Google, and I wonder if close analysis of surrounding XML code will reveal a BBC 'quirk' which is throwing RD off-beam?)

Here's the XML as it appears in my browser (Chrome) and I can't see any immediate difference from other items.

<item>
<title>EarlyMusic: The Symphonie 05 Nov 11</title>
<description>
A look at the origin of the 'Symphony'. We all know what is now called a Symphony, but the term has had many varied uses. Lucie Skeaping tracks down the origins of the word and its uses, encountering medieval hurdy-gurdys, spinets and virginals, and a tale that the dulcimer is as old as the Bible, not to mention a whole host of overtures, interludes, sonatas, canzonas and concertos. Broadcast as part of the BBC month long 'Celebration of the Symphony'.
</description>
<itunes:subtitle>
A look at the origin of the 'Symphony'. We all know what is now called a Symphony, but the term has had many varied uses. Lucie Skeaping tracks down the origins of the word and its uses, encountering medieval hurdy-gurdys, spinets and virginals, and a...
</itunes:subtitle>
<itunes:summary>
A look at the origin of the 'Symphony'. We all know what is now called a Symphony, but the term has had many varied uses. Lucie Skeaping tracks down the origins of the word and its uses, encountering medieval hurdy-gurdys, spinets and virginals, and a tale that the dulcimer is as old as the Bible, not to mention a whole host of overtures, interludes, sonatas, canzonas and concertos. Broadcast as part of the BBC month long 'Celebration of the Symphony'.
</itunes:summary>
<pubDate>Mon, 07 Nov 2011 09:00:00 +0000</pubDate>
<itunes:duration>18:27</itunes:duration>
<enclosure url="http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/earlymusic_20111107-0900b.mp3" length="17808926" type="audio/mpeg"/>
<guid isPermaLink="false">
http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/earlymusic_20111107-0900.mp3
</guid>
<link>
http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/earlymusic_20111107-0900b.mp3
</link>
<media:content url="http://downloads.bbc.co.uk/podcasts/radio3/earlymusic/earlymusic_20111107-0900b.mp3" fileSize="17808926" type="audio/mpeg" medium="audio" expression="full" duration="1107"/>
<itunes:author>BBC Radio 3</itunes:author>
</item>

ribbons commented 11 years ago

Original comment from del at 15:26:13 on 2012-05-15

I do notice this in the header at the top of the page I just quoted, but I don't think that makes any difference to you.

<itunes:category text="Music"/>
<itunes:category text="Society & Culture">
<itunes:category text="History"/>
</itunes:category>

where we have a tag embedded within a tag, and also the use of & rather than &amp; which is sometimes risky.

Strictly speaking the & is invalid; the embedded tag is valid but illogical.

However you'll find other ampersands where the BBC carelessly allow their editors to create tags without encoding them. There may be other similar 'errors'
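Whether a bare ampersand actually breaks a parser is easy to check. The sketch below uses Python's standard library, which is an assumption for illustration, since Radio Downloader's own parser is not shown in this thread. The `try_parse` helper is hypothetical, and a plain `category` element stands in for `itunes:category` so that no namespace declaration is needed.

```python
import xml.etree.ElementTree as ET

# Same attribute value, once with a bare ampersand and once escaped.
RAW = '<channel><category text="Society & Culture"/></channel>'
ESCAPED = '<channel><category text="Society &amp; Culture"/></channel>'


def try_parse(xml_text):
    """Return (attribute value, None) on success or (None, error text) on failure."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return None, str(exc)
    return root.find("category").get("text"), None


print(try_parse(RAW))      # the bare ampersand is rejected as not well-formed
print(try_parse(ESCAPED))  # the escaped form parses and decodes to "&"
```

A strict parser rejects the whole document in the first case rather than just the one element. A browser's rendered view may show the decoded character, so the raw source of the feed is the more reliable thing to inspect.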

ribbons commented 11 years ago

Original comment from Matt Robinson at 17:46:22 on 2012-06-13

I've just looked at this podcast now, and the & is now being correctly encoded by the BBC - has the feed started working again in Radio Downloader since your comment by any chance?

ribbons commented 11 years ago

Original comment from Matt Robinson at 17:15:03 on 2012-09-11

As I've had no response to my last comment, I'm not sure if the ampersand encoding was part of the problem. However, I have realised what the cause is for the problem initially described: the enclosure URL was stored as extra information against the episode when the episode info was fetched. This works until the enclosure URL changes in the feed (as it does when the BBC make a change to one of their podcasts), because the stored copy is never refreshed.

Updated the podcast provider in 84b1f27 to fetch the enclosure URL at download time instead of storing it as extra info.
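The difference between the two approaches can be sketched as follows. This is an illustration in Python, not the code from the commit. The `enclosure_url_for` and `make_feed` helpers, the `episode-1` identifier and the example.invalid URLs are hypothetical, and matching episodes by guid is an assumption.

```python
import xml.etree.ElementTree as ET


def enclosure_url_for(feed_xml, guid):
    """Return the enclosure URL of the item with this guid, or None."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        if item.findtext("guid", default="").strip() == guid:
            enclosure = item.find("enclosure")
            return enclosure.get("url") if enclosure is not None else None
    return None


def make_feed(enclosure_url):
    """Build a one-item feed; stands in for fetching the real RSS document."""
    return f"""<rss><channel><item>
      <guid isPermaLink="false">episode-1</guid>
      <enclosure url="{enclosure_url}" type="audio/mpeg"/>
    </item></channel></rss>"""


# The publisher changes the enclosure URL after the episode was first seen.
feed_when_listed = make_feed("http://example.invalid/episode-0900.mp3")
feed_at_download = make_feed("http://example.invalid/episode-0900b.mp3")

# Old behaviour: the URL is captured once with the episode info and reused.
stored_url = enclosure_url_for(feed_when_listed, "episode-1")

# New behaviour: the URL is looked up in the current feed at download time.
fresh_url = enclosure_url_for(feed_at_download, "episode-1")

print(stored_url)  # stale: no longer matches the feed
print(fresh_url)   # current enclosure URL
```

Looking the URL up at download time costs an extra feed fetch, but it means a corrected or moved file is picked up without the user having to remove and re-add the subscription.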