ribbons / RadioDownloader

An easy to use application for managing podcast subscriptions and downloads.
https://nerdoftheherd.com/tools/radiodld/
GNU General Public License v3.0

Podcast enclosure URLs are unencoded before being downloaded #227

Open ribbons opened 5 years ago

ribbons commented 5 years ago

Now that #226 is fixed, another URL encoding issue has been discovered by @cjpcjpindre: percent-encoded characters in podcast enclosure URLs are replaced by their literal equivalents before the download is requested, which breaks downloads from servers that expect the encoded form.

An example original enclosure URL from the feed https://anchor.fm/s/7368c04/podcast/rss is:

https://anchor.fm/s/7368c04/podcast/play/1722642/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2018-10-13%2FJohn-Kearns-782f1f393cd0f.m4a

ribbons commented 5 years ago

I'm really struggling with this one. The decoding happens when the URL is passed to the .NET Framework Uri class, which can't be avoided when using WebClient for downloads. This means that the URL above is changed into the following:

https://anchor.fm/s/7368c04/podcast/play/1722642/https://d3ctxlq1ktw2nl.cloudfront.net/staging/2018-10-13/John-Kearns-782f1f393cd0f.m4a
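To illustrate why the decoded form is a different URL and not just a cosmetic change, here is a minimal sketch in Python (used purely for illustration; it is not the application's .NET code, and `path_segments` is a hypothetical helper). It decodes the example URL with `urllib.parse.unquote` and compares how many path segments each form has:

```python
from urllib.parse import unquote, urlsplit

# The enclosure URL exactly as it appears in the feed.
ORIGINAL = (
    "https://anchor.fm/s/7368c04/podcast/play/1722642/"
    "https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2018-10-13%2F"
    "John-Kearns-782f1f393cd0f.m4a"
)


def path_segments(url):
    """Split a URL's path on literal '/' characters only (hypothetical helper)."""
    return urlsplit(url).path.strip("/").split("/")


# Decoding every percent-escape gives the same URL shown above.
decoded = unquote(ORIGINAL)

original_segments = path_segments(ORIGINAL)
decoded_segments = path_segments(decoded)

print(len(original_segments), original_segments[-1])
print(len(decoded_segments), decoded_segments[5:])
```

In the original, the whole embedded URL is a single final path segment. After decoding, the formerly escaped slashes become real path separators, so a server that routes on path segments sees a different resource.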

After some digging, it looks like this behaviour is partially fixed in .NET Framework 4.5, and the same behaviour can be enabled in .NET 2.0 via some slightly nasty reflection (courtesy of the code at https://mikehadlow.blogspot.com/2011/08/how-to-stop-systemuri-un-escaping.html). However, this still doesn't prevent the colon from being unescaped, so the URL ends up as:

https://anchor.fm/s/7368c04/podcast/play/1722642/https:%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2018-10-13%2FJohn-Kearns-782f1f393cd0f.m4a

Unfortunately, anchor.fm still returns a 404 error for this URL.
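To make it easier to see which escapes survive a given workaround, the check can be sketched as a small diagnostic, again in Python for illustration only. The `lost_escapes` helper is hypothetical: it counts the percent-escapes in the feed URL and in the URL that would actually be requested, and reports the difference.

```python
import re
from collections import Counter

# Matches a single percent-escape such as %3A or %2f.
ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def lost_escapes(original, processed):
    """Return the escapes present in `original` but missing from `processed`.

    Hypothetical diagnostic helper; escapes are compared case-insensitively.
    """
    before = Counter(m.upper() for m in ESCAPE.findall(original))
    after = Counter(m.upper() for m in ESCAPE.findall(processed))
    # Counter subtraction keeps only positive counts.
    return dict(before - after)


feed_url = (
    "https://anchor.fm/s/7368c04/podcast/play/1722642/"
    "https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2018-10-13%2F"
    "John-Kearns-782f1f393cd0f.m4a"
)
# The URL produced with the partial fix described above.
partially_fixed = feed_url.replace("%3A", ":")

print(lost_escapes(feed_url, partially_fixed))
```

For the partially fixed URL this reports only the `%3A` escape as lost, which matches the observation that the slashes are preserved but the colon is not. An empty result would mean the URL is sent with all of its escapes intact.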

Suggestions appreciated!