sof / feed

Haskell package for handling various feed (RSS) formats.

fails to find enclosures in feed http://www.ndr.de/fernsehen/sendungen/extra_3/videos/zum_mitnehmen/extradrei196_version-hq.xml #3

Closed joeyh closed 11 years ago

joeyh commented 11 years ago

I had a user report that this feed was not working with my podcast aggregator, which I built using your excellent library. http://www.ndr.de/fernsehen/sendungen/extra_3/videos/zum_mitnehmen/extradrei196_version-hq.xml

Downloading it, and playing in ghci, it looks like the feed is parsed ok to the point of finding items, but getItemEnclosure fails to find any enclosures.

Prelude Text.Feed.Query Text.Feed.Import> f <- parseFeedFromFile "extradrei196_version-hq.xml" Prelude Text.Feed.Query Text.Feed.Import> map getItemEnclosure $ feedItems f [Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing,Nothing] Prelude Text.Feed.Query Text.Feed.Import>

Looking at the feed, it does seem to contain enclosures.

I don't know if the feed has some validity problem, or if this is a bug in this library.

joeyh commented 11 years ago

Looking at this feed in a validator, it has a number of possible issues.

http://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.ndr.de%2Ffernsehen%2Fsendungen%2Fextra_3%2Fvideos%2Fzum_mitnehmen%2Fextradrei196_version-hq.xml

This one seems most relevant:

line 50, column 0: Missing enclosure attribute: length (52 occurrences)
<enclosure url="http://media.ndr.de/progressive/2013/0821/TV-20130821-2329-5 ...

If I am reading getItemEnclosure right, it expects to find a length attribute in XMLItem feeds, but it defaults to 0 length. So I don't know why it is failing on this feed.

sof commented 11 years ago

Yes, that's the problem: RSS 2.0 has 'length' as a required attribute, but it is not present in the example feed.

Being hard-nosed about spec compliance won't get you far when it comes to feeds "out there", so I'll make the length attribute optional.
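
To illustrate the difference between a required and an optional length, here is a minimal sketch. It is not the library's code: the `Enclosure` type, the `Attrs` association list standing in for XML attributes, and both function names are hypothetical, chosen only to show the idea.

```haskell
import Text.Read (readMaybe)

-- Hypothetical stand-in for the attributes of an <enclosure> element.
type Attrs = [(String, String)]

-- Hypothetical enclosure record; the length is optional.
data Enclosure = Enclosure
  { encURL    :: String
  , encLength :: Maybe Integer  -- Nothing when the attribute is absent or not a number
  , encType   :: String
  } deriving (Eq, Show)

-- Strict reading: url, length and type must all be present,
-- so a feed that omits length yields no enclosure at all.
strictEnclosure :: Attrs -> Maybe Enclosure
strictEnclosure attrs = do
  url <- lookup "url" attrs
  len <- lookup "length" attrs >>= readMaybe
  ty  <- lookup "type" attrs
  return (Enclosure url (Just len) ty)

-- Lenient reading: only url and type are required;
-- a missing or malformed length is recorded as Nothing.
lenientEnclosure :: Attrs -> Maybe Enclosure
lenientEnclosure attrs = do
  url <- lookup "url" attrs
  ty  <- lookup "type" attrs
  return (Enclosure url (lookup "length" attrs >>= readMaybe) ty)
```

With attributes that carry only `url` and `type`, `strictEnclosure` returns `Nothing`, while `lenientEnclosure` returns an enclosure whose `encLength` is `Nothing`.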

(Thanks for a fine report.)

sof commented 11 years ago

Fixed by https://github.com/sof/feed/commit/023f64410ff19bf8ec8e64908cc07fdf0193b584

joeyh commented 11 years ago

Something like this is needed to get it all to compile.

diff --git a/Text/Feed/Constructor.hs b/Text/Feed/Constructor.hs
index 98688c6..26f7fe0 100644
--- a/Text/Feed/Constructor.hs
+++ b/Text/Feed/Constructor.hs
@@ -568,7 +568,7 @@ withItemEnclosure url ty len fi =
                                         ,linkLength=Just (show len)
                                         }):Atom.entryLinks e}
     Feed.Types.RSSItem i  ->
-      Feed.Types.RSSItem  i{RSS.rssItemEnclosure=Just (nullEnclosure url len (fromMaybe "text/html" ty))}
+      Feed.Types.RSSItem  i{RSS.rssItemEnclosure=Just (nullEnclosure url (Just len) (fromMaybe "text/html" ty))}
     Feed.Types.RSS1Item i -> Feed.Types.RSS1Item 
           i{RSS1.itemContent=nullContentInfo{ contentURI=Just url
                                             , contentFormat=ty
diff --git a/Text/Feed/Translate.hs b/Text/Feed/Translate.hs
index ee0c5aa..e4aee32 100644
--- a/Text/Feed/Translate.hs
+++ b/Text/Feed/Translate.hs
@@ -93,7 +93,7 @@ toAtomItem it =
        withItemEnclosure' e = 
           withItemEnclosure (rssEnclosureURL e)
                             (Just $ rssEnclosureType e)
-                            (rssEnclosureLength e)
+                            (fromMaybe 0 $ rssEnclosureLength e)
        withItemId' g = withItemId (fromMaybe True (rssGuidPermanentURL g)) (rssGuidValue g)

        mb _ Nothing  = id
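
The two call-site changes in the patch above come down to converting between a plain length and an optional one. A minimal sketch, assuming the length is an Integer; the helper names are hypothetical and not part of the library:

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical helper: a consumer that needs a plain number
-- falls back to 0 when the feed gave no length.
lengthOrZero :: Maybe Integer -> Integer
lengthOrZero = fromMaybe 0

-- Hypothetical helper: a producer that always knows the length
-- wraps it before storing it in the optional field.
knownLength :: Integer -> Maybe Integer
knownLength = Just
```

One consequence of the fallback is that, after `lengthOrZero`, a missing length can no longer be told apart from a length that really is 0.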

sof commented 11 years ago

Quite right, thanks. Thought I did a clean rebuild, but obviously not.

Dealt with slightly differently in https://github.com/sof/feed/commit/64f217677a54b44539f02e192f6ca68049336333