rometools / rome

Java library for RSS and Atom feeds
https://rometools.github.io/rome
Apache License 2.0

[fetcher] Rome doesn't seem to parse the pubDate correctly out of the RSS feed when it is CDATA #224

Closed: mishako closed this issue 8 years ago

mishako commented 8 years ago

Issue by PatrickGotthard Monday Sep 30, 2013 at 15:37 GMT
Originally opened as https://github.com/rometools/rome-fetcher/issues/2


=== This issue was migrated from JIRA ===
Type: Bug
Priority: Minor
Status: Open
Resolution: Unresolved
Environment: Java 6/Linux/ Rome and Rome fetcher 1.0.
Reported by: Jeffrey Haskovec
Assigned to: Nick Lothian
Created: Fri Jan 13 22:30:47 CET 2012
Updated: Fri Oct 26 04:47:19 CEST 2012
JIRA Link: https://rometools.jira.com/browse/FETCHER-1
=========================================

Trying to read this feed:

http://pmq.com/news/rss.php

Here is an example of it:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>PMQ News Room</title>
<description>All the hottest pizza news.</description>
<link>http://pmq.com/news</link>
<language>en-us</language>
<pubDate>Fri, 13 Jan 2012 12:50:56 EST</pubDate>
<item>
<link>http://pmq.com/news/news.php?id=15463</link>
<guid isPermaLink="true">http://pmq.com/news/news.php?id=15463</guid>

<title><![CDATA[First Mobile Neapolitan Pizzeria, to Launch in San Francisco in February]]></title>
<description><![CDATA["Ex-Flour + Water pizzaiolo Jon Darsky is launching a pioneering mobile operation called Del Popolo ("of the people"), as Tablehopper reports. Darsky and a design team have adapted a shipping container into a wood-fired pizzeria, loaded onto a flat-bed truck, with an on-board, 5,000-pound Stefano Ferrara pizza oven from Naples, just like the ones Una Pizza Napoletana and Cupola have," reports SanFrancisco.GrubStreet.com.]]></description>
<pubDate><![CDATA[2012-01-12 11:18:03]]></pubDate>
<comments>http://pmq.com/news/news.php?id=15463</comments>
</item>
<item>
<link>http://pmq.com/news/news.php?id=15462</link>
<guid isPermaLink="true">http://pmq.com/news/news.php?id=15462</guid>

<title><![CDATA[Pizza Shop Owner Shocked By Food Poisonings]]></title>
<description><![CDATA["The owner of a Ballarat pizza shop closed down after a salmonella outbreak has pleaded to his customers for support as health authorities continued investigations into a regional egg supplier," reports SMH.com.au.]]></description>
<pubDate><![CDATA[2012-01-12 10:46:22]]></pubDate>
<comments>http://pmq.com/news/news.php?id=15462</comments>
</item>

The pubDate is wrapped in CDATA, and when I inspect it in Rome it is null, so it isn't being handled correctly. The title and description, which are also wrapped in CDATA, are handled correctly in this case.
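
For anyone triaging this, a minimal JDK-only sketch (no Rome involved) may help separate the two suspects: the CDATA wrapper and the date format. It assumes the feed's pubDate is well-formed CDATA, as the description is, and that a parser expecting the RFC 822 dates called for by the RSS 2.0 specification uses a mask like the one below. The class name `CdataPubDateProbe` and the mask are illustrative and are not taken from Rome.

```java
import java.io.StringReader;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CdataPubDateProbe {

    // Assumed RFC 822-style mask; a real feed library may use several masks.
    static final String RFC822_PATTERN = "EEE, dd MMM yyyy HH:mm:ss z";

    /** Returns the text of the first pubDate element; the XML parser unwraps CDATA. */
    static String extractPubDate(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("pubDate").item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not read pubDate", e);
        }
    }

    /** Returns true when the value can be parsed with the given SimpleDateFormat pattern. */
    static boolean matches(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setLenient(false);
        try {
            format.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String wrapped = "<item><pubDate><![CDATA[2012-01-12 11:18:03]]></pubDate></item>";
        String plain = "<item><pubDate>2012-01-12 11:18:03</pubDate></item>";

        // Both variants yield the same text, so CDATA alone changes nothing here.
        String text = extractPubDate(wrapped);
        System.out.println("same text with and without CDATA: " + text.equals(extractPubDate(plain)));

        // The value itself is what differs from the RFC 822 shape.
        System.out.println("RFC 822 shape: " + matches(RFC822_PATTERN, text));
        System.out.println("yyyy-MM-dd HH:mm:ss shape: " + matches("yyyy-MM-dd HH:mm:ss", text));
    }
}
```

If this holds for the real feed, the XML layer hands over the same text with or without CDATA, and the null date would come from `2012-01-12 11:18:03` not being an RFC 822 date. That is a hypothesis; nothing in this report confirms it.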

mishako commented 8 years ago

Comment by PatrickGotthard Monday Sep 30, 2013 at 15:37 GMT


=== This comment was migrated from JIRA ===
Author: gmkoliver
Created: Fri Oct 26 04:47:19 CEST 2012
===========================================

I'm running into a similar issue where it seems that pubDates of the form <pubDate>Fri, 26 Oct 2012 01:47:20 UTC</pubDate> are not parsed. Here is a complete example item:

<item>
<title>Codebase Downloads w/ Pfile/Passwords</title>
<link>http://www.mudbytes.net/index.php?a=topic&amp;t=4011&amp;p=63177#p63177</link>
<description>[quote=url=/topic-4011-63131#p63131KaVir/url]And while we're on the subject of submissions, I'd still like some way to remove url=http://www.mudbytes.net/topic-3547or at least mark/url iobsolete/i code. It's irritating having to explain to people that despite its name, "KaVir's MUD Protocol Handler (Fixed Up Source Code)" is actually obsolete, and that they should instead download "KaVir's MUD Protocol Handler"./quoteThere doesn't appear to be an option to delete files, but if</description>
<guid isPermaLink="true">http://www.mudbytes.net/index.php?a=topic&amp;t=4011&amp;p=63177#p63177</guid>
<pubDate>Fri, 26 Oct 2012 01:47:20 UTC</pubDate>
<category>General Chatter</category>
<author>nobody@example.com (Scandum)</author>
</item>

It comes from this feed: http://www.mudbytes.net/index.php?a=rssfeed

All dates in the parsed feed are null. Does anyone have a clue about this?
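
One thing worth checking is the zone token: RFC 822 lists `UT` and `GMT` among its zone names but not `UTC`, so a strict parser may reject this date even though it looks fine. The JDK-only sketch below illustrates that with `DateTimeFormatter.RFC_1123_DATE_TIME`. Whether Rome rejects the date for the same reason is an assumption, and `normalizeZone` is a hypothetical pre-processing helper, not a Rome API.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class UtcZoneTokenProbe {

    /** Parses with the JDK's RFC 1123 formatter; returns null when the text is rejected. */
    static OffsetDateTime parseStrict(String text) {
        try {
            return OffsetDateTime.parse(text, DateTimeFormatter.RFC_1123_DATE_TIME);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    /** Hypothetical workaround: rewrite a trailing "UTC" or "UT" token to "GMT". */
    static String normalizeZone(String text) {
        return text.trim().replaceAll(" (UTC|UT)$", " GMT");
    }

    public static void main(String[] args) {
        String raw = "Fri, 26 Oct 2012 01:47:20 UTC";

        // The strict formatter only accepts "GMT" or a numeric offset as the zone.
        System.out.println("strict parse of raw value: " + parseStrict(raw));

        // After normalizing the zone token the same instant parses cleanly.
        System.out.println("strict parse after normalizing: " + parseStrict(normalizeZone(raw)));
    }
}
```

Normalizing the string before handing it to a date parser keeps the workaround outside the feed library, at the cost of one extra pass over each date.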

mishako commented 8 years ago

Comment by MrBob007 Tuesday Mar 11, 2014 at 08:31 GMT


I'm also running into a similar issue when trying to read a feed with a pubDate of the form

Tue, 11 Mar 2014 09:18:54

Here is an entire example:

Cameroun:Sévérin TCHOUNKEU «Nous mettons sous tabloïd cette œuvre de « 33 » degrés afin que la vérité soit »
http://www.camer.be/32364/11:1/cameroun-severin-tchounkeu-nous-mettons-sous-tabloid-cette-uvre-de-33-degres-afin-que-la-verite-soit--cameroon.html
<p><img alt="Andre Siaka:Camer.be" src="/UserFiles/image/Andre_Siaka110214300.jpg" style="height:100px; opacity:0.9; width:100px" title="Cameroun:Sévérin TCHOUNKEU «Nous mettons sous tabloïd cette œuvre de « 33 » degrés afin que la vérité soit »" />Dans la boue o&ugrave; baignent d&eacute;sormais nombre de nos concitoyens, il est des exceptions qu&rsquo;il convient de relever pour leur abn&eacute;gation au travail bien fait et leur attachement &agrave; une certaine &eacute;thique du management : Andr&eacute; Siaka fait incontestablement partie de ces exceptions-l&agrave; ! Qui donc mieux que ses employeurs pour le dire ? Qui donc mieux que ceux-l&agrave; qui l&rsquo;ont c&ocirc;toy&eacute; pendant plus d&rsquo;un quart de si&egrave;cle pour en t&eacute;moigner ? Il nous a paru n&eacute;cessaire de mettre &agrave; la port&eacute;e de tous, ces &eacute;crits que&nbsp;</p>
Tue, 11 Mar 2014 09:04:08

The feed comes from http://camer.be/rss.php

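
A date with no zone at all is not a complete RFC 822 date, so a caller who still wants a value has to pick a zone. The sketch below is one possible fallback that runs outside Rome: it tries a couple of zoneless patterns and assumes UTC, which is a guess about the publisher's clock and not something the feed states. `LenientPubDateParser` and its method are hypothetical names.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

final class LenientPubDateParser {

    // Zoneless shapes seen in this thread; extend the list as needed.
    private static final DateTimeFormatter[] ZONELESS = {
        DateTimeFormatter.ofPattern("EEE, d MMM yyyy HH:mm:ss", Locale.ENGLISH),
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH),
    };

    /** Tries each zoneless pattern and interprets a match as UTC (an assumption); null if none match. */
    static Instant parseAssumingUtc(String text) {
        String trimmed = text.trim();
        for (DateTimeFormatter formatter : ZONELESS) {
            try {
                return LocalDateTime.parse(trimmed, formatter).toInstant(ZoneOffset.UTC);
            } catch (DateTimeParseException ignored) {
                // Fall through to the next pattern.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(parseAssumingUtc("Tue, 11 Mar 2014 09:18:54"));
        System.out.println(parseAssumingUtc("not a date"));
    }
}
```

Returning null for unrecognized input mirrors what the feed parser reports today, so callers can apply the fallback only when the parsed date is missing.
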
mishako commented 8 years ago

Comment by mishako Thursday Dec 10, 2015 at 21:17 GMT


Let's not mix up different date/time formats.

Looks like we can close this issue.

mishako commented 8 years ago

Comment by mishako Tuesday Dec 22, 2015 at 15:43 GMT


Closing this. Please speak up if you don't agree.