nextcloud / news

📰 RSS/Atom feed reader
https://apps.nextcloud.com/apps/news
GNU Affero General Public License v3.0
870 stars 187 forks source link

Feed update fails due to an unparsable date format. #461

Closed toolstack closed 5 years ago

toolstack commented 5 years ago

IMPORTANT

Read and tick the following checkbox after you have created the issue or place an x inside the brackets ;)

Explain the Problem

Feed update fails due to an unparsable date format.

Steps to Reproduce

Explain what you did to encounter the issue 1.Add https://sourceforge.net/projects/cheeseburgerdumplings/rss?path=/15.1/dumpling

System Information

Contents of nextcloud/data/nextcloud.log


Update failed more than 50 times: InvalidArgumentException: Impossible to convert date : Thu, 21 Mar 2019 22:45:08 UT in /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/Rule/DateTimeBuilder.php:151
Stack trace:
#0 /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/Rule/DateTimeBuilder.php(137): FeedIo\Rule\DateTimeBuilder->stringToDateTime('Thu, 21 Mar 201...')
#1 /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/Rule/ModifiedSince.php(27): FeedIo\Rule\DateTimeBuilder->convertToDateTime('Thu, 21 Mar 201...')
#2 /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/Parser/XmlParser.php(115): FeedIo\Rule\ModifiedSince->setProperty(Object(FeedIo\Feed), Object(DOMElement))
#3 /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/Parser/XmlParser.php(95): FeedIo\Parser\XmlParser->handleNode(Object(FeedIo\Feed), Object(DOMElement), Object(FeedIo\RuleSet))
#4 /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/Parser/XmlParser.php(53): FeedIo\Parser\XmlParser->parseNode(Object(FeedIo\Feed), Object(DOMElement), Object(FeedIo\RuleSet))
#5 /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/ParserAbstract.php(70): FeedIo\Parser\XmlParser->parseContent(Object(FeedIo\Reader\Document), Object(FeedIo\Feed))
#6 /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/Reader.php(156): FeedIo\ParserAbstract->parse(Object(FeedIo\Reader\Document), Object(FeedIo\Feed))
#7 /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/Reader.php(138): FeedIo\Reader->parseDocument(Object(FeedIo\Reader\Document), Object(FeedIo\Feed))
#8 /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/Reader.php(117): FeedIo\Reader->handleResponse(Object(FeedIo\Adapter\Guzzle\Response), Object(FeedIo\Feed))
#9 /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/FeedIo.php(286): FeedIo\Reader->read('https://sourcef...', Object(FeedIo\Feed), Object(DateTime))
#10 /var/www/nextcloud/apps/news/vendor/debril/feed-io/src/FeedIo/FeedIo.php(300): FeedIo\FeedIo->read('https://sourcef...', Object(FeedIo\Feed), Object(DateTime))
#11 /var/www/nextcloud/apps/news/lib/Fetcher/FeedFetcher.php(78): FeedIo\FeedIo->readSince('https://sourcef...', Object(DateTime))
#12 /var/www/nextcloud/apps/news/lib/Fetcher/Fetcher.php(68): OCA\News\Fetcher\FeedFetcher->fetch('https://sourcef...', false, '', NULL, NULL)
#13 /var/www/nextcloud/apps/news/lib/Service/FeedService.php(228): OCA\News\Fetcher\Fetcher->fetch('https://sourcef...', false, '', NULL, NULL)
#14 /var/www/nextcloud/apps/news/lib/Service/FeedService.php(184): OCA\News\Service\FeedService->update(39, 'greg')
#15 /var/www/nextcloud/apps/news/lib/Utility/Updater.php(49): OCA\News\Service\FeedService->updateAll()
#16 /var/www/nextcloud/apps/news/lib/Cron/Updater.php(52): OCA\News\Utility\Updater->update()
#17 /var/www/nextcloud/lib/private/BackgroundJob/Job.php(61): OCA\News\Cron\Updater->run(NULL)
#18 /var/www/nextcloud/cron.php(123): OC\BackgroundJob\Job->execute(Object(OC\BackgroundJob\JobList), Object(OC\Log))
#19 {main}
flesser commented 5 years ago

I got another feed that was unable to update in nextcloud/news because of an "Impossible to convert date" error in feed-io.

The fault here is clearly on the feeds side, having <pubDate>NaN NaN +</pubDate>on one feed item, so it's no wonder that the date could not be transformed.

However, from a user's point of view, it is much more preferable to either have that feed item displayed without any date, or at least (but less preferably) the rest of the feed updated properly with the faulty item ignored.

Since I am not sure if that kind of error-tolerant behavior should be part of the upstream library or the news application, I'm posting this here instead of the upstream bug report.

Grotax commented 5 years ago

I don't think we can do anything about bad feeds. Maybe after the switch to the current version of feed-io (we are currently using legacy) we can revisit the topic of bad behaving feeds.

dynobo commented 5 years ago

Just want to add, quite a lot of my feeds have the same problem and are not getting updated since ~2weeks, e.g. http://podcast.hr2.de/derTag/podcast.xml

Error: DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character

Full entry in nextcloud.log:


{"reqId":"XJZj9CUR4EAAABTfYK4AAAAN","level":3,"time":"2019-03-23T16:51:00+00:00","remoteAddr":"109.193.194.144","user":"mark","app":"index","method":"PATCH","url":"\/apps\/news\/feeds\/48","message":{"Exception":"Exception","Message":"DateTime::__construct(): Failed to parse time string (0) at position 0 (0): Unexpected character","Code":0,"Trace":[{"file":"\/home\/www\/stratius\/apps\/news\/lib\/Fetcher\/FeedFetcher.php","line":75,"function":"__construct","class":"DateTime","type":"->","args":["0"]},{"file":"\/home\/www\/stratius\/apps\/news\/lib\/Fetcher\/Fetcher.php","line":63,"function":"fetch","class":"OCA\\News\\Fetcher\\FeedFetcher","type":"->","args":["http:\/\/podcast.hr2.de\/derTag\/podcast.xml",false,"0",null,null]},{"file":"\/home\/www\/stratius\/apps\/news\/lib\/Service\/FeedService.php","line":228,"function":"fetch","class":"OCA\\News\\Fetcher\\Fetcher","type":"->","args":["http:\/\/podcast.hr2.de\/derTag\/podcast.xml",false,"0",null,null]},{"file":"\/home\/www\/stratius\/apps\/news\/lib\/Service\/FeedService.php","line":490,"function":"update","class":"OCA\\News\\Service\\FeedService","type":"->","args":[48,"mark",true]},{"file":"\/home\/www\/stratius\/apps\/news\/lib\/Controller\/FeedController.php","line":322,"function":"patch","class":"OCA\\News\\Service\\FeedService","type":"->","args":[48,"mark",{"fullTextEnabled":true}]},{"file":"\/home\/www\/stratius\/lib\/private\/AppFramework\/Http\/Dispatcher.php","line":166,"function":"patch","class":"OCA\\News\\Controller\\FeedController","type":"->","args":[48,null,true,null,null,null,null]},{"file":"\/home\/www\/stratius\/lib\/private\/AppFramework\/Http\/Dispatcher.php","line":99,"function":"executeController","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News\\Controller\\FeedController"},"patch"]},{"file":"\/home\/www\/stratius\/lib\/private\/AppFramework\/App.php","line":118,"function":"dispatch","class":"OC\\AppFramework\\Http\\Dispatcher","type":"->","args":[{"__class__":"OCA\\News\\Controller\\FeedController"},"patch"]},{"file":"\/home\/www\/stratius\/lib\/private\/AppFramework\/Routing\/RouteActionHandler.php","line":47,"function":"main","class":"OC\\AppFramework\\App","type":"::","args":["OCA\\News\\Controller\\FeedController","patch",{"__class__":"OC\\AppFramework\\DependencyInjection\\DIContainer"},{"feedId":"48","_route":"news.feed.patch"}]},{"function":"__invoke","class":"OC\\AppFramework\\Routing\\RouteActionHandler","type":"->","args":[{"feedId":"48","_route":"news.feed.patch"}]},{"file":"\/home\/www\/stratius\/lib\/private\/Route\/Router.php","line":297,"function":"call_user_func","args":[{"__class__":"OC\\AppFramework\\Routing\\RouteActionHandler"},{"feedId":"48","_route":"news.feed.patch"}]},{"file":"\/home\/www\/stratius\/lib\/base.php","line":987,"function":"match","class":"OC\\Route\\Router","type":"->","args":["\/apps\/news\/feeds\/48"]},{"file":"\/home\/www\/stratius\/index.php","line":42,"function":"handleRequest","class":"OC","type":"::","args":[]}],"File":"\/home\/www\/stratius\/apps\/news\/lib\/Fetcher\/FeedFetcher.php","Line":75,"CustomMessage":"--"},"userAgent":"Mozilla\/5.0 (X11; Linux x86_64; rv:65.0) Gecko\/20100101 Firefox\/65.0","version":"15.0.5.3"}```
Grotax commented 5 years ago

@dynobo that one is fixed in 13.1.4 check the updater

alexdebril commented 5 years ago

Hi @flesser

Here are my comments below:

The fault here is clearly on the feeds side, having <pubDate>NaN NaN +</pubDate>on one feed item, so it's no wonder that the date could not be transformed.

Yes, there's nothing you can do to fix the date in that case. You could try to set the 'last-modified' header to every items but it's not trustworthy.

However, from a user's point of view, it is much more preferable to either have that feed item displayed without any date, or at least (but less preferably) the rest of the feed updated properly with the faulty item ignored.

Yes, which means feed-io shouldn't raise an exception in that case. Maybe a kind of notice could be attached to the feed to let the consumer know that something wrong was detected.

Since I am not sure if that kind of error-tolerant behavior should be part of the upstream library or the news application, I'm posting this here instead of the upstream bug report.

This is the tricky part. Even if feed-io gets more tolerant about those errors, you'll face another issue: how to fetch only the new items in a reliable way if you can't rely on the publication date ?

SMillerDev commented 5 years ago

Personally I'd say it's up to the author of the feed to do things correctly and failure to do so should result in angry users. Otherwise you'll end up with an endless game of cat and mouse where authors keep screwing up in worse ways. The only thing you could do would be log an error and take 1-1-1970 as a date imho.

toolstack commented 5 years ago

This has been fixed in the upstream library via this commit: https://github.com/alexdebril/feed-io/commit/4381ae0f65fcf12f913b0854a7ed45c96a50e6e6

Grotax commented 5 years ago

And I just merged the update to the latest version.

Anrock commented 5 years ago

I'm getting this error in one of my feeds with News 13.1.6. Do I need to do something with broken feed or is it supposed to be fixed automagically with newer News version?

Grotax commented 5 years ago

no if the feed contains invalid dates, it will be broken in news Step 1: complain to feed author, also include w3c feed validator results Step 2: complain again if they don't fix it Step 3: hope that they fixed it

We can't add error resistance for every problem that might occur.