Closed koehn closed 7 years ago
I have the same issue with the CVE feed at http://nvd.nist.gov/download/nvd-rss.xml except not years-old articles. Pretty much every day (sometimes it seems several times per day), I get new unread articles that are 1+ days old in my feed. Given the volume of entries in this feed, it can be very frustrating to mark them read again--especially since I intentionally leave some unread if they need my attention, because it means I can't just mark all as read.
News app version: 10.1.0
Nextcloud version: 11.0.1
PHP version: 7.0.15
Database and version: MariaDB 10.1.21
Browser and version: N/A (not browser-specific)
Distribution and version: FreeBSD 11.0
You guys need to track this down. Unfortunately I don't have time to debug it, and unless proven otherwise my guess is broken feeds (bugged HTTP server caching, maybe?) because other feeds work just fine.
I saved a version of the CVE feed last night and another this morning after "old" entries appeared. Then I looked for one example, formatted its markup, and diffed. In this case, the difference is that they changed the title to add the affected product's name.
Initial entry:
<div class="entry">
<h3>
<a href="https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2017-6196">
<span xml:base="https://nvd.nist.gov/download/nvd-rss.xml">CVE-2017-6196</span>
</a>
<div class="lastUpdated">02/23/17 23:59</div>
</h3>
<div xml:base="https://nvd.nist.gov/download/nvd-rss.xml" class="feedEntryContent">...</div>
</div>
Updated entry:
<div class="entry">
<h3>
<a href="https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2017-6196">
<span xml:base="https://nvd.nist.gov/download/nvd-rss.xml">CVE-2017-6196 (afpl_ghostscript)</span>
</a>
<div class="lastUpdated">02/23/17 23:59</div>
</h3>
<div xml:base="https://nvd.nist.gov/download/nvd-rss.xml" class="feedEntryContent">...</div>
</div>
Note that neither the URL nor the lastUpdated properties changed. Would it be possible to use the URL rather than the title text to determine whether it's a new feed entry?
Sorry, but I was in a hurry when I grabbed the files. I just thought about it and realized it shouldn't be HTML, so I went back to check. I had grabbed the files using Firefox, which converted the feed to HTML and styled it. I can redo the test and grab the raw XML if you'd like, but I believe the root cause is the same (otherwise Firefox would have presented the data differently): the title element was updated, but the link/URL and date were not.
Yeah, that's very likely the cause. If the feed does not provide a guid/id, then a hash over title, URL, and content is used to identify it (there are good reasons for those 3 ;))
@chriswells0 yeah, at least your feed does not provide guids, so everything works as expected ;)
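To make the behavior described above concrete, here is a minimal sketch of that kind of identification: use the guid when the feed supplies one, otherwise hash title, URL, and content. This is not the News app's actual code. The function name `entry_fingerprint`, the choice of SHA-256, and the separator byte are all assumptions made for illustration; only the titles and URL come from the CVE example earlier in the thread.

```python
import hashlib
from typing import Optional


def entry_fingerprint(guid: Optional[str], title: str, url: str, content: str) -> str:
    """Hypothetical helper: return an identifier for a feed entry."""
    if guid:
        # A publisher-supplied guid/id stays stable across edits.
        return "guid:" + guid
    # Fallback (assumed scheme): hash over title, URL, and content.
    digest = hashlib.sha256()
    for field in (title, url, content):
        digest.update(field.encode("utf-8"))
        digest.update(b"\x00")  # separator so adjacent fields cannot run together
    return "hash:" + digest.hexdigest()


url = "https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2017-6196"

# Without a guid, the retitled entry gets a different identifier,
# so it looks like a brand-new (unread) article.
before = entry_fingerprint(None, "CVE-2017-6196", url, "...")
after = entry_fingerprint(None, "CVE-2017-6196 (afpl_ghostscript)", url, "...")
title_change_creates_new_entry = before != after

# With a guid, the same title change leaves the identifier untouched.
guid_before = entry_fingerprint(url, "CVE-2017-6196", url, "...")
guid_after = entry_fingerprint(url, "CVE-2017-6196 (afpl_ghostscript)", url, "...")
guid_survives_title_change = guid_before == guid_after
```

Under these assumptions, any edit to the title, link, or body of a guid-less entry produces a new identifier, which matches the "old articles reappear as unread" symptom reported here.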
If only content changes, I definitely wouldn't expect a previously read article to appear as unread in my feed--could be a typo correction or something. Clearly a GUID is best, but I'm simply curious why all 3 of those are used when a GUID isn't available.
Please don't take this the wrong way; I'm only using it as a point of reference. This same feed doesn't have this issue in Feedly, which leads me to believe it wouldn't "surprise" anyone if title changes on old entries were ignored. I do plan to reach out to see if I can convince them to add a GUID to their feed, but I don't expect to have any luck at all.
@koehn Were you able to test your feed as well? I'm curious if it's the same cause, but I'd guess that it is.
@chriswells0 the reason is that various feeds publish various combinations of title, link, and content. Most notably, an apple.com feed broke in the past because they published different posts with the same content (empty) and URL.
If the feed does not provide a guid, shitty behavior like this is to be expected. Just contact the web admin; I don't think it's a big issue at all.
I'm unfamiliar with the RSS spec (used Atom for my own site), so I looked at it before contacting NIST in order to be specific about what's needed. However, it seems the RSS 1.0 spec doesn't include a GUID or anything like it:
http://web.resource.org/rss/1.0/spec
The closest thing I see to a GUID for an item element is the rdf:about attribute, which does exist in this feed, so I don't believe contacting NIST will help.
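For anyone who wants to check a saved copy of an RSS 1.0 feed for `rdf:about` values, the sketch below reads them with Python's standard library. The `SAMPLE` document is a hand-written illustration, not the real NIST feed (whose raw XML was not captured in this thread), and `item_ids` is a hypothetical helper name. The two namespace URIs are the standard ones for RSS 1.0 and RDF.

```python
import xml.etree.ElementTree as ET

RSS_NS = "http://purl.org/rss/1.0/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative RSS 1.0 fragment (assumed shape, not copied from the NIST feed).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2017-6196">
    <title>CVE-2017-6196 (afpl_ghostscript)</title>
    <link>https://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2017-6196</link>
  </item>
</rdf:RDF>"""


def item_ids(feed_xml: str) -> list:
    """Hypothetical helper: return (rdf:about, title) for every RSS 1.0 item."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for item in root.iter(f"{{{RSS_NS}}}item"):
        about = item.get(f"{{{RDF_NS}}}about")  # None if the attribute is missing
        title = item.findtext(f"{{{RSS_NS}}}title", default="")
        pairs.append((about, title))
    return pairs


ids = item_ids(SAMPLE)
```

If the `rdf:about` value stays the same when an entry's title is edited, it could serve as the stable key that a guid provides in RSS 2.0 or that `id` provides in Atom.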
Luckily, I found that they offer an alternative feed: "The advantage of the second feed is that we are able to provide vulnerable product names in the title. The advantage of the former is that you learn about new vulnerabilities as soon as possible." I might switch to the 2nd feed to work around this issue, but I believe you should reconsider this approach for RSS 1.0 feeds. Anyone publishing different articles using the same URI is probably doing it wrong--even if it's Apple. ;)
Yeah, RSS 1.0 is trash; that's why there's 2.0 and Atom :)
I'm having the same issue on Nextcloud Beta 12. Before upgrading the NC server I didn't have this issue.
Operating system: Linux Zbox 4.2.8-040208-generic #201512150620 SMP Tue Dec 15 06:22:17 UTC 2015 x86_64
Web server: Apache/2.4.18 (Ubuntu) (apache2handler)
Database: mysql 10.0.29
PHP version: 7.0.15-0ubuntu0.16.04.4 Modules loaded: Core, date, libxml, openssl, pcre, zlib, filter, hash, Reflection, SPL, session, standard, apache2handler, mysqlnd, PDO, xml, apcu, calendar, ctype, curl, dom, mbstring, fileinfo, ftp, gd, gettext, iconv, json, exif, mysqli, pdo_mysql, Phar, posix, readline, shmop, SimpleXML, sockets, sysvmsg, sysvsem, sysvshm, tokenizer, wddx, xmlreader, xmlwriter, xsl, zip, Zend OPcache
Nextcloud version: 12.0 beta 1 - 12.0.0.16
List of activated apps:
Did you follow the steps in the readme (especially the reserved_at stuff, because that happened for me)?
I'm closing this issue since it is too generic and has too little debug info (works fine here)
@aproposnix if the readme faq does not work for you, file a new issue. This issue is about something completely different.
Explain the Problem
What problem did you encounter? I have a number of feeds that show old (like years old) articles once per day. Typically sometime overnight the updater will load articles that hit the feed years ago; I'm not sure what to do about it. Here are some feeds that exhibit this behavior:
I'm running the latest News (10.1.0), NextCloud-News-Updater (installed with PIP), and NextCloud (11.0).
Steps to Reproduce
System Information
bitnami/php-fpm:latest, which boils down to Debian Jessie
Contents of nextcloud/data/nextcloud.log
Contents of Browser Error Console