nextcloud / news

:newspaper: RSS/Atom feed reader
https://apps.nextcloud.com/apps/news
GNU Affero General Public License v3.0

this document is not a XML stream #459

Closed Grotax closed 4 years ago

Grotax commented 5 years ago

This replaces #421. I expect that in the coming days and weeks we will get more reports of XML parsing errors. To keep them in a manageable format, I tested the reported feeds for three things:

  1. w3c -> true if the feed is valid in the w3c-validator (ignoring suggestions)
  2. feed-io-3.0 -> true if no error occurs while reading
  3. feed-io-4.1 -> same

With that I created a list in JSON format, which can be found here: https://gist.github.com/Grotax/11153312d6712a1550c66b82f25e18d6

If you have a feed that throws a "this document is not a XML stream" error and is not in the linked document, please provide the following:

  1. url to the feed
  2. link to the w3c result

I will update the linked document.

Otherwise keep it short to keep the thread readable.
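
For anyone preparing a report, the w3c link in item 2 can be built from the feed URL. The following is a minimal Python sketch; the helper name `validator_link` is hypothetical, and the URL pattern is simply the one visible in the validator links posted in this thread.

```python
from urllib.parse import quote

# Base of the W3C feed validator links as they appear in this thread.
VALIDATOR = "https://validator.w3.org/feed/check.cgi?url="


def validator_link(feed_url: str) -> str:
    """Return a W3C feed validator link for feed_url (hypothetical helper)."""
    # safe="" percent-encodes ':' and '/' as well, matching the posted links.
    return VALIDATOR + quote(feed_url, safe="")


if __name__ == "__main__":
    print(validator_link("https://www.durhamregion.com/rss"))
```

Posting the feed URL together with this link gives both pieces of information requested above.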

toolstack commented 5 years ago

  1. https://www.durhamregion.com/rss
  2. https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwww.durhamregion.com%2Frss

flesser commented 5 years ago

  1. https://www.eveonline.com/rss/news
  2. https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwww.eveonline.com%2Frss%2Fnews

The problem with this one seems to be that <author> contains just a name, not a valid e-mail address.
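
To check a feed for this pattern, a rough Python sketch could list `<author>` values that contain no `@`. The function name `authors_without_email` and the sample feed are made up for illustration, and the `@` test is a deliberately crude heuristic. Whether this is what actually trips the reader is not confirmed here.

```python
import xml.etree.ElementTree as ET


def authors_without_email(rss_text: str) -> list[str]:
    """Return <author> values that do not look like an e-mail address."""
    root = ET.fromstring(rss_text)
    # Crude heuristic: an address must contain '@'.
    return [a.text or "" for a in root.iter("author") if "@" not in (a.text or "")]


# Made-up sample feed, not taken from any feed in this thread.
SAMPLE = """<rss version="2.0"><channel><title>demo</title>
<item><title>one</title><author>Jane Doe</author></item>
<item><title>two</title><author>jane@example.com (Jane Doe)</author></item>
</channel></rss>"""

if __name__ == "__main__":
    print(authors_without_email(SAMPLE))
```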

kees-closed commented 5 years ago

As can be seen in the code block, the RSS feed [1] isn't recognized as an XML RSS feed by News, even though it is valid according to the W3C validator [2]. Can you please have a look at this as well? Furthermore, an XML validator [3] didn't return any errors either.

{
  "reqId": "tLKOqXUR1m0oH5D6oogU",
  "level": 2,
  "time": "2019-03-23T15:33:13+00:00",
  "remoteAddr": "",
  "user": "--",
  "app": "news",
  "method": "",
  "url": "--",
  "message": "http://www.gdacs.org/xml/rss.xml read error : this document is not a XML stream",
  "userAgent": "--",
  "version": "15.0.5.3"
}

[1] http://www.gdacs.org/xml/rss.xml
[2] https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.gdacs.org%2Fxml%2Frss.xml
[3] https://www.xmlvalidation.com/
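
To collect affected feeds from a log, entries like the one above can be filtered by their `message` field. This is a minimal Python sketch that assumes each log entry is a single JSON object on one line. The function name `failing_feed_url` is hypothetical, and the message suffix is copied from the entry shown above.

```python
import json

# Suffix of the error message as it appears in the log entry above.
SUFFIX = " read error : this document is not a XML stream"


def failing_feed_url(log_line: str):
    """Return the feed URL from a matching News log entry, else None."""
    entry = json.loads(log_line)
    message = entry.get("message", "")
    if entry.get("app") == "news" and message.endswith(SUFFIX):
        # Everything before the suffix is the feed URL.
        return message[: -len(SUFFIX)]
    return None


# One-line version of the relevant fields from the entry above.
SAMPLE_LINE = json.dumps({
    "level": 2,
    "app": "news",
    "message": "http://www.gdacs.org/xml/rss.xml" + SUFFIX,
})

if __name__ == "__main__":
    print(failing_feed_url(SAMPLE_LINE))
```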

Grotax commented 5 years ago

Please check the linked file before you post a link. @AquaL1te that link is already in my collection but thanks.

kees-closed commented 5 years ago

As can be seen in the code block, the RSS feed [1] isn't recognized as an XML RSS feed by News, even though it is valid according to the W3C validator [2]. Can you please have a look at this as well? Furthermore, an XML validator [3] didn't return any errors either.

{
  "reqId": "tLKOqXUR1m0oH5D6oogU",
  "level": 2,
  "time": "2019-03-23T15:33:13+00:00",
  "remoteAddr": "",
  "user": "--",
  "app": "news",
  "method": "",
  "url": "--",
  "message": "http://www.gdacs.org/xml/rss.xml read error : this document is not a XML stream",
  "userAgent": "--",
  "version": "15.0.5.3"
}

[1] http://www.gdacs.org/xml/rss.xml
[2] https://validator.w3.org/feed/check.cgi?url=http%3A%2F%2Fwww.gdacs.org%2Fxml%2Frss.xml
[3] https://www.xmlvalidation.com/

This issue is marked as resolved, and so is the ticket where it was first mentioned. However, the issue still persists. I am able to get the RSS feed when I import it into GNOME Evolution, and the RSS validation also checks out fine. I also contacted GDACS about this, but I guess they won't fix anything since the feed works with other readers. It also stopped working when the new News versions were released, so I guess there is a correlation.

Grotax commented 5 years ago

No, you misunderstood my comment. I want a short thread. Your feed is already in my collection. Look at the first answer in this thread; that was what I wanted.

toolstack commented 5 years ago

  1. https://www.durhamregion.com/rss
  2. https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwww.durhamregion.com%2Frss

Interestingly enough, if you go to feed-io.net and use the above RSS feed in the "Want to give it a try" area it does parse correctly... :man_shrugging:

Grotax commented 5 years ago

Probably something was fixed in 4.x. We are using 3.x at the moment because 4.x dropped support for PHP < 7.1.

I am planning to move to 4.x in the next major release.

toolstack commented 5 years ago

Any ETA on when you might do the next major release?

Grotax commented 5 years ago

No, we don't give ETAs, as we are basically only two people and I don't have much experience with this project. But I could probably do pre-releases for version 14.0.0; you can follow the progress here: #494. There will be at least one more release for 13.1.x.

toolstack commented 5 years ago

Thanks, completely understand :)

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

Grotax commented 5 years ago

Bad bot go away!

luksal commented 5 years ago

  1. https://www.hefe-und-mehr.de/feed
  2. https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwww.hefe-und-mehr.de%2Ffeed

pludi commented 5 years ago

  1. https://www.derstandard.at/rss/panorama
  2. https://validator.w3.org/feed/check.cgi?url=https%3A%2F%2Fwww.derstandard.at%2Frss%2Fpanorama

pludi commented 5 years ago

For reference, this is probably fixed in feed-io 4.2.2+, which resolves alexdebril/feed-io#175 by filtering BOM bytes.
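
To illustrate what filtering BOM bytes means, here is a minimal Python sketch. It is not feed-io's PHP code. The names `strip_bom` and `looks_like_xml` are hypothetical, and the "first byte must be `<`" check is only an assumption about how a reader might decide that a document is not an XML stream. The sketch covers the UTF-8 byte order mark (bytes EF BB BF) only.

```python
import codecs
import xml.etree.ElementTree as ET


def strip_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark, if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def looks_like_xml(raw: bytes) -> bool:
    """Naive check (assumption): after leading whitespace, the first byte is '<'."""
    return raw.lstrip()[:1] == b"<"


# Made-up feed body with a UTF-8 BOM in front of the XML declaration.
FEED = codecs.BOM_UTF8 + (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<rss version="2.0"><channel><title>demo</title></channel></rss>'
)

if __name__ == "__main__":
    print(looks_like_xml(FEED))             # BOM bytes come before '<'
    print(looks_like_xml(strip_bom(FEED)))  # '<' is now the first byte
    print(ET.fromstring(strip_bom(FEED)).tag)
```

A feed like this can be well-formed for a validator that tolerates the BOM and still be rejected by a check that only looks at the first byte.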

toolstack commented 5 years ago

@pludi I can confirm that filtering the BOM fixes the issue for the feed I reported.

andyboeh commented 5 years ago

@toolstack How did you test this? I tried to apply the fix from alexdebril/feed-io/pull/176 to the current stable release, but this didn't fix the issue I am experiencing with https://derstandard.at/?page=rss&ressort=Seite1 (should be similar, if not the same as @pludi's feed).

toolstack commented 5 years ago

@andyboeh I added the str_replace() call to my install and then removed and re-added the feed to news.

andyboeh commented 5 years ago

@toolstack I did the very same and I can now confirm that it definitely works with derstandard.at. It didn't work on my URL because the news feed URL has also changed (there was a website overhaul a few days ago). The URL provided by @pludi works fine.

flesser commented 5 years ago

+1, can confirm: manually applying the more than one year old upstream fix to vendor/debril/feed-io/src/FeedIo/Reader/Document.php makes all the feeds I missed since the switch to feed-io reappear in Nextcloud News.

Grotax commented 5 years ago

FeedIO 4+ requires PHP 7.1, and NC 14 & 15 don't support all PHP 7.1 statements. That's why FeedIO 3 was used.