nextcloud / news

:newspaper: RSS/Atom feed reader
https://apps.nextcloud.com/apps/news
GNU Affero General Public License v3.0

Feed-IO config doesn't work for Tumblr feeds #456

Closed IBBoard closed 5 years ago

IBBoard commented 5 years ago

Explain the Problem

Tumblr feeds have stopped working

Steps to Reproduce

  1. Add a Tumblr feed (e.g. http://androidniceties.tumblr.com/rss)
  2. Wait for it to reload
  3. See errors about not supporting the feed

On further debugging (using wget), it looks like Tumblr sees the Firefox user agent and assumes that you need to see the GDPR prompt. If your user agent includes the word "curl" anywhere then it's allowed through without a problem.

Unfortunately, using a user agent like "nextcloud-news" seems to get the prompt page as well. I know some sites dislike CLI downloaders like curl, though, so I don't know whether setting a curl-like user agent to fix Tumblr will break other sites.
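For anyone wanting to repeat the comparison without wget, here is a minimal sketch in Python. It fetches a URL with a chosen user agent and checks whether the body is a feed or something else, such as the HTML prompt page. The function names are made up for illustration and have nothing to do with how News itself fetches feeds.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Root element names for RSS 2.0, Atom and RSS 1.0 (RDF) documents.
FEED_ROOTS = {"rss", "feed", "RDF"}


def looks_like_feed(body: bytes) -> bool:
    """Return True if the body parses as XML with an RSS/Atom/RDF root."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Typical for an HTML page served in place of the feed.
        return False
    # Drop a namespace prefix such as "{http://www.w3.org/2005/Atom}feed".
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in FEED_ROOTS


def fetch_with_user_agent(url: str, user_agent: str) -> bytes:
    """Download a URL while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()
```

Calling `looks_like_feed(fetch_with_user_agent(url, ua))` once with a browser-style string and once with a string containing "curl" should show whether the server treats the two differently. The exact strings to try are up to you.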

System Information

Contents of nextcloud/data/nextcloud.log

{"reqId":"…","level":2,"time":"2019-03-20T20:59:01+00:00","remoteAddr":"","user":"--","app":"news","method":"","url":"--","message":"http:\/\/androidniceties.tumblr.com\/rss read error : No parser can handle this stream","userAgent":"--","version":"15.0.5.3"}

SMillerDev commented 5 years ago

This is indeed the issue: https://superuser.com/questions/1331274/how-can-i-access-tumblr-rss-feeds-from-europe-without-using-a-browser

A lot more feeds break when the user agent is curl than use this stupid GDPR "fix", though, so I doubt we can find a good fix for this.

IBBoard commented 5 years ago

I wondered about adding a special "if domain is x.tumblr.com" condition, but that probably won't work for the people who use Tumblr with custom domains.

Could it be done as an advanced setting instead? Like with the HTTP auth settings, allow people to override the user agent on a per-feed basis. It'd fix the problem while not starting Nextcloud News down a line of supporting per-feed adjustments.
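To make the two options concrete, here is a rough sketch in Python rather than the app's own code. `FeedSettings`, `user_agent_for`, `is_tumblr_host` and the default string are all hypothetical and only illustrate the idea: a hostname check misses Tumblr blogs on custom domains, while an optional per-feed override works for any host.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Placeholder default; the real default string used by News is not shown here.
DEFAULT_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0"


@dataclass
class FeedSettings:
    """Hypothetical per-feed settings with an optional user agent override."""
    url: str
    user_agent: Optional[str] = None


def is_tumblr_host(url: str) -> bool:
    """Domain-based check: only matches tumblr.com and its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "tumblr.com" or host.endswith(".tumblr.com")


def user_agent_for(feed: FeedSettings) -> str:
    """Use the per-feed override when one is set, else the default."""
    return feed.user_agent or DEFAULT_USER_AGENT
```

With this shape, a Tumblr blog served from a custom domain fails `is_tumblr_host` but can still be given `user_agent="curl/7.64.0"` (an example value) in its own settings, and every other feed keeps the default.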

SMillerDev commented 5 years ago

I'm fine with adding support for a custom user agent in the settings. It will just be a very low priority for me. It should be pretty easy though, since it's almost exclusively edits to the FetcherConfig class. I'd be more than happy to guide you through a PR.

ccoenen commented 5 years ago

The saner choice would be to beat tumblr into submission. This way all feed readers would benefit.

IBBoard commented 5 years ago

That's true, and that'd be the best way, but the general noises from the community are that no-one ever got a response when reporting it. Besides which, the fact that News reports a Firefox user agent does make it look like it could be a human. It's annoying that we can't use a Nextcloud UA, though, as they appear to have a whitelist system rather than a "human browser" checklist.

I've had an open ticket with a news site for nearly six months to ask them to encode quotes inside the alt text of images and they can't even manage that, so I'm not hopeful about Tumblr fixing this!

While they could just whitelist the RSS feed because they can't directly embed trackers, I guess they could argue that the posts in the feed may contain trackers and so they need the warning anyway. Although that argument falls apart as soon as you "curl" the page, because you've bypassed the prompt but you still might show the content to a user and trigger the trackers!

SMillerDev commented 5 years ago

News does have a custom user agent, so that should be able to get whitelisted.

IBBoard commented 5 years ago

Ah, okay. I'd been trawling user agents and only found the Firefox string. That must be the default. I didn't realise.

SMillerDev commented 5 years ago

https://github.com/nextcloud/news/issues/437

Set that one.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.