Can't add feed https://kali.org/feed/

KopfKrieg commented 5 years ago

Explain the Problem

I'm not able to add the Kali Linux feed to Nextcloud News. I know it worked in the past, but apparently it stopped working a while ago. I tried to remove the entry and add it again, but unfortunately I get an error. I also setup a completely new Nextcloud 16 instance to verify it's not a problem with my server, but a general problem (so you'll probably have the same issue).

Steps to Reproduce

Try to add the following URL to Nextcloud News: https://kali.org/feed/
Get Error :(

FYI: I can download the feed using, e.g. wget or curl.

System Information

News app version: 3.0.0
Nextcloud version: 16.0.1.1
Cron type: System cron
PHP version: 7.3.6
Database and version: PostgreSQL 11.4
Browser and version: Firefrox 67.0.4
Distribution and version: Archlinux on the client, Ubuntu 18.04 LTS on the server

Error message from Nextcloud News App

FeedIo\Adapter\ServerErrorException: Client error: `GET https://www.kali.org/feed/` resulted in a `403 Forbidden` response:
<!DOCTYPE html>
<html lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="stylesheet" href="https://cdn.su (truncated...)
 in /var/www/html/custom_apps/news/lib/Fetcher/Client/FeedIoClient.php:59
Stack trace:
#0 /var/www/html/custom_apps/news/vendor/debril/feed-io/src/FeedIo/Reader.php(116): OCA\News\Fetcher\Client\FeedIoClient->getResponse('https://www.kal...', Object(DateTime))
#1 /var/www/html/custom_apps/news/vendor/debril/feed-io/src/FeedIo/FeedIo.php(286): FeedIo\Reader->read('https://www.kal...', Object(FeedIo\Feed), Object(DateTime))
#2 /var/www/html/custom_apps/news/lib/Fetcher/FeedFetcher.php(77): FeedIo\FeedIo->read('https://www.kal...')
#3 /var/www/html/custom_apps/news/lib/Fetcher/Fetcher.php(64): OCA\News\Fetcher\FeedFetcher->fetch('https://www.kal...', true, NULL, NULL, NULL)
#4 /var/www/html/custom_apps/news/lib/Service/FeedService.php(116): OCA\News\Fetcher\Fetcher->fetch('https://www.kal...', true, NULL, NULL, NULL)
#5 /var/www/html/custom_apps/news/lib/Controller/FeedController.php(152): OCA\News\Service\FeedService->create('https://www.kal...', 0, 'KopfKrieg', NULL, NULL, NULL)
#6 /var/www/html/lib/private/AppFramework/Http/Dispatcher.php(166): OCA\News\Controller\FeedController->create('https://www.kal...', 0, NULL, NULL, NULL)
#7 /var/www/html/lib/private/AppFramework/Http/Dispatcher.php(99): OC\AppFramework\Http\Dispatcher->executeController(Object(OCA\News\Controller\FeedController), 'create')
#8 /var/www/html/lib/private/AppFramework/App.php(126): OC\AppFramework\Http\Dispatcher->dispatch(Object(OCA\News\Controller\FeedController), 'create')
#9 /var/www/html/lib/private/AppFramework/Routing/RouteActionHandler.php(47): OC\AppFramework\App::main('OCA\\News\\Contro...', 'create', Object(OC\AppFramework\DependencyInjection\DIContainer), Array)
#10 [internal function]: OC\AppFramework\Routing\RouteActionHandler->__invoke(Array)
#11 /var/www/html/lib/private/Route/Router.php(297): call_user_func(Object(OC\AppFramework\Routing\RouteActionHandler), Array)
#12 /var/www/html/lib/base.php(975): OC\Route\Router->match('/apps/news/feed...')
#13 /var/www/html/index.php(42): OC::handleRequest()
#14 {main}

SMillerDev commented 5 years ago

It seems to be blocking nextcloud news. Nothing we can do about that unfortunately, you could contact them and ask why though.

KopfKrieg commented 5 years ago

Hm, that's a bit unexpected, but I'll contact them and ask for details. I mean I've seen that I'm able to access the feed through wget/curl, but it didn't even cross my mind that someone would intentionally block Nextcloud News.

digip commented 5 years ago

Hm, that's a bit unexpected, but I'll contact them and ask for details. I mean I've seen that I'm able to access the feed through wget/curl, but it didn't even cross my mind that someone would intentionally block Nextcloud News.

From looking at your code, I think more than likely, it's because there is no user-agent set. Could you confirm what user-agent is set or try using a generic one and let us know afterwards? It's not so much we're intending to block your app, as it may just be some rules in place to drop it because of no agent being set, but that's my hunch.

SMillerDev commented 5 years ago

https://github.com/nextcloud/news/blob/master/lib/Config/FetcherConfig.php#L58 is the user agent.

digip commented 5 years ago

I get a 200 OK with that user-agent. Only other thing I could check is if you can give us the IP/Subnet it was blocked on and have the team look to see if it is in fact blocked by source IP. (Can private message me on twitter if you don't want to make that public - @xxdigipxx ) My initial suspicion on user-agent being the issue was the info at the top of the page for that file. See here:

From what I can see in testing, we're not blocking by that agent. Possible the person testing used a version that had no agent set?

If you can capture the full request and show the headers sent and the 403, might be a clue in the request what it's being flagged and blocked for, but I can't at this point say we're blocking the app itself on user-agent alone, must be some other factor involved, ie: Subnet block lists, version user was using has no user-agent specified, or another part of the app's request I can't see.

When I use a blank user-agent it does give me a 403 forbidden (and is expected), which I still think is the original issue, but without seeing the request headers from the user for the blocked session, hard to determine what the cause is at this point. My suspicion is there is no user-agent being set in the request sent from the specific users version being used.

SMillerDev commented 5 years ago

Thanks for the thorough investigation @digip, much appreciated. The issue for that commit was actually FeedIO setting it's own UserAgent (https://github.com/alexdebril/feed-io/blob/master/src/FeedIo/Adapter/Guzzle/Client.php#L31).

digip commented 5 years ago

Yeah, not blocking that user-agent either, and wouldn't make sense to block a normal browser. I know orgs block for certain tools like httrack and such, but I can say we're not blocking on the apps user-agent alone and some other factor we can't see would be the cause for concern, ie: bad subnet or no agent being set in the version the client KopfKrieg had issues with.

KopfKrieg commented 5 years ago

Hi, and thanks for joining the discussion.

The IP of my server is 194.55.15.115, the IP of my homeserver always starts with 88.133.xxx.xxx. I can confirm that Nextcloud actually uses an user-agent, here's a access log showing it (from my own Nextcloud instance to my own blog):

caddy    | 88.133.0.0 - - [26/Jun/2019:22:51:21 +0000] "GET /feed.atom HTTP/1.1" 200 64544 "-" "NextCloud-News/1.0"

I still can't access the kali.org feed (with the same error message as shown in the first post)

digip commented 5 years ago

Forwarding the subnets to the team to investigate if there is an IP or country block in effect for those addresses. I do know /feed.atom will return a 404, so accessing at /feed, /rss, or /atom would be best and will redirect as needed, but a 404 is certainly not a 403 forbidden. Will update when I have more info from the team. Thanks.

digip commented 5 years ago

Hi, and thanks for joining the discussion.

The IP of my server is 194.55.15.115, the IP of my homeserver always starts with 88.133.xxx.xxx. I can confirm that Nextcloud actually uses an user-agent, here's a access log showing it (from my own Nextcloud instance to my own blog):
caddy    | 88.133.0.0 - - [26/Jun/2019:22:51:21 +0000] "GET /feed.atom HTTP/1.1" 200 64544 "-" "NextCloud-News/1.0"
I still can't access the kali.org feed (with the same error message as shown in the first post)

We're not seeing anything being blocked for the IP subnets you listed anywhere that we would have rules in place for this. That's not to say the system didn't flag and block something, but we don't seem to have those addresses on the logs anywhere. Possible you can try this again and do it a few times to see if we can locate any blocked messages on our end? Also, possible you can try to proxy the connection to see if it still gets blocked when accessing from another IP? As of now, we don't see it on the logs or in any rule sets.

KopfKrieg commented 5 years ago

I can only say it's not IP related because I can access the feed if I'm using, e.g., curl or wget. Unfortunately I don't know an easy/fast way to proxy the server's connection, so I can't test it if a proxied connection would work.

digip commented 5 years ago

I can only say it's not IP related because I can access the feed if I'm using, e.g., curl or wget. Unfortunately I don't know an easy/fast way to proxy the server's connection, so I can't test it if a proxied connection would work.

Can you do a tcpdump or capture the full exchange of headers within your web server(starting the request with http instead of https)? Also, I see at the top a truncated part of the error page "link rel="stylesheet" href="https://cdn.su". This page will have some info that may help us track down the error, and will have a time stamp and ID related to the incident along with the IP in question(if it changes after leaving the edge of your hosts network). The only issue is the firewall logs are something outside our network and only kept a few hours so we have a small time frame to catch them before they get rolled over due to high amount of traffic. Would seem this never reaches the actual Kali.org web server logs because of the firewall blocking it before we can see it, but we're not intentionally blocking the IP range or user-agent in question that you provided us on either the web server or the firewall, so some other part of the request is triggering the blocking of the connection. It may be blocking the same IP, but the rules it's blocking for do not seem to be for the IP itself, and some other part of the request is triggering the ban hammer, if that makes sense.

This may be something between your web server and it's host networks edge. If you can run a local web server/VM with the NextCloud app, I'd be curious to see a wirehsark or tcp dump of the request, which you can start with an HTTP vs HTTPS request, to see what headers get sent before it redirects to https. That may show us what is part of the request being blocked before it does the 301 to HTTPS. Not sure how else to capture that exchange short of you tracing it on your side of the script to log the full exchange.

Sorry I can't be of more help, but if we can see the whole handshake exchange of the GET request to see what may be causing the problem, that may make things more obvious to the issue. Would hate to leave it in a state of "Kali blocks our app" without knowing why, as this may have implications for others we're not aware of with similar issues.

r3pek commented 5 years ago

I have the same problem as you guys, but accessing a different feed (https://www.blackhillsinfosec.com/feed/)

btw, the truncated link is cdn.sucuri.net, but I really don't know where that comes from since this link should only reply with an RSS XML?

KopfKrieg commented 5 years ago

@r3pek Your feed also doesn't work for me. Seems like the problem is not only limited to kali.org.

@digip

Can you do a tcpdump or capture the full exchange of headers within your web server(starting the request with http instead of https)?

Not really, sorry. Right now I lack the time to look into it (and also don't really know how to capture the traffic. The last time I've used Wireshark or something similar was years ago).

r3pek commented 5 years ago

@digip

GET /feed/ HTTP/1.1
Host: www.blackhillsinfosec.com
User-Agent: NextCloud-News/1.0
If-Modified-Since: Thu, 01 Jan 1970 00:00:00 +0000

HTTP/1.1 301 Moved Permanently
Server: Sucuri/Cloudproxy
Date: Thu, 04 Jul 2019 19:02:29 GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive
X-Sucuri-ID: 15018
Location: https://www.blackhillsinfosec.com/feed/

<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>

I suppose the important part here is Server: Sucuri/Cloudproxy. Maybe this guys are blocking unknown UAs?

digip commented 5 years ago

@digip

GET /feed/ HTTP/1.1
Host: www.blackhillsinfosec.com
User-Agent: NextCloud-News/1.0
If-Modified-Since: Thu, 01 Jan 1970 00:00:00 +0000

HTTP/1.1 301 Moved Permanently
Server: Sucuri/Cloudproxy
Date: Thu, 04 Jul 2019 19:02:29 GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive
X-Sucuri-ID: 15018
Location: https://www.blackhillsinfosec.com/feed/

<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>

I suppose the important part here is Server: Sucuri/Cloudproxy. Maybe this guys are blocking unknown UAs?

The UA is listed, and no, at least not on kali.org is it being blocked by UA alone, but may be on Sucuri side they are blocking it without an option for us to see. We couldn't find it listed as being blocked and I tested using that agent manually without being blocked. Getting the full disclosed page after that 301 above, more than likely moves it to the https and then blocks, but seems you were not blocked from viewing it on that alone. If the news app can't follow a 301, that may be part of the problem though.

DakotaNelson commented 5 years ago

@r3pek are you coming from 72.14.x.x? If so, it looks like your feed reader triggered some DDoS protections in place for blackhillsinfosec.com and was blocked as a result.

But yes, you'll also want to request the https:// version of the site to avoid that 301 so we can get a better sense of the actual response following the redirection.

r3pek commented 5 years ago

@r3pek are you coming from 72.14.x.x? If so, it looks like your feed reader triggered some DDoS protections in place for blackhillsinfosec.com and was blocked as a result.

I'm on 74.48.x.x or 138.201.x.x

But yes, you'll also want to request the https:// version of the site to avoid that 301 so we can get a better sense of the actual response following the redirection. The http was just a test ;) It's working now. Thanks guys [[]]

@KopfKrieg @digip I talked to the BlackHillsInfoSec guys (where I suppose @DakotaNelson is from), and they fixed the problem ;) Looks like it needs to be done on a per-site basis for sites that use Sucuri.

digip commented 5 years ago

Yeah, the only reason I wanted it in HTTP first, was so we could see the full GET request from a packet capture, in case there were clues like an empty user-agent or extraneous info in the request it might get flagged for. With HTTPS you can't see the request in a packet dump without de-cloaking the SSL side of it.

DakotaNelson commented 5 years ago

Sweet, good to hear it's working.

The tl;dr is that we had to do some whitelisting on our side so that certain feed readers won't trigger our DDoS protections.

KopfKrieg commented 5 years ago

Unfortunately it's not working here :/

I'm not really sure how to help, but this is the output from curl:

$ curl -I http://kali.org/feed/
HTTP/1.1 301 Moved Permanently
Server: Sucuri/Cloudproxy
Date: Wed, 10 Jul 2019 09:55:48 GMT
Content-Type: application/rss+xml; charset=UTF-8
Content-Length: 0
Connection: keep-alive
X-Sucuri-ID: 15010
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
Strict-Transport-Security: max-age=15552000
Last-Modified: Mon, 08 Jul 2019 23:17:03 GMT
ETag: "11d819623382ee2b20493243614f0006"
X-Redirect-By: WordPress
Location: http://www.kali.org/feed/
X-Frame-Options: sameorigin
X-Sucuri-Cache: HIT

$ curl -I https://kali.org/feed/
HTTP/2 301 
server: nginx
date: Wed, 10 Jul 2019 09:55:51 GMT
content-type: application/rss+xml; charset=UTF-8
content-length: 0
location: https://www.kali.org/feed/
x-sucuri-id: 15010
x-xss-protection: 1; mode=block
x-frame-options: SAMEORIGIN
x-content-type-options: nosniff
strict-transport-security: max-age=31536000
content-security-policy: upgrade-insecure-requests;
strict-transport-security: max-age=15552000
last-modified: Mon, 08 Jul 2019 23:17:03 GMT
etag: "11d819623382ee2b20493243614f0006"
x-redirect-by: WordPress
x-frame-options: sameorigin
x-sucuri-cache: HIT

$ curl https://kali.org/feed/
$

What I don't get: Shouldn't the last command show some html/xml output?!

r3pek commented 5 years ago

@KopfKrieg The "solution" is only applicable to the BlackHillsInfoSec. The "fix" has to be done on a per-site bases, so the kali.org admins need to fix it themselfs :(

KopfKrieg commented 5 years ago

Ah, sorry, I misinterpreted the previous post. But now ít's clear.

digip commented 5 years ago

@KopfKrieg The "solution" is only applicable to the BlackHillsInfoSec. The "fix" has to be done on a per-site bases, so the kali.org admins need to fix it themselfs :(

I think this is the "wrong" way to look at things, and also, possibly the wrong solution. If BlackHills identified it on Sucuri blocking as a DDoS or other such, then the app is doing something we're not seeing that triggers the response. Fixing the request is key here, not dropping firewall rules from doing their job, even if this seems like a false positive.

One thing to note, is that Kali.org also uses HSTS, so this is a strict transport setup for the domain. I see "HTTP/2 301" in the one curl request, and it is not a 403 forbidden. However, the redirect to where after that, is what I'd like to see if you have it @KopfKrieg which if blocked, should start with a 403 not a 301. It's quite possible the news client is not compatible with HTTP/2 and can't handle the redirect to binary frames, which should allow the client and server to handle the framing transport which in this case is a new protocol and a binary layer the app may not be able to handle and respond to correctly.

Starting to become a bit clearer as well the more data we have on what is happening, Seeing it start in http, and then redirected to https with the HTTP/2 portion, seems this is where things start to go down hill. If the news app is trying over and over again or stuck in a loop to get the same request but not seeing what it thinks it needs, it could be what is causing the trigger for the DDoS block or other such protections. From our standpoint, the firewall is doing what it's supposed to do, if we are in fact blocking it based on some trigger like BlackHills discovered.

To note, we're not blocking based on user-agent or IP ranges, it's definitely something between the app itself and the server's firewall and possibly due to the HTTP/2 part. I'd be curious for you to try finding other HTTP/2 based RSS feeds/servers and test there to see what happens. My suspicion is it's going to fail, but just my hunch.

Update: Also, I don't see HTTP/2 when I make a curl request, unless the version of curl you're using is defaulting to HTTP/2. Our site supports the protocol, but the client should be what sets this in the request so probably not the issue. Still curious if something in feed-io is causing this though.

r3pek commented 5 years ago

@digip OK, I do understand what you're saying, but adding a feed to an RSS aggregator is certainly not DDoS, so, I still consider this a false positive. Anyway, it would be nice to understand what actions exactly are triggering the DDoS protection in Sucuri (and I also talked to them but they just pointed me to the site admins saying they couldn't fix anything themselfs).

curl always works via http/2 or http(s) 1.1. I never got a 403 denied when using curl. But to test this in nextcloud, I would need to MitM my server to be able to sniff the https traffic, which seems overkill since you may not even be able to understand what caused the deny in the first place and you will only see a request to a url and a 403 (with no reason given).

Our best bet in this case, is to see what sucuri firewall itself says about the event and try to understand why that happened. I'm almost sure that the BlackHills guys didn't change the rules just for a rss reader, but that they actually saw the "event" as a false positive, and made some changes that fixed the problem for "us" and it wouldn't affect other "real-world-traffic-that-could-cause-a-ddos".

digip commented 5 years ago

@digip OK, I do understand what you're saying, but adding a feed to an RSS aggregator is certainly not DDoS, so, I still consider this a false positive. Anyway, it would be nice to understand what actions exactly are triggering the DDoS protection in Sucuri (and I also talked to them but they just pointed me to the site admins saying they couldn't fix anything themselfs).

curl always works via http/2 or http(s) 1.1. I never got a 403 denied when using curl. But to test this in nextcloud, I would need to MitM my server to be able to sniff the https traffic, which seems overkill since you may not even be able to understand what caused the deny in the first place and you will only see a request to a url and a 403 (with no reason given).

Our best bet in this case, is to see what sucuri firewall itself says about the event and try to understand why that happened. I'm almost sure that the BlackHills guys didn't change the rules just for a rss reader, but that they actually saw the "event" as a false positive, and made some changes that fixed the problem for "us" and it wouldn't affect other "real-world-traffic-that-could-cause-a-ddos".

Yeah, I'm still trying to figure out what in the process triggers the block, as myself I have tried multiple clients using the nextcloud user agent, from browser agent switchers, curl and wget, all of which seem to work fine for me. My suspicion is something in feed-io handling, but this is a guess at best.

digip commented 5 years ago

So I'm trying to understand how this app works, as I'm not familiar with it, or how it installs and what/where to go to make it pull a feed the way you are using it.

Looking at your documentation, I went to https://apps.nextcloud.com/apps/news and installed the 12.0.4 version, extracted the /news directory, chown to www-data, then proceeded to cd into /news/vendor/bin and ran the picofeed program to test. This required me to fix the path, but once I did that, I was able to pull the feed from our site. I know that's probably not how you are using this, but I'm not sure how the app works to be honest, and where/how you get the info at the top of the post.

I tried running the latest and even git clone and composer installed, but the "feedio" script in /bin for the 14.x.x version did not seem to run and complained about the guzzle client. I did update to php 7.1 as well prior to downloading any of the files since I saw a message about moving to 7.1 on one of the doc pages.

If anyone has some tips how I can test locally in my VM, I'd appreciate a quick walk-through of the setup process and steps you use to reproduce the issue, so I can see on my end what happens and where.

`root@lamp .../vendor/bin# ./picofeed https://www.kali.org/feed/ Usage: ./picofeed feed ./picofeed debug ./picofeed item ./picofeed nofilter ./picofeed grabber ./picofeed favicon root@lamp .../vendor/bin# ./picofeed feed https://www.kali.org/feed/ Feed::id = https://www.kali.org/feed/ Feed::title = Kali Linux Feed::feedUrl = https://www.kali.org/feed/ Feed::siteUrl = https://www.kali.org/ Feed::language = en-US Feed::description = Penetration Testing and Ethical Hacking Linux Distribution Feed::logo = Feed::date = Mon, 08 Jul 19 23:17:03 +0000 Feed::isRTL() = false Feed::items = 10 items

Item::id = 5cb1ec5c811a97dc6b92ccaa3bb4335ae35086d88a96ff8afd17fae60b1959b7 Item::title = Raspberry Pi 4 and Kali Item::url = https://www.kali.org/news/raspberry-pi-4-and-kali/ Item::language = en-US Item::author = elwood Item::enclosureUrl = Item::enclosureType = Item::date = Fri, 05 Jul 19 15:15:46 +0000 Item::publishedDate = Fri, 05 Jul 19 15:15:46 +0000 Item::updatedDate = Fri, 05 Jul 19 15:15:46 +0000 Item::isRTL() = false Item::categories = [Kali Linux News,ARM image,raspberry pi] Item::content = 2327 bytes ----`

digip commented 5 years ago

@KopfKrieg can you try a few things for me? First with the 16.x version as it is now, and then with NextCloud 13 version 12.0.4(downgrade the server)? https://github.com/nextcloud/news/releases/download/12.0.4/news.tar.gz and see if that works?

Curious if the same issue happens with the older version. You mentioned using FF as the client as well? Not sure where in FF you add the feed or where you find an add-on for NextCloud, only thing I could find was [ https://addons.mozilla.org/en-US/firefox/addon/nextcloud-passwords/?src=search ], but can you try "ctrl+shift+i" and click on Network, then try requesting the feed with each version of the server running? This will show the URLs and the full request can be seen if you click on the item.

Example:

I don't know if it will gleen any new information, but curious of the older version works vs the latest.

r3pek commented 5 years ago

@digip it doesn't, i tried that ;) it's the actual server that makes the request to the server feed, not the client (browser)

digip commented 5 years ago

@digip it doesn't, i tried that ;) it's the actual server that makes the request to the server feed, not the client (browser)

You tried version 12.0.4 of the server?

r3pek commented 5 years ago

@digip it doesn't, i tried that ;) it's the actual server that makes the request to the server feed, not the client (browser)

You tried version 12.0.4 of the server?

Nop, just the 16.x

digip commented 5 years ago

@digip it doesn't, i tried that ;) it's the actual server that makes the request to the server feed, not the client (browser)

You tried version 12.0.4 of the server?

Nop, just the 16.x

Would you be able to test the 12.0.4 version? Seems between 12.0.4 and later there may be some differences, just trying to see if the older server version exhibits the same behavior in being blocked.

digip commented 5 years ago

tl;dr - if you really want our feeds added in nextcloud/news without waiting for a fix, you can add them right now using feedburner as a temporary(or long term) solution.

Been doing some testing on our end, I have it installed in a local VM and working now(at least, working as in installed and able to pull other feeds, just not our own - unless externally proxied).

I feel your pain, but I also have an alternative, in that using something like feedburner as a proxy to create a URL for the feed works without issue in the nextcloud/news app. We're going to be testing some things this morning to see what we can do to prevent the 403 errors, but as it's looking now, something on the news app side may not be handing the request properly or accepting the cached gzip'ed copy of the feed we reply with. Since it's sending back a binary file it seems this may be where things go down in the handling of the cached RSS file. Testing internally and bypassing the cache, I can get nextcloud/news to pull it with no issue, unsurprisingly, but externally from our proxy cache, it fails every time, so we're going to see if that's something we'll be able to change, or if the alternative is just having an external feed proxy such as feedburner, which I know is less than desirable, put in place, as we may not be able to change cache policy specifically for the app to bypass it.

It also fails on the offsec site's feed, so this is something I personally would like to see us resolve as it impacts the community at large, and possibly other tools & news aggregators may have similar issues if using the same libraries for their tools like feed-io or such.

Will let you know more after testing today, but wanted to offer the tidbit about feedburner.

digip commented 5 years ago

Some news and food for thought. Install the linux tool GET which works similar to curl, but allows you to use headers as you add them. Make a request for our feed, as so:

GET -s -U -H "Host: www.kali.org" -H "User-Agent: NextCloud-News/1.0" http://www.kali.org/feed/

And you will see right away it fails. Now use:

GET -s -U -H "Host: www.kali.org" -H "User-Agent: NextCloud-News/1.0" -H "Accept:*/*" http://www.kali.org/feed/

with the accept header as shown above and you will see it spit out the gzipped binary of our rss feed. So on it's header request, the nexcloud/news app sends NO "Accept:" header in the request, and this is why it's being blocked, as it's not handling the request the proxy is looking for, just like when I thought initially it had no user-agent, which would also block it for not having one.

Adding an accept header and what you require should fix this in your app, so long as you handle the rest of the request properly for the output sent back to the app. This should fix it by patching it to add that accept header with the request.

Would appreciate if someone could test for me, as I wasn't sure where in the app this should be added to make that happen.

Note: some systems will have this tool as "/usr/bin/lwp-request" from liblwp-protocol-https-perl

r3pek commented 5 years ago

thx @digip for the debug of the issue. I created a pull request after testing your theory that should fix the issue ;) Let's all wait it get's merged now.

digip commented 5 years ago

So good news and bad news. I added your changes. No 403 forbidden, however, I don't seem to see the feed being added in the news app, just an hourglass. Not sure if the app doesn't know what to do with it, or I have something misconfigured, but no more 403.

WireShark however shows it being sent back, and I can see that it's now getting an 200 ok, but doesn't seem to be loading the actual feed in my browser. Not sure what else to try at the moment, but wanted to give a heads up, this change alone does not seem like it is enough to satisfy the apps needs.

Would be happy to make more changes in my VM to test, if you have any ideas what I should try.

digip commented 5 years ago

Another update, on a plus side, the offsec feed seems to load with no issues now. This is now getting past the firewall and possibly the Kali apache settings coming into play as we see duplicate fields in header responses on the kali site and admins are looking into it, so more digging, but def helped with the accept headers as I can pull the offsec feed no without issue now. Will know more Tuesday on the kali site.

Edit: Would appear the fix above is fine, and this time it's actually the kali web server settings that will need to be updated. Seems we may be enforcing gzip only for files we serve from the cache, vs using a vary user-agent to allow for different formats to be sent back, ie: plain text, xml, vs only a gzip of any file or page we cache in our proxy.

Would be interested to also see if BlackHills security undoes the whitelisting and can test with the new patch from https://github.com/r3pek/news/commit/70d94ee25c0ae23e7c198095cd944e6fd6d75c9d which I suspect will fix for them as well without having to make special rules to work around the Sucuri firewall rules.

nextcloud / news