octobercms / october

Self-hosted CMS platform based on the Laravel PHP Framework.
https://octobercms.com/

Theoretical implementations of Web Application Firewalls within October #3939

Closed ghost closed 5 years ago

ghost commented 5 years ago

Hi,

If it's ok I would like to open this issue up and gather ideas about this subject and input from other users.

I am thinking of creating a Web Application Firewall (WAF) for October CMS to add a layer 7 defense (in the OSI model), see below:

[Image: OSI model diagram]

I have looked at Laravel Firewall 2.2 Package found here: https://github.com/antonioribeiro/firewall

I have two issues with this. First, it stores the IP address of every user, which is against GDPR rules and adds risk if your database gets hacked. Second, it tries to store huge amounts of data in your database, which is a complete waste of space! [1]

My idea I have come up with so far is to check every route a User or Bot tries to access and then compare it to the list of the routes in your database. If the route doesn't exist in your database then send the IP address of the User or Bot and/or the User Agent to your Google Analytics account, e.g.

| IP Address | Route | User Agent |
| --- | --- | --- |
| 123.123.123.123 | `http://www.example<script .com>alert(document.location)</script` | Accoona-AI-Agent/1.1.2 (aicrawler at accoonabot dot com) |
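To make the route check concrete, here is a minimal sketch in Python (not October/PHP code). The names `KNOWN_ROUTES`, `SuspiciousRequest` and `check_request` are hypothetical, and the route list is a stand-in for whatever the CMS would actually load from its database. It only decides whether a request should be reported; sending the record anywhere is left out.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical list of routes the site really serves (assumed to come from the CMS).
KNOWN_ROUTES = {"/", "/blog", "/contact"}


@dataclass(frozen=True)
class SuspiciousRequest:
    """One row of the report: who asked for which unknown route."""
    ip: str
    route: str
    user_agent: str


def check_request(ip: str, url: str, user_agent: str,
                  known_routes=KNOWN_ROUTES) -> Optional[SuspiciousRequest]:
    """Return a report row if the requested path is not a known route, else None."""
    path = urlsplit(url).path or "/"
    # Treat "/blog/" and "/blog" as the same route.
    normalised = path.rstrip("/") or "/"
    if normalised in known_routes:
        return None
    return SuspiciousRequest(ip=ip, route=url, user_agent=user_agent)
```

A request for a known page returns `None`, while something like `/wp-login.php` on a site that has no such route comes back as a `SuspiciousRequest` the webmaster could review.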

The results could then be sent from Google Analytics to the Dashboard in October's Backend or to a Separate web page showing the results in a list. The webmaster could then click on each item in the list to blacklist them by IP Address or User-Agent.

Note: I would have to setup a Proxy Server to send the data to GA, thus avoiding the data getting blocked from things like Ad-Blockers etc.

[1] By doing this we could use Google Analytics cloud storage to save the data instead of saving the data to your database (You could use the DOM to display the data from Google Analytics). The only data that would be saved is the blacklisted (selected) items by the webmaster.

There could also be a whitelisted IP range for the backend URL (or any other specific URL location). For example, if a client website has all its admins in Norway, the webmaster could block every country except the Norwegian IP range from accessing the backend admincp URL. A person in China, say, could then not access the backend at all.
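A rough sketch of that allowlist check, using Python's standard `ipaddress` module. The `/backend` prefix and the networks in `BACKEND_ALLOWED` are assumptions for illustration (the ranges are reserved documentation ranges, not real Norwegian ones); a real setup would need an up-to-date per-country range list from somewhere.

```python
import ipaddress

# Assumed values for illustration only: documentation ranges, not real country ranges.
BACKEND_ALLOWED = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]
BACKEND_PREFIX = "/backend"  # assumed backend URL prefix


def is_allowed(ip: str, path: str,
               allowed=BACKEND_ALLOWED, prefix=BACKEND_PREFIX) -> bool:
    """Allow every request outside the backend; inside it, only allowlisted IPs."""
    in_backend = path == prefix or path.startswith(prefix + "/")
    if not in_backend:
        return True
    addr = ipaddress.ip_address(ip)
    # Membership is False when the address and network versions differ.
    return any(addr in net for net in allowed)
```

Front-end paths are never restricted by this check, so a wrong or stale range list can only lock admins out of the backend, not visitors out of the site.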

There could also be a blacklist database added of known bad User-Agents, e.g. https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker/blob/master/_generator_lists/bad-user-agents.list
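As a sketch of how such a list might be applied, the snippet below does a case-insensitive substring match of the User-Agent header against a list of patterns. `load_patterns` and `is_bad_user_agent` are hypothetical names, and the sample entries are made up for the example rather than taken from the linked list. It also assumes one pattern per line, with `#` comment lines skipped.

```python
def load_patterns(lines):
    """Keep non-empty, non-comment lines, lower-cased for case-insensitive matching."""
    patterns = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            patterns.append(entry.lower())
    return patterns


def is_bad_user_agent(user_agent: str, patterns) -> bool:
    """True if any blocklist pattern occurs anywhere in the User-Agent string."""
    ua = user_agent.lower()
    return any(pattern in ua for pattern in patterns)


# Illustrative entries only; a real list would be read from a file.
SAMPLE_LIST = ["# example blocklist", "Accoona-AI-Agent", "", "EvilScraper"]
PATTERNS = load_patterns(SAMPLE_LIST)
```

Keep in mind that a User-Agent is trivially spoofed, so a check like this only filters out bots that identify themselves honestly.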

GDPR Security Issues and loopholes

It is against GDPR law to save a user's IP address without telling them, and doing so creates a security issue! But if the CMS stores only bad IP addresses and/or User-Agents (for security protection), then there is a reason to be allowed to do that. In this case the reason is the question of why the user or bot is trying to access a route that is not stored in the database.

I welcome ideas and input into whether it is a good idea to create a WAF for October CMS to try and harden it.

P.S. I'm focused on this direction, not on extra security HTTP headers or firewall rules in .htaccess; I already have that side of things sorted out.

LukeTowers commented 5 years ago

Storing IP addresses in your database is not against the GDPR; in this case it is covered as a "Legitimate Interest", since no matter how you look at it a software WAF is going to require some amount of IP address logging to be effective (as you allude to by suggesting you store it in Google Analytics). As a side note, the package you linked doesn't associate IP addresses with users; it merely blocks them as requested or under suspicious circumstances. If anything, passing that data along to Google Analytics is worse for GDPR compliance, as Google has much more data to connect those IP addresses to actual people in real life, so it leaks much more data about your users than an application-level firewall storing IPs in a database would.

If you're concerned about the amount of storage required, first of all, have you actually calculated how much it would use? I would suggest that it's not really that much in the grand scheme of things. Secondly, you'd always have the option of keeping the firewall in an entirely separate datastore from the rest of your application, through the DB connection property on whatever model class such a package would use.
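For a back-of-the-envelope version of that calculation, here is a small sketch. Every size below is an assumption, not a measurement of the linked package: 16 bytes covers an IPv6 address, and the per-row overhead in particular varies a lot between database engines and index setups.

```python
# All sizes are assumptions for a rough estimate, not measurements.
IP_BYTES = 16            # room for an IPv6 address (IPv4 needs only 4)
TIMESTAMP_BYTES = 8      # when the address was last seen
FLAG_BYTES = 1           # e.g. blocked / allowed
ROW_OVERHEAD_BYTES = 40  # assumed row header + index cost; engine-dependent

BYTES_PER_ROW = IP_BYTES + TIMESTAMP_BYTES + FLAG_BYTES + ROW_OVERHEAD_BYTES


def estimate_bytes(rows: int) -> int:
    """Estimated storage for `rows` logged addresses under the assumptions above."""
    return rows * BYTES_PER_ROW


def to_mib(num_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return num_bytes / (1024 * 1024)
```

Under these assumptions a row costs 65 bytes, so one million distinct addresses come to 65,000,000 bytes, or roughly 62 MiB.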

In regards to the non-existent routes checking, that doesn't actually protect you that much. There are plenty of reasons that a legitimate user could wind up on a route that your database doesn't have (hence why 404 pages exist), and there are plenty of attacks that can occur on legitimate routes, therefore bypassing your entire firewall system as proposed.

Long and short of it, if I were looking at implementing a WAF in October for some reason, I'd probably just build it off of the linked package. It's almost always better to go with someone else's open source widely used solution instead of rolling your own, especially when it comes to security related functionality.

Those are my thoughts on the matter anyways.

LukeTowers commented 5 years ago

As another note, I don't typically mind these discussion-type threads, but I'm not sure issues on the main octobercms/october repo are really the right place for them. I would recommend using the forums, but unfortunately they're read-only because of spam right now. Do you have any thoughts on better places to have these discussions?

ghost commented 5 years ago

@LukeTowers Thanks Luke for your input and all your points, I totally agree on all of them and will keep researching this matter more.

With regards to discussion type threads, I was in two minds about writing this issue here. I wanted a place where I could have this stored and let users add input over time. I don't like the idea of writing this type of post in the chat room and having any important points lost.

It's annoying that the forum is read-only, and I think the October community needs a proper place to discuss important things. The problem at the moment is that October has many places open right now, which means we need accounts on all of them to stay in the loop. So I would actually close half of them down! Make a single chat room and a single forum!

With regards to spam, why not add a machine learning spam API to clean up the forum? Two APIs that jump to mind are:

https://www.perspectiveapi.com/#/
https://cloud.google.com/blog/products/gcp/filtering-inappropriate-content-with-the-cloud-vision-api

Also the forum should have an app/mobile version.

With regards to the chat room everyone uses Telegram right now, but it is not secure and people should use Signal instead as it has end to end encryption and allows chat rooms.

LukeTowers commented 5 years ago

I agree the forum needs to be reopened, but that's @daftspunk's wheelhouse to deal with as I don't have access to the website at all. As far as chat rooms go, I don't think we'll be moving away from Slack any time soon, simply too many people use it daily for us to force a migration unless we're really solid on an alternative.

ghost commented 5 years ago

Going to close this issue due to two main reasons:

  1. This is not a bug/issue with regards to October CMS.
  2. I don't think this would end up being an enhancement, it's more likely to become a separate plugin, for everyone to decide if they want it or not.

Continuing this topic in a new repository found here: https://github.com/ayumihamsaki/waf

w20k commented 5 years ago

Hi @ayumihamsaki, just wanted to add my 50 cents.

> My idea I have come up with so far is to check every route a User or Bot tries to access and then compare it to the list of the routes in your database. If the route doesn't exist in your database then send the IP address of the User or Bot and/or the User Agent to your Google Analytics account, e.g. The results could then be sent from Google Analytics to the Dashboard in October's Backend or to a Separate web page showing the results in a list. The webmaster could then click on each item in the list to blacklist them by IP Address or User-Agent.

First, take a look at the plugin and its source code: https://octobercms.com/plugin/vdlp-redirect. It partially does what you were planning to come up with ;) Side note: @LukeTowers was right about storing IPs. You can do it, as long as the IP isn't tied to an exact user. And as a GDPR tip, you can store IPs, but not in plain format; an encrypted version is okay 😆.

> Note: I would have to setup a Proxy Server to send the data to GA, thus avoiding the data getting blocked from things like Ad-Blockers

What I did in one of my projects to keep ad blockers from blocking requests to GA was to store all user data (crumbs) on the front end (in an object/session) and push it to the backend every 10-20 seconds with a plain Ajax request; the backend then communicated with Adobe Target and GA.

LukeTowers commented 5 years ago

@w20k do you have a source for

> Side note: @LukeTowers was right about storing IPs. You can do it, as long as the IP isn't tied to an exact user. And as a GDPR tip, you can store IPs, but not in plain format; an encrypted version is okay

w20k commented 5 years ago

In the context of GDPR, part of securing Personal Data means employing multiple levels of protection to ensure that data is not lost, destroyed, or disclosed to unauthorized individuals. One GDPR principle for securing Personal Data is Pseudonymization, which is defined as "...the processing of personal data in such a way that the data can no longer be attributed to a specific Data Subject without the use of additional information."

The processing of personal data to the extent strictly necessary and proportionate for the purposes of ensuring network and information security, i.e. the ability of a network or an information system to resist, at a given level of confidence, accidental events or unlawful or malicious actions that compromise the availability, authenticity, integrity and confidentiality of stored or transmitted personal data, and the security of the related services offered by, or accessible via, those networks and systems, by public authorities, by computer emergency response teams (CERTs), computer security incident response teams (CSIRTs), by providers of electronic communications networks and services and by providers of security technologies and services, constitutes a legitimate interest of the data controller concerned. This could, for example, include preventing unauthorized access to electronic communications networks and malicious code distribution and stopping ‘denial of service’ attacks and damage to the computer and electronic communication systems.

[Image: tools to anonymize or pseudonymize data are part of IDC's GDPR technology framework and are among the tools which are underused by CISOs and CDOs, according to IDC's Philip Carnelley]

My bad, not encrypting, but hashing.

It's not that you can't store personal data, but any issues that arise from leaking that personal data will be yours to deal with. It's a balance: if you want to store personal data, you must make your server or services more secure.
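As an illustration of the hashing idea, here is a minimal Python sketch using a keyed hash (HMAC-SHA256) rather than a bare hash. That choice is my assumption, not something from the quoted material: the IPv4 space is small enough that an unkeyed hash can be reversed by simply hashing every address. `pseudonymize_ip` and the key handling are hypothetical, and whether this satisfies the GDPR in a given case is a legal question the code doesn't answer.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize_ip(ip: str, secret_key: bytes) -> str:
    """Return a hex HMAC-SHA256 digest of the IP, keyed with a server-side secret.

    The address is canonicalised first so that equivalent spellings of the
    same IPv6 address map to the same digest.
    """
    canonical = ipaddress.ip_address(ip).compressed.encode("ascii")
    return hmac.new(secret_key, canonical, hashlib.sha256).hexdigest()


# Assumed key for the example; a real key would be random and kept outside the database.
EXAMPLE_KEY = b"replace-with-a-long-random-secret"
```

A blocklist can then store and compare digests instead of raw addresses. The trade-off is that range-based rules (such as blocking a whole subnet) no longer work on the stored values.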

LukeTowers commented 5 years ago

> One GDPR principle for securing Personal Data is Pseudonymization

For this case of storing IP addresses not connected to any user data, this would be acceptable under that principle, wouldn't it?

w20k commented 5 years ago

Yep, it would be acceptable.

ghost commented 5 years ago

Thanks for the input. I will be working on this in a few months' time (I have done half the coding now). I just want to finish off some more important things for October first, and I'm going through my list of open GitHub issues. I will add a link here later for testers, once I have finished coding it.

w20k commented 5 years ago

Will reopen this issue ;)

ghost commented 5 years ago

@w20k I think it's better to close it. I think the solution is more of a plugin than a core module of the CMS.

w20k commented 5 years ago

@ayumihamsaki it's still more of a discussion, not a solution or an issue. Oh, my bad, I forgot you'd added a link to the repo where this discussion could continue 😄