Open vince-weka opened 1 year ago
Not sure if we're talking about a software firewall on the host or not. If we are, then that definitely can be a serious problem:
If we're talking about hardware firewalls outside the system, or firewalls on interfaces other than the ones Weka uses, then that's of course fine. (Any external firewall must, however, be configured to let Weka traffic pass unimpeded between cluster nodes.)
Confirmed that we are talking about iptables (based on code in this repo). The above considerations apply. iptables must not be in use on any interface under Weka control.
It's a pretty massive security hole to have no firewall at all. Are you saying there is a big performance impact if the firewall is running but the Weka ports are open? (It does work, by the way.)
If you have the Weka ports open, it will be fine; the firewall operates on the inside of the DPDK instances. It will, however, have a bad performance impact on UDP-mode sockets. Note that what you cannot do is build rules that permit the Weka ports only from certain IP masks, etc., because DPDK receives packets that the firewall simply never sees.
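To make the IP-mask point concrete, here is a toy model (not Weka or iptables code) of a rule that accepts the Weka port range only from one subnet. The subnet `10.0.0.0/24`, the `Packet` type, and the `"kernel"`/`"dpdk"` path labels are all hypothetical; the port range is borrowed from the 14000-15000 configuration mentioned later in this thread. The model ignores whatever filtering DPDK or the NIC does on its own.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Packet:
    src: str    # source IP address
    dport: int  # destination port
    path: str   # "kernel" (UDP mode / regular sockets) or "dpdk" (bypasses the kernel)


# Hypothetical iptables-style policy: Weka ports are accepted only from one subnet.
ALLOWED_NET = ipaddress.ip_network("10.0.0.0/24")
WEKA_PORTS = range(14000, 15001)  # 14000-15000 inclusive


def kernel_filter_allows(pkt: Packet) -> bool:
    """Model of an in-kernel rule set: source-restricted Weka ports, default drop."""
    if pkt.dport in WEKA_PORTS:
        return ipaddress.ip_address(pkt.src) in ALLOWED_NET
    return False  # everything else hits the default DROP


def delivered(pkt: Packet) -> bool:
    """Would the packet reach the application?"""
    if pkt.path == "dpdk":
        # DPDK pulls frames straight from the NIC; the kernel rules never run.
        return True
    return kernel_filter_allows(pkt)


if __name__ == "__main__":
    outsider = "192.168.1.50"
    print(delivered(Packet(outsider, 14000, "kernel")))  # blocked by the mask rule
    print(delivered(Packet(outsider, 14000, "dpdk")))    # the mask rule is never consulted
```

The same source address is blocked on the kernel path and accepted on the DPDK path, which is why source-restricted rules give a false sense of protection here.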
The same applies to interfaces that are not using single IP (for example, cloud instances): the DPDK interfaces won't go through iptables at all. That said, DPDK (and the NIC hardware as well) will generally drop packets that don't look like they should be delivered. E810 single IP, as currently designed, is also likely to be problematic with iptables.
Given all this, the benefit you get from an in-kernel firewall is, IMO, extremely limited.
Best practice, IMO, is to rely on firewalls external to the host. I don't believe it is a great idea to put a Weka node out on the hostile internet.
In fact, thinking about it more, if folks are running host firewalls in the belief that they get some security benefit, we should probably warn them: DPDK-mode sockets simply bypass the firewall, so whatever security benefit the customer thinks they are getting is probably not real.
Combine that with the difficulty of configuring a firewall correctly in the first place, and I think we're setting ourselves up for a nightmare scenario with customers. (Why doesn't this work right? Why is my cluster slow? Etc.)
The firewall's purpose is to block incoming traffic to ports/services that we don't need in order to run. The configuration I've been using opens only ports 22 and 14000-15000; all others are blocked. Weka works fine with this configuration.
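A configuration like the one described above could look roughly like the rule list below. This is a sketch under assumptions: the thread doesn't say whether 14000-15000 needs TCP, UDP, or both, so both are opened; the loopback and ESTABLISHED/RELATED rules are common iptables practice, not something stated here; and `build_input_rules` is a hypothetical helper that only prints commands rather than applying them.

```python
def build_input_rules(ssh_port: int = 22,
                      weka_ports: tuple[int, int] = (14000, 15000)) -> list[str]:
    """Return iptables commands that open SSH and the Weka port range, then default-drop INPUT."""
    lo, hi = weka_ports
    rules = [
        # Allow loopback traffic and replies to connections this host initiated.
        "iptables -A INPUT -i lo -j ACCEPT",
        "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
        # SSH.
        f"iptables -A INPUT -p tcp --dport {ssh_port} -j ACCEPT",
    ]
    # Weka port range; opening both protocols is an assumption, not confirmed in the thread.
    for proto in ("tcp", "udp"):
        rules.append(f"iptables -A INPUT -p {proto} --dport {lo}:{hi} -j ACCEPT")
    # Switch the default policy to DROP last, so a partially applied list can't lock out SSH.
    rules.append("iptables -P INPUT DROP")
    return rules


if __name__ == "__main__":
    print("\n".join(build_input_rules()))
```

Keep in mind that rules like these only govern traffic that actually passes through the kernel; DPDK-mode traffic never reaches them.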
Also, most cyberattacks come from within the corporate network. Proxies and hardware firewalls in routers and the like protect against attacks from the internet, but leaving your storage system wide open to internal attacks might not be a good idea.
Having a firewall shouldn't be a hard failure, but the check should WARN that a firewall is present and that a human needs to verify its configuration is correct.
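A WARN-not-FAIL check along these lines might look like the sketch below. The function names and messages are hypothetical, not code from this repo. It relies on `iptables -S`, which prints the default `-P` policies plus any `-N` chains and `-A` rules; an empty setup is just three `-P ... ACCEPT` lines. It needs root to read the rules, and it does not look at nftables-only or firewalld setups, so a PASS here is not proof that no firewall exists.

```python
import shutil
import subprocess


def iptables_is_active(rules_text: str) -> bool:
    """Return True if `iptables -S` output shows more than an empty ACCEPT-all setup."""
    for line in rules_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "-P" and len(parts) >= 3 and parts[2] != "ACCEPT":
            return True  # a default policy other than ACCEPT, e.g. DROP
        if parts[0] in ("-A", "-N"):
            return True  # an appended rule or a user-defined chain
    return False


def check_firewall() -> str:
    """Hypothetical pre-flight check: WARN (never FAIL) when a host firewall is present."""
    if shutil.which("iptables") is None:
        return "PASS: iptables not found (nftables/firewalld not checked)"
    result = subprocess.run(["iptables", "-S"], capture_output=True, text=True)
    if result.returncode != 0:
        return "WARN: could not read iptables rules (not running as root?)"
    if iptables_is_active(result.stdout):
        return ("WARN: host firewall detected; a human must verify it passes Weka traffic. "
                "DPDK-mode traffic bypasses it entirely.")
    return "PASS: no active iptables rules"


if __name__ == "__main__":
    print(check_firewall())
```

Keeping the parsing in `iptables_is_active` separate from the subprocess call makes the decision logic testable on canned output without root access.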