pitkley / dfw

Docker Firewall Framework, written in Rust
Apache License 2.0

World to host not working (& host to container) #705

Open MichaelVoelkel opened 10 months ago

MichaelVoelkel commented 10 months ago

Hi,

being on Debian 12, I have switched back to nftables now. The basic configuration is just the one from the docs, /etc/nftables.conf:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        tcp dport 22 accept
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}

/etc/nftables exists but I don't use it. I hope it doesn't cause any trouble.

I can connect to ssh and nothing else works, so far so good.

Now I have Portainer running on 9443, which I could previously reach both from the outside world and from the host (it has the port mapping 9443:9443, so I should be able to access it via localhost, e.g., nc -vv localhost 9443). As the connection times out both ways, I'd assume the packet is dropped.

My rules.toml is: (yeah, small, nothing else)

[[wider_world_to_container.rules]]
network = "portainer_network"
expose_port = 9443
dst_container = "portainer"

I run dfw currently like this to see the logs:

docker run --rm \
    --name=dfw \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    -v $PWD/rules.toml:/config/dfw.toml \
    --net host --cap-add=NET_ADMIN \
    pitkley/dfw:1.2.1 --log-level trace --config-path /config

And yeah, stuff like nc etc. I do via a second SSH session, so I keep the first one open. :)

So, my problem clearly is that I cannot connect but would hope/expect to do so.

Some more information / peculiarities / questions / comments:

Probably I messed up something very basic. I also tried to make sure that the old iptables is disabled, but sudo systemctl disable iptables told me it doesn't even know an iptables unit.
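
In case it matters which iptables variant is installed: on Debian 12 it should be the nft-backed one, which can be checked roughly like this (the exact version will differ, the part in parentheses is what counts):

$ iptables -V
iptables v1.8.9 (nf_tables)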

Hm, when I reset the nft rules to /etc/nftables.conf it's empty; when I start dfw, though, it fills up on top of the config shown above, so it seems to do something.
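
For completeness, the way I'm checking what dfw actually applied is simply dumping the live ruleset, e.g.:

$ sudo nft list ruleset

or only the dfw table:

$ sudo nft list table inet dfw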

MichaelVoelkel commented 10 months ago

Hm, having set the device explicitly to eth0, I now see some rules (yeah, I tried other containers too, but nothing works) and nftables seems to reflect it, but there is no change in behaviour. Apart from that, I see that journalctl, despite the log rules, no longer shows the incoming 9443 packets.

table inet filter {
    chain input {
        type filter hook input priority filter; policy drop;
        tcp dport 22 accept
        log
        log
    }

    chain forward {
        type filter hook forward priority filter; policy drop;
    }

    chain output {
        type filter hook output priority filter; policy accept;
    }
}
table inet dfw {
    chain input {
        type filter hook input priority filter - 5; policy accept;
        ct state invalid drop
        ct state { established, related } accept
        iifname "docker0" meta mark set 0x000000df accept
    }

    chain forward {
        type filter hook forward priority filter - 5; policy accept;
        ct state invalid drop
        ct state { established, related } accept
        iifname "docker0" oifname "eth0" meta mark set 0x000000df accept
        tcp dport 9443 ip daddr 172.17.0.2 iifname "eth0" oifname "br-22eb53281a80" meta mark set 0x000000df accept
        tcp dport 8000 ip daddr 172.17.0.2 iifname "eth0" oifname "br-22eb53281a80" meta mark set 0x000000df accept
        tcp dport 9115 ip daddr 172.20.0.3 iifname "eth0" oifname "br-d65ccc79fc1d" meta mark set 0x000000df accept
    }
}
table ip dfw {
    chain prerouting {
        type nat hook prerouting priority dstnat - 5; policy accept;
        tcp dport 9443 iifname "eth0" meta mark set 0x000000df dnat to 172.17.0.2:9443
        tcp dport 8000 iifname "eth0" meta mark set 0x000000df dnat to 172.17.0.2:8000
        tcp dport 9115 iifname "eth0" meta mark set 0x000000df dnat to 172.20.0.3:9115
    }

    chain postrouting {
        type nat hook postrouting priority srcnat - 5; policy accept;
        oifname "eth0" meta mark set 0x000000df masquerade
    }
}
table ip6 dfw {
    chain prerouting {
        type nat hook prerouting priority dstnat - 5; policy accept;
        tcp dport 9443 iifname "eth0" meta mark set 0x000000df
        tcp dport 8000 iifname "eth0" meta mark set 0x000000df
        tcp dport 9115 iifname "eth0" meta mark set 0x000000df
    }

    chain postrouting {
        type nat hook postrouting priority srcnat - 5; policy accept;
        oifname "eth0" meta mark set 0x000000df masquerade
    }
}
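
(For reference, "setting the device explicitly" above means pointing dfw at eth0 in the defaults section of rules.toml, roughly like this; see the dfw documentation for the exact field name:)

[defaults]
external_network_interfaces = "eth0"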
MichaelVoelkel commented 10 months ago

Ok, my log rule was in the wrong place, because of course I want to log FORWARD. And there I see something:

Jan 01 13:25:18 v62887.php-friends.de kernel: IN=eth0 OUT=br-d65ccc79fc1d MAC=<filtered> SRC=<filtered> DST=172.20.0.2 LEN=64 TOS=0x00 PREC=0x00 TTL=50 ID=0 DF PROTO=TCP SPT=54694 DPT=9115 WINDOW=65535 RES=0x00 SYN URGP=0 MARK=0xdf 

Well, this seems fine. The packet is forwarded towards the Docker container, but for some reason nothing happens there, hmmm...

OUT is a bit strange though; this is some veth interface, because this whole thing runs on a KVM virtual machine... (not managed by me, but by the provider I buy the hosting solution from; it hasn't been a problem so far though).
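
For anyone reproducing this: the forward logging above is just nftables' plain log statement added to the forward chain, e.g. at runtime:

$ sudo nft add rule inet filter forward log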

MichaelVoelkel commented 10 months ago

Ok, I needed to also add "backend_defaults"... I somehow thought this would not be needed as it was the default anyway.

Also in your docs you describe some sample nftables.conf file... This is a REALLY bad one because it will also not allow pinging out or working with established connections. Maybe replacing it with something with sensible rules would make more sense?

I suggest:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        tcp dport 22 accept
        ct state invalid drop
        ct state { established, related } accept
        ip protocol icmp icmp type echo-request accept
        icmpv6 type echo-request accept
    }
    chain forward {
        type filter hook forward priority 0; policy drop;
        ct state { established, related } accept
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}

although it's certainly incomplete because ping6 does not work yet... anyways

pitkley commented 10 months ago

Hi @MichaelVoelkel, thanks for reaching out. I'll try to go through your various points one-by-one, although some might overlap with others. 🙂


I'm puzzled why I need to state a network. Before, I just had the "bridge" network, so I also tried putting that as the name. Later, I created a new custom bridge network called portainer_network and connected the container to it (which I double-checked via docker inspect); that's what you see in my rules file now.

Every Docker container you run has to be attached to some kind of Linux network interface, at least assuming it should be able to connect to the network (which it is unless you specify --network none). When you run a container without specifying --network, Docker uses the default bridge network it creates for itself when it first starts up.

Given that a Docker container will always be associated with a virtual bridge network interface, nftables has to know which network interface packets are destined for or coming from for firewalling to work, and thus DFW has to know too, so that it can create rules with the correct constraints.
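
As a concrete illustration: the default bridge network corresponds to the docker0 interface, while user-defined bridge networks get an interface named br- plus the first 12 characters of the network ID, which is where names like br-22eb53281a80 in your ruleset come from. You can check the mapping yourself, for example:

$ docker network inspect portainer_network --format '{{.Id}}'
$ ip -brief link show type bridge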

I was unsure whether I needed to restart the container first or not, so I did that. I also stopped and started it (which retained the specific network)... nothing helped here.

Unless you run DFW with the --run-once flag (which you haven't according to your first post), DFW will automatically update the nftables ruleset whenever anything surrounding Docker containers changes. So if you start a container after DFW is already running, and a rule you have defined applies to the container, DFW will automatically roll out this new rule.

If you have started your applications before you started DFW, DFW will still automatically apply all relevant rules, because it also applies all rules whenever it starts up.
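
And if you ever do want the one-shot behaviour, --run-once is simply appended as an argument after the image, roughly like this (based on the command from your first post):

docker run --rm --name=dfw \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    -v $PWD/rules.toml:/config/dfw.toml \
    --net host --cap-add=NET_ADMIN \
    pitkley/dfw:1.2.1 --config-path /config --run-once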

/etc/nftables exists but I don't use it. I hope it doesn't cause any trouble.

I am fairly certain that you are using it, even if you don't think you are: the nftables systemd-service uses this file to apply rules when it launches. You can verify this using this command:

$ cat "$(systemctl show -P FragmentPath nftables.service)" | grep '^Exec'
ExecStart=/usr/sbin/nft -f /etc/nftables.conf
ExecReload=/usr/sbin/nft -f /etc/nftables.conf
ExecStop=/usr/sbin/nft flush ruleset

This means that the systemd-unit nftables.service during start and reload just instructs nftables through the nft command to load the ruleset from the /etc/nftables.conf file. You can verify that the nftables service is used through this command:

$ systemctl show --property ActiveState --property UnitFileState nftables.service
ActiveState=active
UnitFileState=enabled

If the unit is active and enabled, it works as I have described above.

The reason I'm going into so much detail here: my personal suggestion for setting up nftables is to configure your base-rules in /etc/nftables.conf, i.e. primarily rules that are not directly related to the Docker containers you are running, and then have DFW take care of the rest.

Following is the /etc/nftables.conf file that I'm using:

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;

        # Allow local traffic
        iif lo accept

        # Allow related traffic (-> stateful connection tracking)
        ct state { established, related } accept

        # Setup ICMP and ICMPv6
        icmp type { echo-request, echo-reply, time-exceeded, parameter-problem, destination-unreachable } accept
        icmpv6 type { echo-request, echo-reply, time-exceeded, parameter-problem, destination-unreachable, packet-too-big, nd-router-advert, nd-router-solicit, nd-neighbor-solicit, nd-neighbor-advert, mld-listener-query } accept

        # Configure SSH
        tcp dport 22 accept

        # reject traffic instead of just dropping it
        reject with icmpx type port-unreachable
    }

    chain forward {
        type filter hook forward priority 0; policy drop;
    }

    chain output {
        type filter hook output priority 0; policy accept;
    }
}

A few things to note in the input hook:

(it has the port mapping 9443:9443, so I should be able to access it via localhost, e.g., nc -vv localhost 9443). As the connection times out both ways, I'd assume the packet is dropped.

You are correct that the packet will be dropped. As shown above, I add the iif lo accept rule to enable this kind of traffic. I think adding that to the default documentation would likely make sense, because it is very confusing if local traffic doesn't work.
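
Once you have added the iif lo accept rule to /etc/nftables.conf, something along these lines should be enough to verify it (note that reloading nftables flushes DFW's rules because of the flush ruleset at the top of the file, so restart the dfw container afterwards so it re-applies them):

$ sudo systemctl reload nftables.service
$ docker restart dfw
$ nc -vv localhost 9443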

Ok, I needed to also add "backend_defaults"... I somehow thought this would not be needed as it was the default anyway.

Do I understand correctly that things work now after you have added backend_defaults, but didn't before?

One thing I did notice in the rulesets you have posted is that DFW does not hook itself into the filter tables, which it will do if the backend_defaults are set up like this:

[backend_defaults]
custom_tables = { name = "filter", chains = ["input", "forward"] }

You can find more details on this field in the documentation here, but the gist of it is this: DFW has to be able to act on traffic when it traverses any one of the input or forward hooks. This can be achieved in one of three ways:

  1. Have no other tables that hook input or forward, leaving only DFW's table.

    This is not really feasible, because that would leave you with an entirely open firewall, at least until DFW has run.

  2. Ensure that any existing tables that hook input or forward don't drop the traffic before it reaches DFW's tables.

    This is not great if you want your input hook to have a drop policy, which I personally would always want, just to make sure I don't accidentally expose any port I didn't intend to expose.

  3. Let DFW know about any existing tables and chains that hook input or forward.

    This is what the custom_tables setting does, and it gives us the best of both worlds: we can ensure that DFW correctly accepts the traffic it is responsible for while still being able to drop traffic by default in the input hook.
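
Putting the pieces from this thread together, your rules.toml with both the backend hookup and the Portainer rule would then look roughly like this:

[backend_defaults]
custom_tables = { name = "filter", chains = ["input", "forward"] }

[[wider_world_to_container.rules]]
network = "portainer_network"
expose_port = 9443
dst_container = "portainer"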

This is a REALLY bad one because it will also not allow pinging out or working with established connections.

Assuming DFW has run and is instructed to attach to the existing tables, it would work: the output hook does not deny the echo-request (ping), and the input hook allows related packets, letting the echo-reply (pong) come through. Without DFW having run though, the defaults would disallow this, yes.

Regarding established/related packets: I agree, this should be part of the default config. Regarding allowing incoming pings: I don't want to prescribe to a user of DFW whether they want their host to be pingable. I think a good middle ground would be to add it with a comment, i.e. indicating to the user that their host won't be pingable unless they enable that rule.
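
Something along these lines in the documented default, i.e. present but disabled until the user opts in:

        # Uncomment to make the host respond to pings:
        # icmp type echo-request accept
        # icmpv6 type echo-request accept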


In summary, the final more-than-minimal configuration that works well for me is this:


Tasks:

MichaelVoelkel commented 10 months ago

Hi! Thanks for your great, long answer. Yeah, now everything works, maybe apart from locally accessing containers, BUT I will try out your iif lo rule because I don't have that one yet.

And everything you write sounds really interesting. Of course, it's true that nftables is used as the base. And yes to:

Do I understand correctly that things work now after you have added backend_defaults, but didn't before?

My default policy clearly is drop. And by the way, as for pings, I was just talking about outgoing pings. I agree that incoming pings are a different story.

As for the network thingy, I was just thinking that if the Docker container only has one network, dfw could theoretically read it from the container and use it, as a convenience. But yeah, that's not strictly needed.

All in all, I need to say: your solution is great!!

Getting nftables running nicely alongside Docker is normally not doable... And a firewall solution should really sit in its own Docker container to keep it maintainable. This seems like the best practice, and your repo offers exactly that. So thanks a lot!