alexc20 commented 2 years ago

Kubernetes egress filter design discussion

Motivation

Develop a programmable egress observability and policy enforcement layer that can track, alert on, and stop backdoors in compromised services from communicating with external networks.

Requirements

We want to be able to write policies for egress traffic. Policies should be enforceable not only on headers but also on payloads, e.g. filtering SQL injection patterns from an HTTP payload.
We want a single pane of glass to trace the traffic that is filtered by the policies.
We want to filter out and stop packets that violate the policies.

Kubernetes egress traffic lifecycle

kube-proxy manages each node's routing rules by talking to the API server. Whenever a service is added or removed, it updates the rules so that packets are routed correctly. It currently supports three modes: userspace, iptables, and IPVS. iptables is the default mode in most Kubernetes implementations and the most widely used.

Packets sent from a node to the internet are SNATed to the host node's IP address and then sent to the internet gateway. Without SNAT, the gateway would reject them.

Thought process

If we can capture packets before they are SNATed, preferably without affecting performance, ask the API server which service each packet came from, and check whether a related policy exists, then we will be able to track and filter the traffic. We can use eBPF to capture the packets.
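Why the capture point matters can be shown with a small Python sketch. It only models the address rewrite, not real packet handling; `Packet`, `snat`, `service_for`, the IP addresses, and the `pod_ip_to_service` map are all hypothetical names and values made up for illustration. The point is that once the source address has been rewritten to the node's IP, a lookup keyed on the pod IP can no longer identify the originating service.

```python
from typing import Dict, NamedTuple, Optional


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str


def snat(packet: Packet, node_ip: str) -> Packet:
    """Rewrite the source address to the node's IP, as SNAT does on egress."""
    return packet._replace(src_ip=node_ip)


def service_for(packet: Packet, pod_ip_to_service: Dict[str, str]) -> Optional[str]:
    """Look up the originating service by source pod IP (None if unknown)."""
    return pod_ip_to_service.get(packet.src_ip)


# Hypothetical metadata: which service owns which pod IP.
pod_ip_to_service = {"10.1.0.7": "checkout"}

before = Packet(src_ip="10.1.0.7", dst_ip="203.0.113.9")
after = snat(before, node_ip="192.168.0.12")

print(service_for(before, pod_ip_to_service))  # checkout
print(service_for(after, pod_ip_to_service))   # None
```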

Some notes on eBPF

We conducted a simple experiment with eBPF to trace TLS traffic by uprobing OpenSSL shared library symbols. We tried Cilium ebpf and gobpf. Cilium ebpf does not support uprobing shared libraries, so we used gobpf to uprobe symbols in libssl.so. The idea was to capture data before it is encrypted (server-to-client) and after it is decrypted (client-to-server).

The experiment was successful, and it assured us that eBPF can be used to probe kernel functions, user-space functions, and even shared libraries.

Design

Based on this research, I think a good direction is to probe kube-proxy and iptables to capture packets at the point where they are SNATed.

Diagram as follows:

[image: egress filter design]

marccampbell commented 2 years ago

@alexc20 this is great, thanks for the detailed write up.

Are you proposing that iptables rules will be the output of the rules? How will the TLS and uprobe work to detect schema or validation of data that's encrypted so that we can create rules here?

alexc20 commented 2 years ago

This design is scoped to egress traffic monitoring only; filtering is not included. I don't think we can filter packets by changing iptables rules, because that would affect other valid packets as well, though I am not sure. Filtering may need more in-depth research.

I think the rule should look like this:

<protocol> <src service> <dst dns or ip address> [payload pattern]
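To make the format concrete, here is a minimal Python sketch of how one such line could be parsed and compared against a captured event. Everything in it is an assumption for illustration: the `Rule` class, `parse_rule`, `matches`, treating the payload pattern as a case-insensitive regular expression, and comparing the destination as an exact string. The design has not settled any of these.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    protocol: str
    src_service: str
    dst: str                                   # DNS name or IP address
    payload_pattern: Optional[re.Pattern] = None


def parse_rule(line: str) -> Rule:
    """Parse '<protocol> <src service> <dst dns or ip address> [payload pattern]'."""
    # maxsplit=3 keeps any whitespace inside the optional payload pattern.
    parts = line.split(maxsplit=3)
    if len(parts) < 3:
        raise ValueError(f"rule needs at least 3 fields: {line!r}")
    pattern = re.compile(parts[3], re.IGNORECASE) if len(parts) == 4 else None
    return Rule(parts[0].lower(), parts[1], parts[2], pattern)


def matches(rule: Rule, protocol: str, src_service: str, dst: str,
            payload: bytes = b"") -> bool:
    """Return True if the event matches every field the rule specifies."""
    if (rule.protocol, rule.src_service, rule.dst) != (protocol.lower(), src_service, dst):
        return False
    if rule.payload_pattern is None:
        return True  # no payload pattern: header fields alone decide
    text = payload.decode("utf-8", errors="replace")
    return rule.payload_pattern.search(text) is not None
```

For example, `parse_rule(r"http checkout db.example.com union\s+select")` would yield a rule that matches HTTP traffic from a hypothetical `checkout` service to `db.example.com` whose payload contains a `UNION SELECT` fragment.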

The design does not detail the protocol parser yet. For the TLS uprobe, I think the data is encrypted inside the pod, so we may need to move our probing point to the pod. That means we may need to deploy our agent to each pod.

alexc20 commented 2 years ago

Updated design as follows:

[image: egress filter design updated]

Agents are deployed on each node. They use eBPF to collect network data.
Aggregators collect data from the agents, query metadata to determine which service the traffic originated from, check for matching rules, and save the data to the database if there is a match.
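The aggregator step described above might look roughly like the following Python sketch. It is not the actual implementation: the `Event` shape, the `Aggregator` class, the in-memory `pod_ip_to_service` map standing in for the metadata query, and the list standing in for the database are all hypothetical. Rules are modelled here as plain callables to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """What a node agent might report for one outbound connection (hypothetical shape)."""
    src_pod_ip: str
    dst: str
    protocol: str
    payload: bytes = b""


@dataclass
class Aggregator:
    pod_ip_to_service: Dict[str, str]              # stand-in for the metadata query
    rules: List[Callable[[str, Event], bool]]      # each rule: (service, event) -> matched?
    db: List[dict] = field(default_factory=list)   # stand-in for real storage

    def handle(self, event: Event) -> Optional[dict]:
        """Resolve the source service, check the rules, and store the event on a match."""
        service = self.pod_ip_to_service.get(event.src_pod_ip)
        if service is None:
            return None  # source pod not known to the metadata map
        if any(rule(service, event) for rule in self.rules):
            record = {"service": service, "dst": event.dst, "protocol": event.protocol}
            self.db.append(record)
            return record
        return None
```

Because the lookup and rule check happen after the agent has already reported the event, this flow can only detect matching traffic, not block it.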

Notes:

The design relies on agents installed on each node of the cluster. We will NOT need to install an agent or sidecar container in the pods. It does NOT depend on third-party CNI plugins like Calico.
The major change from the original design is that metadata is correlated asynchronously at the central collector. The implication is that we will be able to do detection but not inline prevention. An alternative that would enable inline prevention is to host an automatically synced metadata service on each host.

References:
https://blog.px.dev/pixie-intro/#episode-2:-the-detail
https://docs.px.dev/about-pixie/what-is-pixie/
https://github.com/pixie-io/pixie/tree/main/src/stirling
