metal-stack / gardener-extension-provider-metal

Implementation of the gardener-extension-controller for metal-stack
MIT License

Audit events have node IP address as source address for requests #186

Open mreiger opened 3 years ago

mreiger commented 3 years ago

Audit events from the kube-apiserver contain a field for the source IP that the requests came from. Example:

audittailer-768f964b78-t4hcs audittailer {"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Request","auditID":"39d36d5d-cae5-4b0c-8ef2-8dc8013f49d1","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/default/pods?limit=500","verb":"list","user":{"username":"oidc:IZ00242","uid":"IZ00242","groups":["oidc:all-cadm","system:authenticated"]},"sourceIPs":["10.67.48.2"],"userAgent":"kubectl/v1.21.1 (linux/amd64) kubernetes/5e58841","objectRef":{"resource":"pods","namespace":"default","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2021-05-27T17:30:52.228925Z","stageTimestamp":"2021-05-27T17:30:52.231553Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"oidc-all-cadm\" of ClusterRole \"cluster-admin\" to Group \"oidc:all-cadm\""}}

Unfortunately the "sourceIPs":["10.67.48.2"] is the node IP address of one of the nodes in the seed cluster. This seems to be the expected behaviour, since the kube-apiserver is exposed as a Service of type LoadBalancer with externalTrafficPolicy: Cluster.

From an audit point of view this is not ideal because it hides the real source address from which an event originated. Changing the externalTrafficPolicy of the kube-apiserver service manually to Local fixes this temporarily, until the service gets reconciled again. Example audit event:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Request","auditID":"952889bc-8879-43f6-9d91-e465cae3c76e","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/audit/pods/audittailer-768f964b78-zg8jk/log","verb":"get","user":{"username":"oidc:IZ00242","uid":"IZ00242","groups":["oidc:all-cadm","system:authenticated"]},"sourceIPs":["95.117.118.243"],"userAgent":"kubectl/v1.21.1 (linux/amd64) kubernetes/5e58841","objectRef":{"resource":"pods","namespace":"audit","name":"audittailer-768f964b78-zg8jk","apiVersion":"v1","subresource":"log"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2021-05-27T17:45:49.837864Z","stageTimestamp":"2021-05-27T17:45:51.099244Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"oidc-all-cadm\" of ClusterRole \"cluster-admin\" to Group \"oidc:all-cadm\""}}

This seemed to have no ill effect on the cluster during the short time until the policy was reset, so I suggest we set the externalTrafficPolicy of the kube-apiserver service to Local in this extension provider.
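For illustration only, a minimal sketch (in Go, using client-go) of what flipping the policy could look like. The service name "kube-apiserver", the namespace parameter, and the direct Get/Update flow are assumptions for this sketch; this is not how the extension actually reconciles the service:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// setLocalTrafficPolicy switches the kube-apiserver Service in the given
// shoot namespace to externalTrafficPolicy: Local so that the load balancer
// preserves the client source IP in requests (and thus in audit events).
func setLocalTrafficPolicy(ctx context.Context, c kubernetes.Interface, namespace string) error {
	svc, err := c.CoreV1().Services(namespace).Get(ctx, "kube-apiserver", metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("getting kube-apiserver service: %w", err)
	}
	// externalTrafficPolicy only applies to NodePort/LoadBalancer services.
	if svc.Spec.Type != corev1.ServiceTypeLoadBalancer {
		return nil
	}
	if svc.Spec.ExternalTrafficPolicy == corev1.ServiceExternalTrafficPolicyTypeLocal {
		return nil // already set, nothing to do
	}
	svc.Spec.ExternalTrafficPolicy = corev1.ServiceExternalTrafficPolicyTypeLocal
	_, err = c.CoreV1().Services(namespace).Update(ctx, svc, metav1.UpdateOptions{})
	return err
}
```

Run as a one-off, this is exactly the temporary fix described above: Gardener will reset the field on the next reconciliation. To make it stick, the change would have to be applied as part of the extension's reconciliation of the seed resources (e.g. from a mutating webhook), which is the part that would need to be implemented here.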

mreiger commented 2 years ago

The apiserver SNI feature with its istio gateway moves the problem from the apiserver service (which is now internal) to the istio-ingressgateway service, with the additional "bonus" that we cannot change this service from this extension provider.

We still need the original client address for the audit log eventually.

Gardener's documentation says it is possible to use a self-deployed Istio instead of the Gardener-managed one, but do we want to do that?

Gerrit91 commented 2 years ago

Did we ask anyone from the Gardener community in the past how they deal with this?

As the problem originates from kube-proxy, I assume that running Calico or another CNI like Cilium in "kube-proxy-free" mode would resolve the issue as a whole. It is probably hard to set up and the change would be disruptive, but it generally fits our environment, and from the network / performance perspective it also seems to be superior.

mwennrich commented 2 years ago

This has nothing to do with Calico or the CNI. From https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/

.spec.externalTrafficPolicy - denotes if this Service desires to route external traffic to node-local or cluster-wide endpoints. There are two available options: Cluster (default) and Local. Cluster obscures the client source IP and may cause a second hop to another node, but should have good overall load-spreading. Local preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type Services, but risks potentially imbalanced traffic spreading.
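To make the quoted rule concrete, here is a small hedged helper (same illustrative Go/corev1 assumptions as the sketch above, not code from this repository) that reports whether a given Service keeps the client source IP according to that rule:

```go
package main

import corev1 "k8s.io/api/core/v1"

// preservesClientIP reports whether external client source IPs are kept for
// the given Service, per the documentation quoted above: only NodePort or
// LoadBalancer Services with externalTrafficPolicy: Local preserve the
// client source IP; with Cluster the traffic is SNATed to a node IP.
// Note: a kube-proxy replacement in DSR mode (discussed below) changes this picture.
func preservesClientIP(svc *corev1.Service) bool {
	external := svc.Spec.Type == corev1.ServiceTypeLoadBalancer ||
		svc.Spec.Type == corev1.ServiceTypeNodePort
	return external && svc.Spec.ExternalTrafficPolicy == corev1.ServiceExternalTrafficPolicyTypeLocal
}
```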

Gerrit91 commented 2 years ago

According to the Cilium documentation it sounds to me like externalTrafficPolicy=Cluster does not obscure source IPs when running in DSR mode. But maybe I did not understand it correctly then. :thinking:

externalTrafficPolicy=Cluster: For the Cluster policy which is the default upon service creation, multiple options exist for achieving client source IP preservation for external traffic, that is, operating the kube-proxy replacement in DSR or Hybrid mode if only TCP-based services are exposed to the outside world for the latter.

(source already mentioned above)

As far as I know, the Local policy comes with minor nits that are probably tolerable in our case, but still not 100% cool: unequal traffic distribution and short traffic blackholing on service updates / rolls (see this KEP?).

I am gonna need to read a bit more or experiment with it first...