loft-sh / vcluster

vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it offers better multi-tenancy and isolation than regular namespaces.
https://www.vcluster.com
Apache License 2.0
6.38k stars 410 forks source link

fluent bit running inside a vcluster can collect logs from pods deployed in the host and other vclusters. #1989

Closed vicaya closed 2 months ago

vicaya commented 3 months ago

What happened?

I was trying out the new relic k8s bundle inside a virtual cluster following the observability documentation. To my surprise the collected logs contains logs from pods in the host and other vclusters. It collected name/namespace/status of all the pods in the host cluster and other vclusters as well.

What did you expect to happen?

Any monitoring tools deployed inside a vcluster should only see metrics and logs generated by the pods in the vcluster only.

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a vcluster with values from the docs for observability: syncAllNodes etc (see the included vcluster config values).
  2. Deploy the vcluster-hpm helm chart in the vcluster namespace (vcluster-test0 with --set VclusterReleaseName=test0)
  3. Deploy of a new relic helm chart (really easy to follow).

Anything else we need to know?

If this is a known bug, what's the timeline to fix this rather annoying security issue?

Host cluster Kubernetes version

```console $ kubectl version # paste output here Client Version: v1.30.3 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: v1.29.5+k3s1 ```

vcluster version

```console $ vcluster --version # paste output here vcluster version 0.19.6 ```

VCluster Config

``` # My vcluster.yaml / values.yaml here (from helm get value ...) proxy: metricsServer: nodes: enabled: true pods: enabled: true serviceCIDR: 10.43.0.0/16 sync: nodes: enabled: true syncAllNodes: true syncer: extraArgs: - --tls-san=test0.internal - --mount-physical-host-paths=true vcluster: image: rancher/k3s:v1.29.0-k3s1 ```
facchettos commented 2 months ago

I have been able to reproduce it, I will let you know when we have more info

vicaya commented 2 months ago

Thanks for following up @facchettos! Here is my quick findings with a few greps for posterity:

I'm sure you already know why it happens: the rewriteHostPath function only rewrites a few known paths without a catch all mechanism.

I suggest that we introduce --mount-logging-host-paths, --mount-kubelet-host-paths syncer options, and leave --mount-physical-host-paths for other corner cases, as they all have different security implications.

BTW, the symlink approach in the host mapper is also problematic, as you'd need to mount the original paths (albeit at a different location in the pod) anyway, why not just use bind mount (go syscall.Mount supports it just fine) for the pod dirs?

facchettos commented 2 months ago

Hi again! So, here the problem is that newrelic mounts the entire /var directory which prevents us from doing efficient detection. To solve your problem, you should look into the helm chart to modify the settings accordingly, (specifically linuxMountPath and path values for logging, and do something similar with the other charts).

If you are looking for something more advanced with more options, you should check for the pro tier of vcluster, which adds extra features and enhanced security.

vicaya commented 2 months ago

newrelic mounts the entire /var directory

Understood. IMO, this an perfect opportunity for syncer to do a little more for "strong" isolation as claimed in the marketing material. Since most hostpath access is for logging. A better solution is to always rewrite hostpaths so /var becomes /vcluster-prefix/var, which would contain empty log/{pods,containers} dirs initially. The hostpath-mapper can then bind mount appropriate pod dirs and symlink the container logs. Currently --mount-physical-host-paths is a too big of a hammer just for logging. Does pro tier already do this (the doc certainly doesn't say it)?

facchettos commented 2 months ago

The pro version of that functionality uses bind mounts and no symlinks yes. It is also easier to use since it is centralized and you don't need to deploy it for each virtual cluster

A better solution is to always rewrite hostpaths

It would require the virtual cluster to have extended rbac for the cases where it is not needed though.

vicaya commented 2 months ago

It would require the virtual cluster to have extended rbac for the cases where it is not needed though.

In case it's not clear, by "always rewrite hostpaths", I meant that it only applies when relevant options like --rewrite-host-paths is specified as syncer option. In the host, equivalent of allowedHostPaths (psp and kubewarden policy equiv) pathPrefix would suffice to enforce the constraints even better.

Either no hostpath (pss baseline) or always rewrite is needed to enforce reasonable isolation for both security and deployment conflicts. The current default implementation provides no general hostpath isolation, which means many dev/test workloads that use hostpaths (e.g. developing distributed storage solution) can not use vcluster easily (admin needs to config hostpath per app per vcluster), which would otherwise be a great option.

IMO, open source a reasonable solution would encourage better adoption and more community participation. The value of SaaS shouldn't depend on such a basic isolation requirement.