k0sproject / k0s

k0s - The Zero Friction Kubernetes
https://docs.k0sproject.io

plans for podSecurityPolicy given deprecated status #1420

Open cortopy opened 2 years ago

cortopy commented 2 years ago

Is your feature request related to a problem? Please describe.

Pod Security Policies have been deprecated since Kubernetes 1.21. However, k0s still allows configuring a default PodSecurityPolicy, and its security model seems to depend on it.

Describe the solution you would like

I've been searching on the repo for any discussion about this and couldn't find anything, so this issue is just to know what will happen next.

PodSecurity is confirmed to be the native successor and is already in beta and enabled by default in the latest k0s

There are other options too like Open Policy Agent (OPA). They may be more sophisticated but would bring in a new dependency to manage

Describe alternatives you've considered

There is no other alternative, since PSPs are already deprecated.

Additional context

No response

makhov commented 2 years ago

Technically, Pod Security Standards could be used with the current k0s version with some manual actions. Some automation could be done for this, but it requires proper feature design.

How to enable PodSecurity with k0s

PodSecurity can be statically configured in the Admission Controller configuration.

For example:

  1. Create an admission control config file:

    apiVersion: apiserver.config.k8s.io/v1
    kind: AdmissionConfiguration
    plugins:
    - name: PodSecurity
      configuration:
        apiVersion: pod-security.admission.config.k8s.io/v1beta1
        kind: PodSecurityConfiguration
        # Defaults applied when a mode label is not set.
        defaults:
          enforce: "privileged"
          enforce-version: "latest"
        exemptions:
          # Don't forget to exempt namespaces or users that are responsible for deploying
          # cluster components, because they need to run privileged containers.
          usernames: ["admin"]
          namespaces: ["kube-system"]
  2. Add kube-apiserver arguments to the k0s configuration:

    apiVersion: k0s.k0sproject.io/v1beta1
    kind: ClusterConfig
    spec:
      api:
        extraArgs:
          disable-admission-plugins: PodSecurityPolicy # not required; only if you want to disable the PodSecurityPolicy admission controller
          enable-admission-plugins: PodSecurity        # only for Kubernetes 1.22; since 1.23 it's enabled by default
          feature-gates: "PodSecurity=true"            # only for Kubernetes 1.22; since 1.23 it's enabled by default
          admission-control-config-file: /path/to/admission/control/config.yaml
  3. Install k0s with the PodSecurityPolicy component disabled:

    $ k0s install controller --disable-components="default-psp"

Pod Security Standards can be enforced using namespace labels. In fact, you can just label all namespaces and skip the admission control config file entirely.
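For illustration, a rough sketch of the label-based approach (the namespace name my-app is just an example; the labels are the upstream Pod Security admission labels):

kubectl label --overwrite namespace my-app \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=latest \
  pod-security.kubernetes.io/warn=restricted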

makhov commented 2 years ago

Just a random idea off the top of my head: we could add two new sections to the k0s config:

apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
spec:
  api:
    podSecurity:
        enforce: "privileged"
        audit: "privileged"
        warn: "privileged"
    admissionConfig: |
        apiVersion: apiserver.config.k8s.io/v1
        kind: AdmissionConfiguration
        plugins:
        - name: PodSecurity
          configuration:
            apiVersion: pod-security.admission.config.k8s.io/v1beta1
            kind: PodSecurityConfiguration
            defaults:
              enforce: "privileged"
              enforce-version: "latest"
            exemptions:
              usernames: ["admin"] 
              namespaces: ["kube-system"] 
r3h0 commented 2 years ago

This comment was helpful, but I'm not able to get it to work. I'm curious if you tried it, @makhov? I gave up on PodSecurityPolicy after I couldn't get the defaultPolicy feature to work -- pods run as 00-k0s-privileged even when I set it to 99-k0s-restricted. However, having switched to the admission controller, I'm running into the problem that changes I make to disable-admission-plugins and enable-admission-plugins seem to be ignored, and even a hard sudo k0s stop && sudo k0s start doesn't pick up the new configuration. I consistently get a SuccessfulReconcile event, but nothing else seems to change, and k0s logs show --disable-admission-plugins=\"[]\".

Maybe I'm using dynamic configuration wrong? I don't have an easy way to manage the static configuration on my host (plus I've been confused about the relationship between the k0s config and the manifests/ directory), so I thought I'd make my life easier by having the single source of truth for k0s config live in the cluster.

Days of banging my head against the wall have left me more and more confused about how the k0s config works.

jnummelin commented 2 years ago

However, having switched to the admission controller, I'm running into the problem that changes I make to disable-admission-plugins and enable-admission-plugins seem to be ignored

Where/how are you doing those config changes? In the dynamic config, the ClusterConfig CRD object?

These flags affect the API server, and those bits can (currently, at least) only be configured via the node-local YAML config. That is because, in some cases, the set of e.g. extraArgs can differ between controller nodes. See also https://docs.k0sproject.io/v1.23.3+k0s.1/dynamic-configuration/#cluster-configuration-vs-controller-node-configuration

makhov commented 2 years ago

Sorry, originally I forgot to mention adding feature-gates: "PodSecurity=true" to the extraArgs for Kubernetes 1.22 in the comment above.

makhov commented 2 years ago

@r3h0 what version of k0s do you use? There was a bug in the default PodSecurityPolicy behavior that was fixed a couple of versions ago. What exactly do you mean by "pods run as 00-k0s-privileged even when I set it to 99-k0s-restricted"? We have a test for the default-psp component, so I'd like to make sure that it's correct, or fix it if it's not.

r3h0 commented 2 years ago

Q1: Dynamic Config

Where/how are you doing those config changes? In the dynamic config, the ClusterConfig CRD object? These flags affect the API server and those bits can (currently at least) be only configured via the node-local yaml config.

Bummer. Like I said, I was trying to use the dynamic config exclusively because I don't have a good way to manage (e.g., version control) files on the hosts. However, it sounds like that's not an option, so I'll have to figure something out.

The crux of the config problem: if I manually edit the manifests that k0s creates in the data dir, how do I later merge in changes to those manifests from later k0s versions? Any recommendations on how to do that?

Q2: k0s Version

what version of k0s do you use?

Previously, I was using v1.23.3+k0s.0, which was the version installed by k0sctl, which I'm not bothering with anymore due to some bugs/issues with it. I see there was a k0s update 2 days ago, so today I completely removed the old version (0 results for sudo find / -name "*k0s*"), then installed v1.23.3+k0s.1, running only as root and keeping everything in a /root/.k0s directory:

sudo bash
mkdir /root/.k0s
cd /root/.k0s
mkdir bin
mkdir data
chmod -R og-rwx /root/.k0s
wget -O bin/k0s -q https://github.com/k0sproject/k0s/releases/download/v1.23.3%2Bk0s.1/k0s-v1.23.3+k0s.1-amd64
chmod a+x bin/k0s

With the new version installed, here's how I launched the controller:

/root/.k0s/bin/k0s --data-dir /root/.k0s/data/ controller --enable-worker --no-taints --config /root/.k0s/k0s.yaml

Installation Hiccup

Side note: I biffed that command the first time, and CTRL+c failed to stop the controller:

ERRO[2022-02-20 20:11:30] error while stopping node component failed to stop components

It left behind a few containers that could not be stopped with k0s ctr and /root/.k0s/data could not be deleted.

Once I restarted the server, I could delete the data dir. Re-running the correct controller command, here's the status (from another terminal window):

/root/.k0s/bin/k0s status:

Version: v1.23.3+k0s.1
Process ID: 1310
Role: controller
Workloads: true
SingleNode: false

Q3: 99-k0s-restricted

After all that, I can confirm one of the issues I mentioned in a previous post. Your question:

What exactly do you mean by "pods run as 00-k0s-privileged even when I set it to 99-k0s-restricted"?

For the k0s config, I ran /root/.k0s/bin/k0s config create > /root/.k0s/k0s.yaml, editing only this bit:

  podSecurityPolicy:
    defaultPolicy: 99-k0s-restricted # 00-k0s-privileged

To test the default policy, I ran a test pod, /root/.k0s/bin/k0s --data-dir /root/.k0s/data/ kubectl apply -f hello-world.yaml, using this manifest:

---
apiVersion: v1
kind: Namespace
metadata:
  name: hello-world
---
apiVersion: v1
kind: Pod
metadata:
  namespace: hello-world
  name: hello-world
spec:
  automountServiceAccountToken: false
  containers:
  - name: hello-world
    image: docker.io/hello-world:latest
  restartPolicy: Never

The namespace was successfully created and the pod executed. Here's how I looked for the pod security policy (is there a better way?): /root/.k0s/bin/k0s --data-dir /root/.k0s/data/ kubectl -n hello-world describe pod hello-world

In the output below, note "Annotations: kubernetes.io/psp: 00-k0s-privileged"

Name:         hello-world
Namespace:    hello-world
Priority:     0
Node:         ryzen/192.168.1.125
Start Time:   Sun, 20 Feb 2022 20:33:59 +0000
Labels:       <none>
Annotations:  kubernetes.io/psp: 00-k0s-privileged
Status:       Succeeded
IP:           10.244.0.20
IPs:
  IP:  10.244.0.20
Containers:
  hello-world:
    Container ID:   containerd://41ce024bd84bae5dfc261e102c209676423258e6255103f6da3bce4f782b7d9f
    Image:          docker.io/hello-world:latest
    Image ID:       docker.io/library/hello-world@sha256:97a379f4f88575512824f3b352bc03cd75e239179eea0fecc38e597b2209f49a
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 20 Feb 2022 20:34:00 +0000
      Finished:     Sun, 20 Feb 2022 20:34:00 +0000
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:         <none>
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:            <none>
QoS Class:          BestEffort
Node-Selectors:     <none>
Tolerations:        node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                    node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  12s   default-scheduler  Successfully assigned hello-world/hello-world to ryzen
  Normal  Pulling    12s   kubelet            Pulling image "docker.io/hello-world:latest"
  Normal  Pulled     11s   kubelet            Successfully pulled image "docker.io/hello-world:latest" in 1.280117634s
  Normal  Created    11s   kubelet            Created container hello-world
  Normal  Started    11s   kubelet            Started container hello-world
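As an aside on the "is there a better way?" question: one lighter option, assuming the kubernetes.io/psp annotation is what you're after, would be to read it directly with jsonpath, e.g.:

/root/.k0s/bin/k0s --data-dir /root/.k0s/data/ kubectl -n hello-world get pod hello-world -o jsonpath='{.metadata.annotations.kubernetes\.io/psp}'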

Side Note

Using a non-default --data-dir is a pain. Commands like /root/.k0s/bin/k0s kubectl... fail:

error: open /var/lib/k0s/pki/admin.conf: no such file or directory

Thus, I had to append --data-dir /root/.k0s/data/ to every command below.

What's Next

I don't have time today to revisit the PodSecurityConfiguration used by the new AdmissionController. First I'll have to find a way to manage the k0s configuration on the host statically. Prior to all this, I had been using an older k0s version for months, and updating it has proven to be a multi-day ordeal, and I still don't have it working. (Granted I'm trying to make some improvements along the way -- e.g., I thought k0sctl would help me -- so it's partly my fault for biting off too much.)

jnummelin commented 2 years ago

The crux of the config problem: if I manually edit the manifests that k0s creates in the data dir, how do I later merge in changes to those manifests from later k0s versions? Any recommendations on how to do that?

There's no clean way to do that, unfortunately. k0s just plain overwrites the manifests it manages. The only way to use custom manifests is to disable the given component and manage the manifests/deployment of that component "outside" of k0s. Of course, if there are some common enough bits you'd want to configure for a component, we can definitely add some configurable options. If you have a specific detail you'd like to see configurable for some component, maybe open up a separate enhancement issue and we can look into it.
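As a rough sketch of that pattern (the component name and paths here are only examples), one could disable a built-in component and let the k0s manifest deployer apply custom manifests from a stack directory in the data dir:

# disable the built-in component and manage it yourself
k0s install controller --disable-components=metrics-server
# the manifest deployer applies manifests placed under <data-dir>/manifests/<stack>/
mkdir -p /var/lib/k0s/manifests/my-metrics-server
cp my-metrics-server.yaml /var/lib/k0s/manifests/my-metrics-server/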

For the k0s config, I ran /root/.k0s/bin/k0s config create > /root/.k0s/k0s.yaml, editing only this bit:

I think this is a misunderstanding about the "scope" for dynamic config. The docs say:

The following list outlines which options are controller node specific and have to be configured only via the local file:

spec.api - these options configure how the local Kubernetes API server is setup
spec.storage - these options configure how the local storage (etcd or sqlite) is setup

As the PSP setting is NOT on this list, it must be managed via the CRD. I agree, we must clarify some things in the docs to make it more obvious which parts are managed via the CRD and which ones via the local config file. We'd really appreciate input on this from a user perspective. We (the core maintainers) are probably looking at things from a bit too close, so it's always a bit hard to get the docs to the correct level of detail.
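To illustrate the split, here is a sketch (the config path is only an example): keep the node-local file limited to node-specific bits such as spec.api, and manage everything else through the ClusterConfig CRD when dynamic config is enabled:

# node-local config, passed explicitly to the controller on this node
cat > /etc/k0s/k0s.yaml <<'EOF'
apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
spec:
  api:
    extraArgs:
      admission-control-config-file: /path/to/admission/control/config.yaml
EOF
sudo k0s install controller --enable-dynamic-config -c /etc/k0s/k0s.yaml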

Our smoke test runs k0s with config:

spec:
  podSecurityPolicy:
    defaultPolicy: 99-k0s-restricted

and that results having pods annotations as:

bash-5.1# k0s kc describe pod test-pod-non-privileged
Name:         test-pod-non-privileged
Namespace:    default
Priority:     0
Labels:       <none>
Annotations:  kubernetes.io/psp: 99-k0s-restricted

Thus, I had to append --data-dir /root/.k0s/data/ to every command below.

One option is to run something like:

export KUBECONFIG=/root/.k0s/data/pki/admin.conf

After that k0s kubectl ... will respect that and you do not have to feed in --data-dir for every command.
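Another small convenience (plain shell, nothing k0s-specific) would be an alias that bakes in the flags:

alias k0s='/root/.k0s/bin/k0s --data-dir /root/.k0s/data/'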

makhov commented 2 years ago

The k0s kubectl command uses admin credentials, and for the admin user the policy is 00-k0s-privileged. If you create "simple" users, the restricted policy will be assigned to them.
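As a sketch of how to verify that mapping (the policy and service account names are just examples), you can ask RBAC which policy a given user or service account is allowed to use:

k0s kubectl auth can-i use podsecuritypolicy/99-k0s-restricted --as=system:serviceaccount:hello-world:default
k0s kubectl auth can-i use podsecuritypolicy/00-k0s-privileged --as=system:serviceaccount:hello-world:default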

danmx commented 2 years ago

I'm reading the comments and I'm a bit confused about what you want to achieve. 00-k0s-privileged and 99-k0s-restricted Pod Security Standards should be ok.

I'd avoid replicating a more granular admission controller and either leave it to cluster operators to think about, or provide e.g. Kyverno (leaning more towards this one) and/or OPA Gatekeeper (both CNCF projects) as an extra package, like Calico or kube-router for CNIs.

r3h0 commented 2 years ago

Since the original issue was posted, there have been some documentation updates regarding PSPs: https://docs.k0sproject.io/v1.24.4+k0s.0/podsecurity/. Given that upstream Kubernetes 1.25 has been released, I speculate that the k0s team will address this issue before making 1.25 the latest stable version of k0s.

I'm reading the comments and I'm a bit confused what you want to achieve. 00-k0s-privileged and 99-k0s-restricted Pod Security Standards should be ok.

@danmx I'm not sure whom you were addressing. Apologies if I derailed the original conversation with my questions; reading them again now, I seem to have gotten off-topic.

r3h0 commented 2 years ago

@twz123 @jnummelin, I see that main has removed PSP already. Apologies for getting ahead of myself if you still have further plans to change/improve that behavior before release, but it sounds like this issue can be updated either way. From the PR https://github.com/k0sproject/k0s/pull/2112:

Remove PodSecurityPolicies from k0s. They were deprecated for a long time, and have been removed in Kubernetes 1.25. Don't return an error on the dropped podSecurityPolicy field in the cluster config.

It's true that PSPs have been deprecated for a long time, but has the migration path been clear to k0s users? I confess that I've not migrated because I don't know how. I have been following this issue, waiting for a replacement to the podSecurityPolicy setting in my k0s config, or for clarification about how to achieve something similar. I generally understand how the new system works from the Kubernetes perspective, but not the k0s-specific configuration aspects. That said, when 1.25 is released, I'll figure it out one way or another.

My main concern is for the k0s users who are NOT following this issue closely and who might not realize that their PSP is no longer being applied.
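For anyone in the same situation, one low-risk way to preview the impact before anything is enforced (the namespace name is just an example) is to start with the warn/audit modes of the Pod Security admission labels:

kubectl label --overwrite namespace my-app \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted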

Relevant discussion from the PR:

What happens now if user has this configured and they start update to 1.25.0? Will they have to remove this from config yaml or is it just ignored? ... I'd say that it should be ignored and a warning should be printed to the logs ... SGTM

IIUC, k0s currently ignores the podSecurityPolicy field by calling YamlUnmarshalStrictIgnoringFields, which swallows the error:

// YamlUnmarshalStrictIgnoringFields does UnmarshalStrict but ignores type errors for given fields
func YamlUnmarshalStrictIgnoringFields(in []byte, out interface{}, ignore ...string) (err error) {
    err = yaml.UnmarshalStrict(in, &out)
    if err != nil {
        // parse errors for unknown field errors
        for _, field := range ignore {
            unknownFieldErr := fmt.Sprintf("unknown field \"%s\"", field)
            if strings.Contains(err.Error(), unknownFieldErr) {
                // reset err on unknown masked fields
                err = nil
            }
        }
        // we have some other error
        return err
    }
    return nil
}

Does sigs.k8s.io/yaml's UnmarshalStrict print a warning about the field? If not, my read of the code is that no warning will be produced.

Some users might not see the warning anyway, but it's better than nothing. Personally, I prefer a hard failure on any invalid configuration, so I can decide how to handle each change -- especially when it comes to anything related to security. However, I can appreciate that not everyone has the same preference.

jnummelin commented 2 months ago

We need to get rid of all the PSP remnants and check that we utilize Pod Security Standards properly.
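One quick way to audit that, assuming namespace labels end up being the mechanism, is to list namespaces with their Pod Security labels shown as columns:

k0s kubectl get namespaces \
  -L pod-security.kubernetes.io/enforce \
  -L pod-security.kubernetes.io/warn \
  -L pod-security.kubernetes.io/audit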