operator-framework / operator-sdk

SDK for building Kubernetes applications. Provides high level APIs, useful abstractions, and project scaffolding.
https://sdk.operatorframework.io
Apache License 2.0

docs: Quickstart for Go-based Operators, kind=Memcached fails #4364

Closed bentito closed 3 years ago

bentito commented 3 years ago

https://sdk.operatorframework.io/docs/building-operators/golang/quickstart/

The Memcached operator scaffolded by following the Quickstart docs (operator-sdk with --kind=Memcached) fails to deploy with an error like this:

Warning  FailedCreate  18s (x13 over 39s)  replicaset-controller  Error creating: pods "memcached-operator-controller-manager-74cd5fb996-" is forbidden: unable to validate against any security context constraint: [spec.containers[0].securityContext.runAsUser: Invalid value: 65532: must be in the ranges: [1000600000, 1000609999] spec.containers[1].securityContext.runAsUser: Invalid value: 65532: must be in the ranges: [1000600000, 1000609999]]

This is likely due to changes in the gcr.io/distroless/static:nonroot image. I was able to fix the generated file config/manager/manager.yaml by removing runAsUser and adding runAsNonRoot: true to spec.securityContext.
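For readers following along, the workaround described above might look roughly like the fragment below. This is only a sketch: the surrounding Deployment fields are abbreviated and assumed from a typical scaffolded config/manager/manager.yaml, and only the securityContext change comes from this report.

```yaml
# Abbreviated sketch of config/manager/manager.yaml (other fields omitted).
# The Deployment layout here is an assumption; only the securityContext
# change reflects the workaround described in this issue.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: system
spec:
  template:
    spec:
      securityContext:
        # Removed: runAsUser: 65532
        # Added: require a non-root user without pinning a specific UID.
        runAsNonRoot: true
      containers:
      - name: manager
        image: controller:latest
        command:
        - /manager
```

With no fixed runAsUser in the manifest, the platform is free to pick the UID, while runAsNonRoot: true still asks Kubernetes to refuse to start the container as root.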

I'm not sure this is really a docs issue. At this point I'm researching where operator-sdk gets its "kinds" so that I can file an issue about the Memcached kind there instead. But it is true that the Quickstart doc currently doesn't produce a working example because of this problem.

camilamacedo86 commented 3 years ago

Hi @bentito,

The error you hit, "Invalid value: 65532: must be in the ranges: [1000600000, 1000609999]", is specific to the OCP environment.

However, applying your solution of removing runAsUser raises security concerns.

Could you please try changing the user ID in the Dockerfile and manager manifests to a value in the range [1000600000, 1000609999] and let us know?

If it works as expected, we might need to push this change upstream: https://github.com/kubernetes-sigs/kubebuilder/search?q=65532.

c/c @estroz @jmrodri

bentito commented 3 years ago

I tried changing the UID, and I agree about the security concern with my proposed workaround. But the UID range changes each time; it's a moving target, so you can't guess it correctly every time. Assuming this Memcached example used to work, I think the problem must lie in the gcr.io/distroless/static:nonroot image.
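As background on why the range is a moving target: on OpenShift the allowed UID range is normally allocated per namespace and recorded as an annotation on the Namespace object, so a new namespace generally gets a different range. A sketch of what such a namespace might look like follows. The namespace name is hypothetical, and the annotation value is simply the range from the error above written in start/size form.

```yaml
# Sketch of an OpenShift namespace carrying its allocated UID range.
# The name is hypothetical; the value encodes [1000600000, 1000609999]
# as <first UID>/<number of UIDs>.
apiVersion: v1
kind: Namespace
metadata:
  name: memcached-operator-system
  annotations:
    openshift.io/sa.scc.uid-range: 1000600000/10000
```

A UID hardcoded in a manifest or Dockerfile would have to fall inside whatever range the target namespace happens to receive, which is why pinning a value is fragile there.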

camilamacedo86 commented 3 years ago

Hi @bentito,

Sorry, I missed that you are using runAsNonRoot: true, which seems to be enough to address the security concern. So your workaround looks OK.

The specific user ID appears to have been set because of the Kubernetes error "container has runAsNonRoot and the image has non-numeric user (nonroot), cannot verify user is non-root", which is raised by the k8s code. More info: https://stackoverflow.com/questions/49720308/kubernetes-podsecuritypolicy-set-to-runasnonroot-container-has-runasnonroot-and

It was introduced upstream in the PR https://github.com/kubernetes-sigs/kubebuilder/pull/1635. However, as @joelanford pointed out to me, "specifying a specific UID doesn't work in OCP (and possibly other non-upstream k8s distros)".

Checking the K8s code on the master branch, the check is still in place. Because of this, I do not think that we should change the upstream/kubebuilder implementation. However, I think that we can:

WDYT @jmrodri @estroz ?

bentito commented 3 years ago

PR adding specific workaround to the FAQ is: https://github.com/operator-framework/operator-sdk/pull/4368

estroz commented 3 years ago

Before adding this to the FAQ I'd like to see this addressed upstream, i.e. is there another workaround that will work for both OCP and vanilla k8s.

/triage needs-information

jmrodri commented 3 years ago

> Before adding this to the FAQ I'd like to see this addressed upstream, i.e. is there another workaround that will work for both OCP and vanilla k8s.

My understanding is if you specify a specific user on OpenShift it will attempt to honor that setting. So a runAsUser set to 65532 will cause OpenShift to try and use user id 65532 which does not exist. So we have to NOT set it so that OpenShift will automatically use a random user id.

My understanding was for upstream we needed to add runAsUser to avoid running as root upstream. The best solution would be an OpenShift plugin that would fix this when targeting OpenShift.

estroz commented 3 years ago

Yeah that sounds correct. My comment is referring to runAsUser: 65532 being redundant, since the default value is taken from container metadata, which sets this value to 65532 already upstream. Therefore we should be able to replace runAsUser: 65532 with runAsNonRoot: true upstream and not have to add this workaround.

estroz commented 3 years ago

PR: https://github.com/kubernetes-sigs/kubebuilder/pull/1978

estroz commented 3 years ago

The migration guide for #4402 should include a section for changing this value, since it is fixed upstream.