stackabletech / issues

This repository is only for issues that concern multiple repositories or don't fit into any specific repository
2 stars 0 forks source link

Make the products security context configurable #628

Open razvan opened 3 months ago

razvan commented 3 months ago

Description

Stackable operators use hard-coded pod security contexts as noted in this issue.

These are rather arbitrary and mostly useless in regular OpenShift deployments.

They can be overridden using pod overrides but the use of pod overrides is usually discouraged.

Value

Strengthen security while providing a simple(er) configuration interface for sec contexts.

Tasks

adwk67 commented 3 months ago

Will this issue also address the hard-coded UID, as mentioned in https://github.com/stackabletech/issues/issues/607? pub const NIFI_UID: i64 = 1000;

PaulienVa commented 2 months ago

@razvan will the fix for this issue be included in release 24.11?

razvan commented 2 months ago

@razvan will the fix for this issue be included in release 24.11?

@PaulienVa I'm a bit skeptical since it has not been prioritized yet. But the roadmap is not fully closed yet so there might be a chance. If it does make it, we'll update the issue (labels).

PaulienVa commented 2 months ago

We are really really really blocked by it, so if by any chance that will weight in the prio's I'll be happy if it could be included in the release :)

sbernauer commented 2 months ago

@PaulienVa just to be on the same page: You are aware that you can already set a custom securityContext using podOverrides? My understanding is that this issue is about (optionally) making it a bit nicer to configure by placing the field a bit higher in the hierarchy.

One working example is

apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  image:
    productVersion: 3.9.2
  servers:
    podOverrides:
      spec:
        securityContext:
          runAsUser: 1234
          runAsGroup: 5678
          fsGroup: 9876
    roleGroups:
      default:
        replicas: 1
PaulienVa commented 1 month ago

@sbernauer we have worked on implementing this workaround. But plainly using this overrides resulted in the following error:

$ oc logs pyspark-pis-0d9503922eb6c50b-driver
/stackable/spark/kubernetes/dockerfiles/spark/entrypoint.sh: line 44: java_opts.txt: Permission denied

So we have built a custom image in the following way:

FROM docker.stackable.tech/stackable/spark-k8s:3.5.1-stackable24.7.0 
USER root
WORKDIR /stackable
RUN chgrp -R 0 /stackable && chmod -R g+rwX /stackable
USER stackable
WORKDIR /stackable

The stackable user then gets the correct permissions to run.

Is this something you can implement by default in the stackable images?

sbernauer commented 1 month ago

Hi @PaulienVa, glad to hear the podOverrides worked! @lfrancke is doing something related to groups and permissions in https://github.com/stackabletech/docker-images/pull/849, maybe he can give some details if that PR would solve your problem

lfrancke commented 1 month ago

Thanks for pinging me.

Yes, we have a slow running initiative to entirely revamp the whole user handling in our docker images. The issue that Sebastian linked to contains a whole lot of details. For your specific case we are going to change the group of all files to "0" (the root group) just like you do in your custom image.

In other words: In the next release (24.11 - due in November) you should not need your custom image anymore.

We will not achieve full securityContext configurability yet and the user id will still be hardcoded to 1000 for the next release. I currently hope to lift that restriction in one of the next three releases. It's a longer process.

@PaulienVa Can you tell us what exactly it is you need to configure? That'd help me find more corner cases and issues we might not have thought about so far.

PaulienVa commented 1 month ago

@lfrancke for us it is important that the hardcoded user id will not be hardcoded anymore. When configuring nothing and using the default setup we see that the pods are running in the non-root SCC, however we want the pods to run in the restricted-v2 SCC. We therefore use the podOverrides as @sbernauer suggested here. We used a user that is dedicated to the project that we retrieve like this:

oc get project <PROJECT> -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}'

if we could get rid of this podOverrides section it would be way cleaner.

lfrancke commented 1 month ago

Thanks for the clarification.

Understood. So, in 24.7 we moved away from our custom SCC to nonroot-v2 but as you said restricted-v2 doesn't work just yet. I am afraid that this will still be the case in 24.11 but we should have all the machinery in place to make it possible to then do this for 25.3 or 25.7. I know that's still some way off but we need to test this carefully and it's too close to 24.11 for us to slip this in now.

But it is a project we're working on and as I said above: The first benefit is going to be that you should be able to get rid of your custom image in 24.11.

koroglusaban commented 1 month ago

Thanks for the update! @lfrancke Just to clarify, does this change also apply to the Airflow Operator? Currently, the Airflow Operator uses airflow-scc. Will the transition to nonroot-v2 (and eventually restricted-v2) affect or replace the airflow-scc as well?

lfrancke commented 1 month ago

As of 24.7 the Airflow Operator should also be using nonroot-v2 unless I'm mistaken. Do you see something different? There is one caveat: The OLM package for 24.7 will be using nonroot-v2 but the Helm chart was only fixed after the release, that slipped through the cracks. So, if you're using Helm you'll still have the custom one but should be able to change it and as of 24.11 this will also move to nonroot-v2.

https://github.com/stackabletech/issues/issues/624

So, yes. Airflow Operator will also be moving to restricted-v2.

koroglusaban commented 1 month ago

Thanks for the clarification! One more question: Would it be possible for us to use a night build until the new version is officially released?

lfrancke commented 1 month ago

Yes, that's possible. But so far, this change has not been merged! So, at the moment this will not help you.

You can follow this issue: https://github.com/stackabletech/issues/issues/645 or this pull request https://github.com/stackabletech/docker-images/pull/849

koroglusaban commented 1 month ago

Hi @lfrancke

I noticed that the Airflow development images are receiving the correct ID, for which I thank you. However, it seems that the Spark development images are not being applied correctly. Could you please confirm if I am understanding this correctly?

Thank you!

lfrancke commented 1 month ago

Hi, yes, that's correct. The first PR landed. I decided to split it in multiple parts because it caused merge conflicts all the time. I'll try to put up the second PR for the remaining images this week.

koroglusaban commented 1 month ago

Hi @lfrancke,

Thanks for the update! Could you please let me know once the second PR is created, or point me to where I can follow its progress?

I appreciate your help!

lfrancke commented 1 month ago

I just put up the draft PR so you can follow it already: https://github.com/stackabletech/docker-images/pull/890 At the moment it only includes Hive & Kafka but I'm working my way through the list.

koroglusaban commented 1 month ago

Hi @lfrancke,

Great, thanks for sharing the draft PR! I'll follow it and I’ll keep an eye on the updates as you work through the rest of the list.

Appreciate the effort!

lfrancke commented 1 month ago

FYI the PR with the changes for the Spark image has been merged. The dev image should now include that change. Let me know if you find any issues.

The remaining products will be done in a last PR: https://github.com/stackabletech/docker-images/pull/897

koroglusaban commented 1 month ago

Thank you! I’ve tested it, and it works with the following workaround: podOverrides: spec: securityContext: runAsUser: <openshiftprojectuid-range> fsGroup: <openshiftprojectuid-range> When do you think this could be incorporated? I really appreciate your hard work!

lfrancke commented 1 month ago

This will not be in our next release I'm afraid. We're slowly working our way through the overarching issue https://github.com/stackabletech/issues/issues/645 but we're close to the next release (24.11) and don't want to introduce any major changes this close to the release. It'd mean that we don't have much time to test these changes.

I'm afraid you'll have to live with the overrides for now. I hope we can get more changes in our release after the next which would be in March 2025 (current plan).

koroglusaban commented 1 month ago

Thank you for the update! I completely understand the need to avoid major changes this close to the release. I’ll continue using the overrides in the meantime. I’m looking forward to the March 2025 release and hope we can address more of these issues then.

Also, could you confirm if the following change will be included in the next update? nonroot-v2 to restricted-v2 in clusterrole via OLM and helm `

lfrancke commented 1 month ago

The next release (24.11) will use nonroot-v2 everywhere. We want to move to restricted-v2 in a future release. restricted-v2 will already work with some operators, but some have hardcoded usernames (we do not have a list yet) and we need to fix that.

koroglusaban commented 1 month ago

Thank you for the clarification! I’ll keep an eye out for future updates regarding the move to restricted-v2.