Open razvan opened 3 months ago
Will this issue also address the hard-coded UID, as mentioned in https://github.com/stackabletech/issues/issues/607?
pub const NIFI_UID: i64 = 1000;
@razvan will the fix for this issue be included in release 24.11?
@razvan will the fix for this issue be included in release 24.11?
@PaulienVa I'm a bit skeptical since it has not been prioritized yet. But the roadmap is not fully closed yet so there might be a chance. If it does make it, we'll update the issue (labels).
We are really really really blocked by it, so if by any chance that will weight in the prio's I'll be happy if it could be included in the release :)
@PaulienVa just to be on the same page: You are aware that you can already set a custom securityContext
using podOverrides?
My understanding is that this issue is about (optionally) making it a bit nicer to configure by placing the field a bit higher in the hierarchy.
One working example is
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
name: simple-zk
spec:
image:
productVersion: 3.9.2
servers:
podOverrides:
spec:
securityContext:
runAsUser: 1234
runAsGroup: 5678
fsGroup: 9876
roleGroups:
default:
replicas: 1
@sbernauer we have worked on implementing this workaround. But plainly using this overrides resulted in the following error:
$ oc logs pyspark-pis-0d9503922eb6c50b-driver
/stackable/spark/kubernetes/dockerfiles/spark/entrypoint.sh: line 44: java_opts.txt: Permission denied
So we have built a custom image in the following way:
FROM docker.stackable.tech/stackable/spark-k8s:3.5.1-stackable24.7.0
USER root
WORKDIR /stackable
RUN chgrp -R 0 /stackable && chmod -R g+rwX /stackable
USER stackable
WORKDIR /stackable
The stackable user then gets the correct permissions to run.
Is this something you can implement by default in the stackable images?
Hi @PaulienVa, glad to hear the podOverrides worked! @lfrancke is doing something related to groups and permissions in https://github.com/stackabletech/docker-images/pull/849, maybe he can give some details if that PR would solve your problem
Thanks for pinging me.
Yes, we have a slow running initiative to entirely revamp the whole user handling in our docker images. The issue that Sebastian linked to contains a whole lot of details. For your specific case we are going to change the group of all files to "0" (the root group) just like you do in your custom image.
In other words: In the next release (24.11 - due in November) you should not need your custom image anymore.
We will not achieve full securityContext configurability yet and the user id will still be hardcoded to 1000
for the next release. I currently hope to lift that restriction in one of the next three releases. It's a longer process.
@PaulienVa Can you tell us what exactly it is you need to configure? That'd help me find more corner cases and issues we might not have thought about so far.
@lfrancke for us it is important that the hardcoded user id will not be hardcoded anymore. When configuring nothing and using the default setup we see that the pods are running in the non-root SCC, however we want the pods to run in the restricted-v2 SCC. We therefore use the podOverrides as @sbernauer suggested here. We used a user that is dedicated to the project that we retrieve like this:
oc get project <PROJECT> -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}'
if we could get rid of this podOverrides section it would be way cleaner.
Thanks for the clarification.
Understood. So, in 24.7 we moved away from our custom SCC to nonroot-v2 but as you said restricted-v2
doesn't work just yet. I am afraid that this will still be the case in 24.11 but we should have all the machinery in place to make it possible to then do this for 25.3 or 25.7. I know that's still some way off but we need to test this carefully and it's too close to 24.11 for us to slip this in now.
But it is a project we're working on and as I said above: The first benefit is going to be that you should be able to get rid of your custom image in 24.11.
Thanks for the update! @lfrancke Just to clarify, does this change also apply to the Airflow Operator? Currently, the Airflow Operator uses airflow-scc. Will the transition to nonroot-v2 (and eventually restricted-v2) affect or replace the airflow-scc as well?
As of 24.7 the Airflow Operator should also be using nonroot-v2
unless I'm mistaken. Do you see something different?
There is one caveat: The OLM package for 24.7 will be using nonroot-v2
but the Helm chart was only fixed after the release, that slipped through the cracks. So, if you're using Helm you'll still have the custom one but should be able to change it and as of 24.11 this will also move to nonroot-v2
.
https://github.com/stackabletech/issues/issues/624
So, yes. Airflow Operator will also be moving to restricted-v2
.
Thanks for the clarification! One more question: Would it be possible for us to use a night build until the new version is officially released?
Yes, that's possible. But so far, this change has not been merged! So, at the moment this will not help you.
You can follow this issue: https://github.com/stackabletech/issues/issues/645 or this pull request https://github.com/stackabletech/docker-images/pull/849
Hi @lfrancke
I noticed that the Airflow development images are receiving the correct ID, for which I thank you. However, it seems that the Spark development images are not being applied correctly. Could you please confirm if I am understanding this correctly?
Thank you!
Hi, yes, that's correct. The first PR landed. I decided to split it in multiple parts because it caused merge conflicts all the time. I'll try to put up the second PR for the remaining images this week.
Hi @lfrancke,
Thanks for the update! Could you please let me know once the second PR is created, or point me to where I can follow its progress?
I appreciate your help!
I just put up the draft PR so you can follow it already: https://github.com/stackabletech/docker-images/pull/890 At the moment it only includes Hive & Kafka but I'm working my way through the list.
Hi @lfrancke,
Great, thanks for sharing the draft PR! I'll follow it and I’ll keep an eye on the updates as you work through the rest of the list.
Appreciate the effort!
FYI the PR with the changes for the Spark image has been merged. The dev image should now include that change. Let me know if you find any issues.
The remaining products will be done in a last PR: https://github.com/stackabletech/docker-images/pull/897
Thank you! I’ve tested it, and it works with the following workaround:
podOverrides: spec: securityContext: runAsUser: <openshiftprojectuid-range> fsGroup: <openshiftprojectuid-range>
When do you think this could be incorporated? I really appreciate your hard work!
This will not be in our next release I'm afraid. We're slowly working our way through the overarching issue https://github.com/stackabletech/issues/issues/645 but we're close to the next release (24.11) and don't want to introduce any major changes this close to the release. It'd mean that we don't have much time to test these changes.
I'm afraid you'll have to live with the overrides for now. I hope we can get more changes in our release after the next which would be in March 2025 (current plan).
Thank you for the update! I completely understand the need to avoid major changes this close to the release. I’ll continue using the overrides in the meantime. I’m looking forward to the March 2025 release and hope we can address more of these issues then.
Also, could you confirm if the following change will be included in the next update? nonroot-v2 to restricted-v2 in clusterrole via OLM and helm `
The next release (24.11) will use nonroot-v2
everywhere.
We want to move to restricted-v2
in a future release.
restricted-v2
will already work with some operators, but some have hardcoded usernames (we do not have a list yet) and we need to fix that.
Thank you for the clarification! I’ll keep an eye out for future updates regarding the move to restricted-v2.
Description
Stackable operators use hard-coded pod security contexts as noted in this issue.
These are rather arbitrary and mostly useless in regular OpenShift deployments.
They can be overridden using pod overrides but the use of pod overrides is usually discouraged.
Value
Strengthen security while providing a simple(er) configuration interface for sec contexts.
Tasks
fs_group
is the only one needed.podSecurityContext
fields to cluster and role configurations.containerSecurityContext
fields to cluster and role configurations.