Closed juergenhemelt closed 2 years ago
I am running the chart in OpenShift regularly.
I am using OpenShift 4, but do have my own dev cluster. It looks as if we have a security limitation here.
I can try and respond here, but if you are ok with it, slack (slack.odpi.org) may be a more interactive approach to work through some ideas. I'd very much like to help you get it working and improve our docs or charts accordingly.
To elaborate on jupyter in particular - we reuse the base jupyter image (jupyter/base-notebook:latest on dockerhub) and just add a few python modules & load up some notebooks.
That base container is defined at https://github.com/jupyter/docker-stacks/tree/master/base-notebook and there are some docs at https://jupyter-docker-stacks.readthedocs.io/en/latest/index.html
The container does indeed use group=100
So can you tell me anything about the security constraints you have in your environment -- as the error indicates, group 100 is not allowed?
It's not something I've investigated in my environment, however openshift can be secured to only permit certain groups.
This can be modified (by a cluster admin) with:
oc edit scc restricted
Or specifically
oc adm policy add-scc-to-group restricted 100
However I don't know if you may hit any other restrictions.
There are some docs on SCC strategies (openshift 4) at https://docs.openshift.com/container-platform/4.3/authentication/managing-security-context-constraints.html#authorization-SCC-strategies_configuring-internal-oauth
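Before editing, it can help to look at what the restricted SCC currently allows. A hypothetical fragment of what `oc get scc restricted -o yaml` might show (field names are real SCC fields; the values here are illustrative, not from a real cluster):

```yaml
# Illustrative fragment of the restricted SCC - values are examples only
fsGroup:
  type: MustRunAs          # group must come from the namespace's allowed range
runAsUser:
  type: MustRunAsRange     # uid must come from the namespace's allowed range
supplementalGroups:
  type: RunAsAny
```

With MustRunAs/MustRunAsRange and no explicit ranges on the SCC itself, the allowed values fall back to the namespace annotations, which is why the same chart can behave differently per project.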
In my test environment I have:
fsGroup:
  type: MustRunAs
In the Deployment yaml spec for our jupyter container I define:
spec:
  securityContext:
    fsGroup: 100
which I think all hangs together and should work.
So need to understand more about your environment, restrictions, versions
I am still on Openshift 3.11
My restricted scc shows this for fsGroup:
fsGroup:
  type: MustRunAs
No ranges defined.
Ok. I found it. My namespace had a default value for openshift.io/sa.scc.supplemental-groups:
apiVersion: project.openshift.io/v1
kind: Project
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/requester: xce3579
    openshift.io/sa.scc.mcs: s0:c73,c32
    openshift.io/sa.scc.supplemental-groups: 1005320000/10000
So I changed template/jupyter.yaml to use fsGroup: 1005320000, and it works.
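To make the connection explicit: the namespace annotation defines the allowed supplemental-group range in start/size form, and the pod's fsGroup must fall inside that range. A sketch using the values above:

```yaml
# The annotation 1005320000/10000 allows groups 1005320000..1005329999,
# so the fsGroup in the deployment must be chosen from that range:
spec:
  securityContext:
    fsGroup: 1005320000   # first value in the namespace's allowed range
```

This is why the hard-coded fsGroup: 100 in the chart fails validation on a cluster whose namespace range starts elsewhere.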
Still not fixed. The container crashes now with denied access:
Fail to get yarn configuration. {"type":"error","data":"Could not write file \"/opt/conda/lib/python3.7/site-packages/jupyterlab/yarn-error.log\": \"EACCES: permission denied, open '/opt/conda/lib/python3.7/site-packages/jupyterlab/yarn-error.log'\""}
{"type":"error","data":"An unexpected error occurred: \"EACCES: permission denied, scandir '/home/jovyan/.config/yarn/link'\"."}
{"type":"info","data":"Visit https://yarnpkg.com/en/docs/cli/config for documentation about this command."}
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/traitlets/traitlets.py", line 528, in get
    value = obj._trait_values[self.name]
KeyError: 'runtime_dir'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/jupyter-lab", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/jupyter_core/application.py", line 270, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/traitlets/config/application.py", line 663, in launch_instance
    app.initialize(argv)
  File "</opt/conda/lib/python3.7/site-packages/decorator.py:decorator-gen-7>", line 2, in initialize
  File "/opt/conda/lib/python3.7/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/notebook/notebookapp.py", line 1766, in initialize
    self.init_configurables()
  File "/opt/conda/lib/python3.7/site-packages/notebook/notebookapp.py", line 1380, in init_configurables
    connection_dir=self.runtime_dir,
  File "/opt/conda/lib/python3.7/site-packages/traitlets/traitlets.py", line 556, in __get__
    return self.get(obj, cls)
  File "/opt/conda/lib/python3.7/site-packages/traitlets/traitlets.py", line 535, in get
    value = self._validate(obj, dynamic_default())
  File "/opt/conda/lib/python3.7/site-packages/jupyter_core/application.py", line 100, in _runtime_dir_default
    ensure_dir_exists(rd, mode=0o700)
  File "/opt/conda/lib/python3.7/site-packages/jupyter_core/utils/__init__.py", line 13, in ensure_dir_exists
    os.makedirs(path, mode=mode)
  File "/opt/conda/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/conda/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/conda/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/opt/conda/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/jovyan/.local'
Can you share any more about your security configuration in the openshift 3 cluster? There's some info at https://docs.openshift.com/enterprise/3.0/admin_guide/manage_scc.html
Can you follow through and determine which restriction is being applied?
Within the dev team we've used simpler, developer-centric / default-configured openshift clusters, as well as deploying onto minikube, IKS, microk8s and k3s, so I'm unable to determine exactly which setting is causing this issue. I'd like to help you get it working so we can improve our docs etc.
My openshift environment is v4. I will try v3 also.
There is a specific blog on running jupyter containers on openshift at https://blog.openshift.com/jupyter-openshift-part-2-using-jupyter-project-images/
From the error above it looks as if you have permission issues writing data within the container.
The blog entry documents a very similar error, along with the required steps to ensure the container runs as the jovyan user. Could you try those steps and see if it fixes your problem?
One specific suggestion -- this allows containers to run under any user:
oc adm policy add-scc-to-group anyuid system:authenticated
Obviously this does change the security profile of the platform. There are other alternatives, including extending the namespace annotation openshift.io/sa.scc.uid-range or creating a service account with a dedicated scc.
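A sketch of the dedicated-service-account alternative mentioned above (the account name and namespace here are hypothetical; binding the SCC requires cluster-admin rights):

```yaml
# Hypothetical service account for the chart to run under
apiVersion: v1
kind: ServiceAccount
metadata:
  name: egeria-lab
  namespace: my-egeria
# A cluster admin would then bind an SCC to it, e.g.:
#   oc adm policy add-scc-to-user anyuid -z egeria-lab -n my-egeria
# The chart's pods must then reference serviceAccountName: egeria-lab.
```

This scopes the relaxed policy to one workload, rather than to every authenticated user as the anyuid-for-system:authenticated command does.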
I verified this change on a clean openshift 3 cluster (IBM Cloud). Prior to the change, the zookeeper image (we use the Bitnami chart) didn't start either, as it defaults to user 1001. A clean openshift 4 cluster worked fine by default - but clearly may fail as security is tightened beyond the defaults.
This is an area we could benefit from documenting in more detail in future. The underlying info is relevant for all environments, though the defaults and mechanisms of modification are openshift-specific, or even cloud-provider-specific (and interestingly varied between my default installs of openshift 3 & openshift 4).
Therefore I propose to extend the documentation and testing in this area, though this won't be addressed immediately.
I would like to ensure you have enough to keep working (as will anyone else who finds this issue report).
Do the links above help?
Thx. The links do actually help. As I am not allowed to do this setting for my own namespace I asked the admins. I will let you know if it works as soon as possible.
A further report of a related issue noted that the restricted security context was in use. In this case the encrypted filestore connector is throwing an error. The code is manipulating permissions, so this looks like a security issue.
As a workaround the following was added to the common check notebook:
def useClearConfigStore(platformURL, adminUserId):
    adminCommandURLRoot = platformURL + '/open-metadata/admin-services/users/' + adminUserId
    print(" ... switching config store to unencrypted...")
    url = adminCommandURLRoot + '/stores/connection'
    jsonContentHeader = {'content-type': 'application/json'}
    clearConfigStore = {
        "class": "Connection",
        "connectorType": {
            "class": "ConnectorType",
            "connectorProviderClassName": "org.odpi.openmetadata.adapters.adminservices.configurationstore.file.FileBasedServerConfigStoreProvider"
        },
        "endpoint": {
            "class": "Endpoint",
            "address": "omag.server.{0}.config"
        }
    }
    postAndPrintResult(url, json=clearConfigStore, headers=jsonContentHeader)

useClearConfigStore(corePlatformURL, adminUserId)
useClearConfigStore(devPlatformURL, adminUserId)
useClearConfigStore(dataLakePlatformURL, adminUserId)
No errors were then shown in relation to the config connector. However a little later on we get:
It looks to me as if something must be failing silently during the config step, i.e.:
...... (POST https://lab-dev:9443/open-metadata/admin-services/users/garygeeke/servers/cocoMDS1/local-repository/mode/in-memory-repository )
...... Response: {'class': 'VoidResponse', 'relatedHTTPCode': 200}
Cool - we’ve just configured the in memory repository… then …
... configuring the short descriptive name of the metadata stored in this server...
...... (POST https://lab-dev:9443/open-metadata/admin-services/users/garygeeke/servers/cocoMDS1/local-repository/metadata-collection-name/Data Lake Catalog )
...... Response: {'class': 'VoidResponse', 'relatedHTTPCode': 400, 'exceptionClassName': 'org.odpi.openmetadata.adminservices.ffdc.exception.OMAGConfigurationErrorException', 'actionDescription': 'setLocalMetadataCollectionName', 'exceptionErrorMessage': 'OMAG-ADMIN-400-008 The local repository mode has not been set for OMAG server cocoMDS1', 'exceptionErrorMessageId': 'OMAG-ADMIN-400-008', 'exceptionErrorMessageParameters': ['cocoMDS1'], 'exceptionSystemAction': 'The local repository mode must be enabled before the event mapper connection or repository proxy connection is set. The system is unable to configure the local server.', 'exceptionUserAction': 'The local repository mode is supplied by the caller to the OMAG server. This call to enable the local repository needs to be made before the call to set the event mapper connection or repository proxy connection.'}
... configuring the membership of the cohort...
That error is complaining the local repository isn’t enabled!
So somehow (I’ve never seen this before) it seems the config isn’t being saved. Maybe the error checking in the code isn’t good enough; it is surely down to security within the container.
A kubectl logs & describe doesn't show anything untoward other than the use of the secure context.
I suspect the issue in this case is writing to storage. So far we have not set up volumes for the lab notebook, but should do so. I suspect this may well address the problem.
For all the issues addressed here the first step is to set up a user/config to use the restricted context, then to write up the docs & add volumes as needed.
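The volume idea could look roughly like this in the jupyter deployment. This is only a sketch: the container and volume names are hypothetical, an emptyDir is the simplest option (a PVC would be needed for persistence), and the mount path assumes the standard jovyan home directory from the base-notebook image:

```yaml
# Sketch only: give the notebook container a writable home directory
spec:
  containers:
    - name: jupyter
      volumeMounts:
        - name: jovyan-home
          mountPath: /home/jovyan
  volumes:
    - name: jovyan-home
      emptyDir: {}   # swap for a persistentVolumeClaim to survive pod restarts
```

Because the kubelet applies the pod's fsGroup to mounted volumes, this should make /home/jovyan writable even when the container runs as an arbitrary UID, which is exactly the failure in the traceback above.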
Looking at this further, I ran:
oc adm policy add-role-to-user admin <email>
oc edit scc restricted
and set 'runAsUser' and 'fsGroup' to RunAsAny. If the scc change is not made, the openshift console will show that the kafka, zookeeper and jupyter pods could not be created due to the users being used:
create Pod lab-kafka-0 in StatefulSet lab-kafka failed error: pods "lab-kafka-0" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{1001}: 1001 is not an allowed group spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 1001: must be in the ranges: [1004250000, 1004259999]]
Error creating: pods "lab-odpi-egeria-lab-jupyter-744469ccbb-" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{100}: 100 is not an allowed group]
create Pod lab-zookeeper-0 in StatefulSet lab-zookeeper failed error: pods "lab-zookeeper-0" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{1001}: 1001 is not an allowed group spec.containers[0].securityContext.securityContext.runAsUser: Invalid value: 1001: must be in the ranges: [1004250000, 1004259999]]
These are both third party components (egeria itself is well behaved).
The range of allowed groups, or fixed users, can also be set - see the references above.
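For that annotation-based alternative, a hypothetical namespace fragment widening the ranges so that the UID 1001 and groups 100/1001 from the errors above would validate (the exact range values here are illustrative, not a recommendation):

```yaml
# Illustrative only - ranges chosen to include uid 1001 and groups 100/1001
apiVersion: v1
kind: Namespace
metadata:
  name: my-egeria
  annotations:
    openshift.io/sa.scc.uid-range: 1000/10000
    openshift.io/sa.scc.supplemental-groups: 100/10000
```

This keeps the restricted SCC itself unchanged and relaxes only the one namespace, which is a smaller security change than editing the cluster-wide SCC.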
There were no issues in this environment with config or storage
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 20 days if no further activity occurs. Thank you for your contributions.
The current error on OpenShift 4.8, if the security context is not changed, for the base chart is:
Warning FailedCreate 14s (x13 over 35s) statefulset-controller create Pod egeria-base-platform-0 in StatefulSet egeria-base-platform failed error: pods "egeria-base-platform-0" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{0}: 0 is not an allowed group, provider "ibm-restricted-scc": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "ibm-anyuid-scc": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "ibm-anyuid-hostpath-scc": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "ibm-anyuid-hostaccess-scc": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "ibm-privileged-scc": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
A reminder of the current change required on scc restricted (or a new policy):
fsGroup:
  type: MustRunAs
This must be changed to 'RunAsAny'.
And
runAsUser:
  type: MustRunAsRange
Similarly, this must be changed to 'RunAsAny'.
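Putting the two changes together, the edited stanzas of the restricted SCC (via `oc edit scc restricted`, requiring cluster-admin) would read:

```yaml
# After editing - the rest of the SCC is left unchanged
fsGroup:
  type: RunAsAny
runAsUser:
  type: RunAsAny
```

With these strategies the pod-level fsGroup/runAsUser values from the charts (100, 1001, etc.) pass validation regardless of the namespace's annotated ranges.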
Also worth noting that kafka otherwise failed with:
Warning FailedCreate 4m41s (x17 over 10m) statefulset-controller create Pod base-kafka-0 in StatefulSet base-kafka failed error: pods "base-kafka-0" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{1001}: 1001 is not an allowed group, spec.containers[0].securityContext.runAsUser: Invalid value: 1001: must be in the ranges: [1000660000, 1000669999], provider "ibm-restricted-scc": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "ibm-anyuid-scc": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "ibm-anyuid-hostpath-scc": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "ibm-anyuid-hostaccess-scc": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "ibm-privileged-scc": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
and zookeeper with:
Warning FailedCreate 5m16s (x17 over 10m) statefulset-controller create Pod base-zookeeper-0 in StatefulSet base-zookeeper failed error: pods "base-zookeeper-0" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{1001}: 1001 is not an allowed group, spec.containers[0].securityContext.runAsUser: Invalid value: 1001: must be in the ranges: [1000660000, 1000669999], provider "ibm-restricted-scc": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "ibm-anyuid-scc": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "ibm-anyuid-hostpath-scc": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "ibm-anyuid-hostaccess-scc": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "ibm-privileged-scc": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
The egeria-base chart now runs with the default, restricted security context. Working on the lab chart (issues with nginx, egeria-ui [based on nginx] and jupyter)
The two main user charts
egeria-base
odpi-egeria-lab
have now been updated so that they will run using the default 'restricted' security context on openshift. This means they can run as any user.
This avoids the need to add additional information about the required security context, and makes the environment more secure to run.
As such I am now closing this issue.
When running the Egeria Dojo yesterday 2022-01-17, it appeared as if the egeria-base chart, and possibly lab, in fact did not work on OpenShift.
Reopening and creating a new cluster to validate
Checked with a clean deployment of 4.8.21_1537 (no changes to any security context)
Both egeria-base & odpi-egeria-lab work fine.
I suspect my failure was user error, either in manually editing security context, or deploying a chart, whilst trying to present and test at the same time!
Closing
I try to get started with ODPi using the descriptions found here: https://egeria.odpi.org/open-metadata-resources/open-metadata-labs/
I installed the lab in my Openshift cluster using helm. On startup of the pod lab-odpi-egeria-lab-jupyter I get the event:
"Error creating: pods "lab-odpi-egeria-lab-jupyter-56f7fb969f-" is forbidden: unable to validate against any security context constraint: [fsGroup: Invalid value: []int64{100}: 100 is not an allowed group]"
and the pod is not starting.
Any suggestions?