Closed shalberd closed 1 year ago
yes, it should work. Can you provide the exact steps(oc commands, in order) you are taking along w/ your yaml files for the imagestream and statefulset?
Due to some limitations in the admission logic, i think it's critical that your imagestream exist and be successfully imported before you create the statefulset. At least depending what mechanism you're using (which will be clearer if we can see your statefulset yaml and imagestream).
i'm guessing you've already been here, but here are the docs on these features: https://docs.openshift.com/container-platform/4.11/openshift_images/using-imagestreams-with-kube-resources.html
Hi Ben, thank you for grabbing this topic, I really appreciate it.
The imagestream is applied via a kustomize-style YAML deployment. The StatefulSet is created by the Kubeflow Notebooks Controller as a result of a Notebook CR being present.
The imagestream exists and the tag was successfully imported at created: '2022-11-21T13:14:31Z'. We do not use the internal OpenShift registry.
The StatefulSet references the imagestream name and tag in its container image field. Its creation timestamp is well after the imagestream's: creationTimestamp: '2022-12-07T16:54:40Z'
We are getting the following events on the pod, which is how I got to the bugzilla link, the access.redhat link, and so on in the first message of this issue.
[Warning] Failed to pull image "s2i-generic-data-science-notebook:v0.0.5": rpc error: code = Unknown desc = reading manifest v0.0.5 in docker.io/library/s2i-generic-data-science-notebook: errors: denied: requested access to the resource is denied unauthorized: authentication required
jupyter-nb-sthoms-0
Namespace odhsven
Dec 8, 2022, 5:21 AM
Generated from kubelet xxxx-ostlxxx.ocp4dev.xxxx.mydomain.ch
138 times in the last 14 hours
Failed to pull image "s2i-generic-data-science-notebook:v0.0.5": rpc error: code = Unknown desc = reading manifest v0.0.5 in docker.io/library/s2i-generic-data-science-notebook: errors: denied: requested access to the resource is denied unauthorized: authentication required
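One way to read this error: because the image field was never substituted, the kubelet handed the bare name to the container runtime, and short image names default to docker.io/library/. A rough sketch of that normalization rule (illustrative only, not actual CRI-O logic; it ignores registry ports and digest references):

```python
def normalize_image_ref(ref):
    """Approximate how container runtimes expand short image names.

    A name whose first path component contains no '.' (i.e. no registry
    host) defaults to docker.io, and a single-component repository
    defaults to the 'library' namespace. Sketch only.
    """
    name, _, tag = ref.partition(":")
    first = name.split("/")[0]
    has_registry = "." in first or first == "localhost"
    if not has_registry:
        if "/" not in name:
            name = "library/" + name
        name = "docker.io/" + name
    return f"{name}:{tag or 'latest'}"

# The unsubstituted imagestream reference from the failing pod:
print(normalize_image_ref("s2i-generic-data-science-notebook:v0.0.5"))
# -> docker.io/library/s2i-generic-data-science-notebook:v0.0.5

# A fully qualified reference passes through unchanged:
print(normalize_image_ref("quay.io/thoth-station/s2i-generic-data-science-notebook:v0.0.5"))
# -> quay.io/thoth-station/s2i-generic-data-science-notebook:v0.0.5
```

This matches the event text exactly: the runtime looked for the image in docker.io/library/, where it does not exist, hence the authentication/denied error.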
I am referencing the yamls below. The problem occurs on OCP 4.10.3 and also on the latest DEV versions, 4.13.x. Imagestream and StatefulSet are in the same namespace. When the imagestream is referenced in the image field of DeploymentConfigs, Deployments, and Pods, it resolves to the source location of the imagestream tag; in those object types the resolve and image pull work fine, just not with StatefulSet.
There are two containers in the pod; please disregard the oauth one, we are trying to use the central openshift imagestream for that some day, too :-)
kind: ImageStream
apiVersion: image.openshift.io/v1
metadata:
annotations:
kfctl.kubeflow.io/kfdef-instance: opendatahub.odhsven
opendatahub.io/notebook-image-desc: >-
Jupyter notebook image with a set of data science libraries that advanced
AI/ML notebooks will use as a base image to provide a standard for
libraries available in all notebooks
opendatahub.io/notebook-image-name: Standard Data Science
opendatahub.io/notebook-image-url: 'https://github.com/thoth-station/s2i-generic-data-science-notebook'
openshift.io/image.dockerRepositoryCheck: '2022-11-21T13:14:31Z'
resourceVersion: '804058030'
name: s2i-generic-data-science-notebook
uid: 9be25400-03ae-4672-9e4d-06e6d31ebd07
creationTimestamp: '2022-11-21T13:14:30Z'
generation: 2
namespace: odhsven
labels:
app.kubernetes.io/part-of: jupyterhub
component.opendatahub.io/name: jupyterhub
opendatahub.io/component: 'true'
opendatahub.io/notebook-image: 'true'
spec:
lookupPolicy:
local: true
tags:
- name: v0.0.5
annotations:
opendatahub.io/notebook-python-dependencies: >-
[{"name":"Boto3","version":"1.17.11"},{"name":"Kafka-Python","version":"2.0.2"},{"name":"Matplotlib","version":"3.4.2"},{"name":"Numpy","version":"1.21.0"},{"name":"Pandas","version":"1.2.5"},{"name":"Scipy","version":"1.7.0"}]
opendatahub.io/notebook-software: '[{"name":"Python","version":"v3.8.6"}]'
openshift.io/imported-from: quay.io/thoth-station/s2i-generic-data-science-notebook
from:
kind: DockerImage
name: 'quay.io/thoth-station/s2i-generic-data-science-notebook:v0.0.5'
generation: 2
importPolicy: {}
referencePolicy:
type: Source
status:
dockerImageRepository: ''
tags:
- tag: v0.0.5
items:
- created: '2022-11-21T13:14:31Z'
dockerImageReference: >-
quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58
image: >-
sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58
generation: 2
kind: StatefulSet
apiVersion: apps/v1
metadata:
name: jupyter-nb-sthoms
namespace: odhsven
uid: f3692652-6718-40fc-909f-355eb22b5002
resourceVersion: '829380751'
generation: 1
creationTimestamp: '2022-12-07T16:54:40Z'
ownerReferences:
- apiVersion: kubeflow.org/v1beta1
kind: Notebook
name: jupyter-nb-sthoms
uid: 15d89396-c670-4c36-b0c9-5872fd040811
controller: true
blockOwnerDeletion: true
spec:
replicas: 1
selector:
matchLabels:
statefulset: jupyter-nb-sthoms
template:
metadata:
creationTimestamp: null
labels:
app: jupyter-nb-sthoms
notebook-name: jupyter-nb-sthoms
opendatahub.io/odh-managed: 'true'
opendatahub.io/user: sthoms
statefulset: jupyter-nb-sthoms
spec:
restartPolicy: Always
serviceAccountName: jupyter-nb-sthoms
schedulerName: default-scheduler
enableServiceLinks: false
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: nvidia.com/gpu.present
operator: NotIn
values:
- 'true'
terminationGracePeriodSeconds: 30
securityContext: {}
containers:
- resources:
limits:
cpu: '2'
memory: 8Gi
requests:
cpu: '1'
memory: 8Gi
readinessProbe:
httpGet:
path: /notebook/odhsven/jupyter-nb-sthoms/api
port: notebook-port
scheme: HTTP
initialDelaySeconds: 10
timeoutSeconds: 1
periodSeconds: 5
successThreshold: 1
failureThreshold: 3
terminationMessagePath: /dev/termination-log
name: jupyter-nb-sthoms
livenessProbe:
httpGet:
path: /notebook/odhsven/jupyter-nb-sthoms/api
port: notebook-port
scheme: HTTP
initialDelaySeconds: 10
timeoutSeconds: 1
periodSeconds: 5
successThreshold: 1
failureThreshold: 3
env:
- name: NOTEBOOK_ARGS
value: |-
--ServerApp.port=8888
--ServerApp.token=''
--ServerApp.password=''
--ServerApp.base_url=/notebook/odhsven/jupyter-nb-sthoms
--ServerApp.quit_button=False
--ServerApp.tornado_settings={"user":"sthoms","hub_host":"https://odh-dashboard-odhsven.apps.ocp4dev.xxxx.mydomain.ch","hub_prefix":"/notebookController/sthoms"}
- name: JUPYTER_IMAGE
value: 'quay.io/thoth-station/s2i-generic-data-science-notebook:v0.0.5'
- name: JUPYTER_NOTEBOOK_PORT
value: '8888'
- name: NB_PREFIX
value: /notebook/odhsven/jupyter-nb-sthoms
ports:
- name: notebook-port
containerPort: 8888
protocol: TCP
imagePullPolicy: Always
volumeMounts:
- name: jupyterhub-nb-sthoms-pvc
mountPath: /opt/app-root/src
terminationMessagePolicy: File
image: 's2i-generic-data-science-notebook:v0.0.5'
workingDir: /opt/app-root/src
- resources:
limits:
cpu: 100m
memory: 64Mi
requests:
cpu: 100m
memory: 64Mi
readinessProbe:
httpGet:
path: /oauth/healthz
port: oauth-proxy
scheme: HTTPS
initialDelaySeconds: 5
timeoutSeconds: 1
periodSeconds: 5
successThreshold: 1
failureThreshold: 3
terminationMessagePath: /dev/termination-log
name: oauth-proxy
livenessProbe:
httpGet:
path: /oauth/healthz
port: oauth-proxy
scheme: HTTPS
initialDelaySeconds: 30
timeoutSeconds: 1
periodSeconds: 5
successThreshold: 1
failureThreshold: 3
env:
- name: NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
ports:
- name: oauth-proxy
containerPort: 8443
protocol: TCP
imagePullPolicy: Always
volumeMounts:
- name: oauth-config
mountPath: /etc/oauth/config
- name: tls-certificates
mountPath: /etc/tls/private
terminationMessagePolicy: File
image: 'registry.redhat.io/openshift4/ose-oauth-proxy:v4.10'
args:
- '--provider=openshift'
- '--https-address=:8443'
- '--http-address='
- '--openshift-service-account=jupyter-nb-sthoms'
- '--cookie-secret-file=/etc/oauth/config/cookie_secret'
- '--cookie-expire=24h0m0s'
- '--tls-cert=/etc/tls/private/tls.crt'
- '--tls-key=/etc/tls/private/tls.key'
- '--upstream=http://localhost:8888'
- '--upstream-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
- >-
--skip-auth-regex=^(?:/notebook/$(NAMESPACE)/jupyter-nb-sthoms)?/api$
- '--email-domain=*'
- '--skip-provider-button'
- >-
--openshift-sar={"verb":"get","resource":"notebooks","resourceAPIGroup":"kubeflow.org","resourceName":"jupyter-nb-sthoms","namespace":"$(NAMESPACE)"}
- >-
--logout-url=https://odh-dashboard-odhsven.apps.ocp4dev.xxxx.mydomain.ch/notebookController/sthoms/home
serviceAccount: jupyter-nb-sthoms
volumes:
- name: jupyterhub-nb-sthoms-pvc
persistentVolumeClaim:
claimName: jupyterhub-nb-sthoms-pvc
- name: oauth-config
secret:
secretName: jupyter-nb-sthoms-oauth-config
defaultMode: 420
- name: tls-certificates
secret:
secretName: jupyter-nb-sthoms-tls
defaultMode: 420
dnsPolicy: ClusterFirst
serviceName: ''
podManagementPolicy: OrderedReady
updateStrategy:
type: RollingUpdate
rollingUpdate:
partition: 0
revisionHistoryLimit: 10
status:
observedGeneration: 1
replicas: 1
currentReplicas: 1
updatedReplicas: 1
currentRevision: jupyter-nb-sthoms-8578f7d994
updateRevision: jupyter-nb-sthoms-8578f7d994
collisionCount: 0
availableReplicas: 0
It also does not really matter whether it is that particular StatefulSet or one created ad hoc by me; the behavior and bug are always the same. This is problematic because the KF Notebook Controller's instantiation of pods is based on StatefulSet, not on Deployment, DeploymentConfig, or Pod directly.
I attempted to recreate this behavior and could not. Here's my version of the yamls (I just stripped the status fields, the fields that are populated at creation time, and the namespace):
kind: ImageStream
apiVersion: image.openshift.io/v1
metadata:
annotations:
kfctl.kubeflow.io/kfdef-instance: opendatahub.odhsven
opendatahub.io/notebook-image-desc: >-
Jupyter notebook image with a set of data science libraries that advanced
AI/ML notebooks will use as a base image to provide a standard for
libraries available in all notebooks
opendatahub.io/notebook-image-name: Standard Data Science
opendatahub.io/notebook-image-url: 'https://github.com/thoth-station/s2i-generic-data-science-notebook'
name: s2i-generic-data-science-notebook
labels:
app.kubernetes.io/part-of: jupyterhub
component.opendatahub.io/name: jupyterhub
opendatahub.io/component: 'true'
opendatahub.io/notebook-image: 'true'
spec:
lookupPolicy:
local: true
tags:
- name: v0.0.5
annotations:
opendatahub.io/notebook-python-dependencies: >-
[{"name":"Boto3","version":"1.17.11"},{"name":"Kafka-Python","version":"2.0.2"},{"name":"Matplotlib","version":"3.4.2"},{"name":"Numpy","version":"1.21.0"},{"name":"Pandas","version":"1.2.5"},{"name":"Scipy","version":"1.7.0"}]
opendatahub.io/notebook-software: '[{"name":"Python","version":"v3.8.6"}]'
openshift.io/imported-from: quay.io/thoth-station/s2i-generic-data-science-notebook
from:
kind: DockerImage
name: 'quay.io/thoth-station/s2i-generic-data-science-notebook:v0.0.5'
generation: 2
importPolicy: {}
referencePolicy:
type: Source
kind: StatefulSet
apiVersion: apps/v1
metadata:
name: jupyter-nb-sthoms
spec:
replicas: 1
selector:
matchLabels:
statefulset: jupyter-nb-sthoms
template:
metadata:
creationTimestamp: null
labels:
app: jupyter-nb-sthoms
notebook-name: jupyter-nb-sthoms
opendatahub.io/odh-managed: 'true'
opendatahub.io/user: sthoms
statefulset: jupyter-nb-sthoms
spec:
restartPolicy: Always
serviceAccountName: jupyter-nb-sthoms
schedulerName: default-scheduler
enableServiceLinks: false
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: nvidia.com/gpu.present
operator: NotIn
values:
- 'true'
terminationGracePeriodSeconds: 30
securityContext: {}
containers:
- resources:
limits:
cpu: '2'
memory: 8Gi
requests:
cpu: '1'
memory: 8Gi
readinessProbe:
httpGet:
path: /notebook/odhsven/jupyter-nb-sthoms/api
port: notebook-port
scheme: HTTP
initialDelaySeconds: 10
timeoutSeconds: 1
periodSeconds: 5
successThreshold: 1
failureThreshold: 3
terminationMessagePath: /dev/termination-log
name: jupyter-nb-sthoms
livenessProbe:
httpGet:
path: /notebook/odhsven/jupyter-nb-sthoms/api
port: notebook-port
scheme: HTTP
initialDelaySeconds: 10
timeoutSeconds: 1
periodSeconds: 5
successThreshold: 1
failureThreshold: 3
env:
- name: NOTEBOOK_ARGS
value: |-
--ServerApp.port=8888
--ServerApp.token=''
--ServerApp.password=''
--ServerApp.base_url=/notebook/odhsven/jupyter-nb-sthoms
--ServerApp.quit_button=False
--ServerApp.tornado_settings={"user":"sthoms","hub_host":"https://odh-dashboard-odhsven.apps.ocp4dev.xxxx.mydomain.ch","hub_prefix":"/notebookController/sthoms"}
- name: JUPYTER_IMAGE
value: 'quay.io/thoth-station/s2i-generic-data-science-notebook:v0.0.5'
- name: JUPYTER_NOTEBOOK_PORT
value: '8888'
- name: NB_PREFIX
value: /notebook/odhsven/jupyter-nb-sthoms
ports:
- name: notebook-port
containerPort: 8888
protocol: TCP
imagePullPolicy: Always
volumeMounts:
- name: jupyterhub-nb-sthoms-pvc
mountPath: /opt/app-root/src
terminationMessagePolicy: File
image: 's2i-generic-data-science-notebook:v0.0.5'
workingDir: /opt/app-root/src
- resources:
limits:
cpu: 100m
memory: 64Mi
requests:
cpu: 100m
memory: 64Mi
readinessProbe:
httpGet:
path: /oauth/healthz
port: oauth-proxy
scheme: HTTPS
initialDelaySeconds: 5
timeoutSeconds: 1
periodSeconds: 5
successThreshold: 1
failureThreshold: 3
terminationMessagePath: /dev/termination-log
name: oauth-proxy
livenessProbe:
httpGet:
path: /oauth/healthz
port: oauth-proxy
scheme: HTTPS
initialDelaySeconds: 30
timeoutSeconds: 1
periodSeconds: 5
successThreshold: 1
failureThreshold: 3
env:
- name: NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
ports:
- name: oauth-proxy
containerPort: 8443
protocol: TCP
imagePullPolicy: Always
volumeMounts:
- name: oauth-config
mountPath: /etc/oauth/config
- name: tls-certificates
mountPath: /etc/tls/private
terminationMessagePolicy: File
image: 'registry.redhat.io/openshift4/ose-oauth-proxy:v4.10'
args:
- '--provider=openshift'
- '--https-address=:8443'
- '--http-address='
- '--openshift-service-account=jupyter-nb-sthoms'
- '--cookie-secret-file=/etc/oauth/config/cookie_secret'
- '--cookie-expire=24h0m0s'
- '--tls-cert=/etc/tls/private/tls.crt'
- '--tls-key=/etc/tls/private/tls.key'
- '--upstream=http://localhost:8888'
- '--upstream-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
- >-
--skip-auth-regex=^(?:/notebook/$(NAMESPACE)/jupyter-nb-sthoms)?/api$
- '--email-domain=*'
- '--skip-provider-button'
- >-
--openshift-sar={"verb":"get","resource":"notebooks","resourceAPIGroup":"kubeflow.org","resourceName":"jupyter-nb-sthoms","namespace":"$(NAMESPACE)"}
- >-
--logout-url=https://odh-dashboard-odhsven.apps.ocp4dev.xxxx.mydomain.ch/notebookController/sthoms/home
serviceAccount: jupyter-nb-sthoms
volumes:
- name: jupyterhub-nb-sthoms-pvc
persistentVolumeClaim:
claimName: jupyterhub-nb-sthoms-pvc
- name: oauth-config
secret:
secretName: jupyter-nb-sthoms-oauth-config
defaultMode: 420
- name: tls-certificates
secret:
secretName: jupyter-nb-sthoms-tls
defaultMode: 420
dnsPolicy: ClusterFirst
serviceName: ''
podManagementPolicy: OrderedReady
updateStrategy:
type: RollingUpdate
rollingUpdate:
partition: 0
revisionHistoryLimit: 10
status:
observedGeneration: 1
replicas: 1
currentReplicas: 1
updatedReplicas: 1
currentRevision: jupyter-nb-sthoms-8578f7d994
updateRevision: jupyter-nb-sthoms-8578f7d994
collisionCount: 0
availableReplicas: 0
I ran:
# create imagestream
oc create -f is.yaml
then i ran
# check import was successful
oc get is -o yaml
to confirm the import, which i did by checking the status section of the output:
status:
dockerImageRepository: image-registry.openshift-image-registry.svc:5000/p1/s2i-generic-data-science-notebook
tags:
- items:
- created: "2022-12-08T15:31:49Z"
dockerImageReference: quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58
generation: 2
image: sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58
tag: v0.0.5
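As a sketch of what that confirmation amounts to in code: the digest reference for an imported tag lives at status.tags[].items[0].dockerImageReference, with the newest import first. Plain dicts stand in for the ImageStream object here; this is illustrative, not an official client API:

```python
def resolved_reference(imagestream, tag):
    """Return the digest reference for `tag` from an ImageStream's
    status, or None if the tag has not been imported yet. Sketch only."""
    for t in imagestream.get("status", {}).get("tags", []):
        if t.get("tag") == tag and t.get("items"):
            # items[0] is the most recent import for this tag
            return t["items"][0]["dockerImageReference"]
    return None

# Minimal stand-in for the imagestream status shown above:
is_obj = {
    "status": {
        "tags": [{
            "tag": "v0.0.5",
            "items": [{
                "dockerImageReference": (
                    "quay.io/thoth-station/s2i-generic-data-science-notebook"
                    "@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58"
                ),
            }],
        }],
    },
}
print(resolved_reference(is_obj, "v0.0.5"))
```

If this returns None, the import has not completed and creating the statefulset would be premature.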
# create stateful set
oc create -f ss.yaml
and then i confirmed that the image field is substituted as expected (this happens during creation admission):
oc get statefulset -o yaml | grep image:
image: quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58
image: registry.redhat.io/openshift4/ose-oauth-proxy:v4.10
No pods ended up being created and i didn't really look into why, but fundamentally the statefulset's pod template is as expected: the image field points to the resolved imagestreamtag value and not 's2i-generic-data-science-notebook:v0.0.5' as in the original statefulset yaml. (Your pod's "failed to pull image" errors are directly caused by the image field in the statefulset not being substituted correctly when the statefulset was created.)
The OCP version i used was one of our 4.13 CI builds, 4.13.0-0.ci-2022-12-07-190258, but this behavior hasn't changed in several releases as far as i know (since the bug you pointed to in 4.6, i believe).
i'm not really sure what to tell you since i used pretty much your exact yaml files to attempt the recreate and it seems to be working as expected.
What OCP version are you on?
I am on OCP 4.10.3 on-prem without openshift registry and on 4.13 DEV on AWS with openshift registry.
So hold on, I am probably engaging in circular reasoning and should work more along the lines of Occam. Your analysis helps for now. @bparees Do not go any further for now, thank you.
The differentiating feature I have is the KF Notebook and ODH Notebook Controller, which watches for Notebook CRs and then creates StatefulSets from them. I think until now I only manually modified the existing, controller-created StatefulSet / Notebook CR. I will try with a brand new one without any annotations from the notebook controller.
What bugs me is that the substitution you mention works for everything except a StatefulSet, which is really pretty weird. The substitution in my case is image: quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58 in DeploymentConfigs, Pods, and Deployments, just not in StatefulSets. But that makes sense, as I think I only edited StatefulSet objects created by the notebook controller with the image field pointing to the imagestream. I think the KF and ODH Notebook Controller is keeping the image field in the StatefulSet from being updated.
The differentiating feature I have is the KF Notebook and ODH Notebook Controller, that watches for Notebook CRs and then creates StatefulSets from them. I think I until now only manually modified the existing, KF or ODH Notebook controller-created StatefulSet. Will try with a brand new one without any annotations from notebook controller.
ah hah. Yes, if you have some controller that is reconciling the statefulset, it is very likely that the substituted value is being stomped back by the controller, after the admission logic has mutated it.
What bugs me is that the substitution you mention works for everything except a StatefulSet, which is really pretty weird.
assuming those other resources don't have a controller reconciling them, that would explain the difference. The substitution is probably working, but then immediately getting stomped/reverted by the notebook controller.
Try creating a statefulset directly and i think you'll see it work correctly.
I think the KF and ODH Notebook Controller is keeping the image field in the StatefulSet from being updated.
more accurately, it is probably reverting the change after it is made, but yes, fundamentally it's a case of multiple writers competing over the resource.
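The write sequence described above (admission substitutes the image field, then the reconciling controller writes its desired template back) can be sketched as a last-writer-wins exchange. This is purely illustrative, not actual admission-plugin or controller code:

```python
# What the Notebook CR / controller template specifies:
DESIRED = "s2i-generic-data-science-notebook:v0.0.5"
# What the image policy admission plugin substitutes (digest elided):
RESOLVED = ("quay.io/thoth-station/s2i-generic-data-science-notebook"
            "@sha256:3f619c6...")

def admission(statefulset):
    # Image policy admission: replace the local imagestream reference
    # with the resolved digest reference (lookupPolicy.local: true).
    statefulset["image"] = RESOLVED
    return statefulset

def notebook_controller_reconcile(statefulset):
    # The controller compares against its desired template and writes
    # the whole pod template spec back, image field included.
    statefulset["image"] = DESIRED
    return statefulset

ss = admission({"image": DESIRED})
assert ss["image"] == RESOLVED   # substitution worked at admission...
ss = notebook_controller_reconcile(ss)
assert ss["image"] == DESIRED    # ...but the reconcile stomped it back
```

Two writers with different ideas of the desired image field will fight indefinitely; whichever wrote last determines what the kubelet sees.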
Will look more into that tomorrow, but we are on to something, yes.
hah, indeed. Turns out with a fresh StatefulSet, without any notebook controller annotations, the image name and digest are resolved correctly from the imagestream tag and name.
So yes, the Kubeflow Notebook controller's Notebook CR owns the StatefulSets created from a Notebook definition and thus interferes with / overrides the imagestream resolution performed on admission of the StatefulSet to the cluster.
ownerReferences:
- apiVersion: kubeflow.org/v1beta1
kind: Notebook
controller: true
blockOwnerDeletion: true
@kpouget @DaoDaoNoCode it seems the notebook controller overrides the value of the container image field that was resolved from the imagestream tag and name by the OpenShift image policy admission plugin. As far as the notebook controller goes, the reconciling logic and StatefulSet logic is here, it seems:
Apparently, there is a kubeflow reconcilehelper package and the reconciliation logic is here (update-logic)
The whole StatefulSet template spec, of which the image field is a part, is continually updated, thus overwriting any changes from the image policy admission plugin:
Anyway, the image policy admission plugin itself works as expected on the Kubernetes / OpenShift side for StatefulSets not owned by the KF Notebook controller. Here is the result with a fresh StatefulSet:
Statefulset before apply, non-notebook-controller-related, referencing imagestream name and tag in same namespace:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: testwithoutnotebookannotation
namespace: odhsven
spec:
serviceName: testwithoutnotebookannotation
replicas: 1
selector:
matchLabels:
app: testwithoutnotebookannotation
template:
metadata:
labels:
app: testwithoutnotebookannotation
spec:
terminationGracePeriodSeconds: 10
containers:
- name: jupyter
image: 's2i-generic-data-science-notebook:v0.0.5'
Result working as expected, excerpt from the StatefulSet container spec after applied in cluster:
labels:
app: testwithoutnotebookannotation
spec:
containers:
- name: jupyter
image: >-
quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58
With this fresh StatefulSet and the pod created from it, all is working fine afterwards. The image pull works as expected: since nothing else was defined, the image pull policy IfNotPresent kicks in, the image had been present on my worker node before, all good:
Container image "quay.io/thoth-station/s2i-generic-data-science-notebook@sha256:3f619c61501218f03d39d97541336dee024f446e64f3a47e2bc7e62cddeb2e58" already present on machine
Looks like I need to escalate this error with the ODH Notebook Controller team. Leaving ticket open for now, but thank you for all the help. I also created a ticket at kubeflow explaining the context https://github.com/kubeflow/kubeflow/issues/6829
@bparees We have had some issues on a hyperscaler-based OpenShift 4.11 and 4.13.0-ec.0 cluster with the image policy admission plugin / metadata annotation doing nothing. Where in OpenShift's config is it defined which plugins are active and which are not? For OpenShift 3.10 I found some hints, but I cannot find documentation about that cluster-specific configuration for 4.10 or later anywhere.
it's not configurable in 4.x.
weird, thank you. Could PodSecurityPolicy potentially be interfering with image resolve? Context: fresh StatefulSet on a hyperscaler cluster, with this annotation in metadata:
annotations:
image.openshift.io/triggers: >-
[{"from":{"kind":"ImageStreamTag","name":"s2i-generic-data-science-notebook:v0.0.5","namespace":"odhsven"},"fieldPath":"spec.template.spec.containers[?(@.name==\"nginx\")].image"}]
on 4.10.37 in-house, it worked. On the hyperscalers with 4.11 and 4.13, it did not work. The image-field did not get updated. We carefully made sure to reference the correct container name, imagestream name and tag, all that.
I wonder if something keeps changing in the backend. For example, on 4.10.3 in-house, I had to either have
image: " "
or no image-field specified initially in the containers section of the StatefulSet for lookup to work. Now, on 4.10.37 in-house, I can use whatever value I want for the image-field, and it gets updated accordingly.
does the same annotation work on a deployment or any other resource type?
I don't think PSP could be interfering. The way this is supposed to work, iirc is:
1) An imagechangetrigger controller watches all the workload resources (Deployment, StatefulSet, DaemonSet, etc.) that have the imagetrigger annotation; basically anything that contains a PodTemplate.
2) It also watches all the imagestreams that are referenced by the annotations on those workload resources.
3) When the imagestream changes (new image tag SHA), it updates the workload resource (using the field path indicated by the annotation).
4) Assuming the annotation is specified correctly, once it updates, the workload resource controller (deployment controller, statefulset controller, etc.) will see that the pod template has changed (image field value change) and roll out new pods for the workload resource.
I don't think the initial value of the image field should matter, though you should be aware that an initial rollout of pods may occur w/ the initial value of the image field, before the imagechangetrigger observes the resources and updates them (which will then cause a subsequent rollout)
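For reference, the triggers annotation value is just a JSON array, and hand-writing the escaped quotes inside fieldPath is error-prone; a typo in the container name makes the trigger a silent no-op. A small sketch that builds the value programmatically (the container name "jupyter" is just an example):

```python
import json

def triggers_annotation(imagestream_tag, namespace, container):
    """Build the value for the image.openshift.io/triggers annotation.

    fieldPath selects the container by name inside the pod template;
    json.dumps handles the quote escaping that is easy to get wrong
    when writing the annotation by hand.
    """
    return json.dumps([{
        "from": {
            "kind": "ImageStreamTag",
            "name": imagestream_tag,
            "namespace": namespace,
        },
        "fieldPath": ('spec.template.spec.containers'
                      f'[?(@.name=="{container}")].image'),
    }])

value = triggers_annotation("s2i-generic-data-science-notebook:v0.0.5",
                            "odhsven", "jupyter")
print(value)
```

The resulting string can be pasted directly into metadata.annotations on the StatefulSet.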
Yes, the annotation works on Pods, for example, on my in-house OCP 4.10.3 and 4.10.37 clusters. Just not on the hyperscalers 4.11 and 4.13.
That one-space-string or no-field-necessary issue for image-value might have been something specific to 4.10.3.
Now, on 4.10.37, I can have any string I want as the image value; once I apply the definition, the image field gets updated with the actual value from the imagestream.
Also, the imagestream is already initialized and applied, with no changes to the images it references. So the substitution of the image field does not even require an image change when applying a new Pod, Deployment, or StatefulSet.
You may have hit a HyperShift bug - https://issues.redhat.com/browse/OCPBUGS-4025
hyper-cool you found that, Oleg, thank you / thanks a lot
@shalberd these are hypershift guest clusters where you're seeing the issue? When you said hyperscalers i just thought you meant a traditional ocp cluster running on AWS/GCP/Azure.
It is on aws *aws-2.ci.openshift.org .... For more, @andrewballantyne from the Open Data Hub (ODH) team can answer. I am not a member of Open Data Hub, but I am pushing the use of imagestream abstractions, non-dependence on the OpenShift-internal registry, and ImageContentSourcePolicy support from an enterprise, very on-prem-OCP-oriented perspective, while making sure to test extensively with the ODH team.
Why use hypershift with AWS and Azure anyway? I mean, does that not add an unnecessary layer of abstraction, with reconciliations potentially interfering there, too? Just interested ...
@bparees Regarding collisions between the image policy admission plugin / imagechangetrigger writing to the image field and the notebook controller overwriting it on reconciliation, what is conceptually the best way to avoid the KF notebook controller overwriting the image field? Is it
a) excluding the image field from reconciliation, for example, or is it instead b) letting the controller handle getting the image value from the imagestream itself, similar to this approach for the oauth image in another controller?
To me, the approach the Elastic operator took looks like duplicating logic, but I am very new to the topic of operators and object reconciliation. Also, since imagestream is specific to OpenShift, and the image field of a Notebook CR is not always based on a metadata annotation, this being a specialty case on OpenShift, I'd lean towards approach a.
I think approach (a) is reasonable in this case, but keep in mind that it means the kf notebook controller loses the ability to update the image to a new value (such as it might want to do during an upgrade). Unless your plan is that the resource would always be defined with the imagetrigger annotation, in which case upgrades would be handled by updating the imagestream itself and letting the trigger flow through to roll out the workload resource. But then you've tied your logic to openshift behavior (the imagetrigger annotation and imagestream resource, which aren't things that exist on vanilla k8s).
So you need to fully think through who should be controlling the value of the image field, and where this operator/controller will be used.
One way to approach it would be for the kf controller to manage the image field value if, and only if, there is no image.openshift.io/triggers annotation on the resource. The presence of that annotation would be an indication that the user has chosen an alternate way to manage the image field value, and the kf notebook controller should defer to that alternate source of image values. Unfortunately this means adding "openshift specific/aware" logic to the controller, which i understand might be undesirable, but the good news is the logic would be completely portable to vanilla k8s (since it's just looking at annotation keys).
And then if there is not such an annotation, the kf notebook controller can directly set the image field, allowing it to update the image field to new values during upgrades of the kf notebook controller.
This would allow a user to update the workload resource to be imagetriggered based on an imagestream of their choice, when they want that, but if they didn't want it (or weren't on openshift), they'd get whatever standard image the kf notebook controller specifies/defaults.
Again, ultimately this really comes down to a philosophy of how the image value should be managed, and by whom. Is it a default provided by the notebook, but intended to be modified/controlled by the user? Or is it something that's intended to be tightly specified/controlled by the notebook? That will guide you as to how best to implement the behavior and what controls to offer users.
(e.g. another option would be to introduce an entirely new annotation that tells the controller whether or not to manage the image field. Then users could disable controller management of the image field for both that case of "i want the imagetrigger to manage it" but also "i just want to manually set it to something else for some reason")
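The precedence rule proposed above (the controller manages the image field only when no trigger annotation, and no opt-out, is present) is small enough to sketch. The opt-out annotation name here is hypothetical, not part of any real API:

```python
TRIGGERS_ANNOTATION = "image.openshift.io/triggers"
# Hypothetical opt-out annotation, invented for this sketch:
MANAGE_IMAGE_ANNOTATION = "notebooks.kubeflow.org/manage-image"

def should_manage_image(annotations):
    """Return True if the notebook controller should set/update the
    image field itself; sketch of the precedence rule discussed above."""
    if annotations.get(MANAGE_IMAGE_ANNOTATION) == "false":
        return False   # user explicitly opted out of controller management
    if TRIGGERS_ANNOTATION in annotations:
        return False   # an imagetrigger owns the field; defer to it
    return True        # default: controller manages the image field

assert should_manage_image({}) is True
assert should_manage_image({TRIGGERS_ANNOTATION: "[...]"}) is False
assert should_manage_image({MANAGE_IMAGE_ANNOTATION: "false"}) is False
```

The reconcile loop would then skip the image field (or leave the live value in place) whenever this returns False, letting the admission plugin's substitution survive.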
btw, server-side apply may offer another approach to this problem, by allowing controllers to assert ownership/priority of specific fields within a resource:
https://kubernetes.io/docs/reference/using-api/server-side-apply/
I suspect that would first require updates to the imagechangetrigger logic to respect those semantics (in addition to the kf notebook controller supporting them), but since serverside apply is k8s' long term strategic answer to these sorts of conflicts, i'd be remiss in not at least mentioning it.
Thanks a lot, that helps me get started, I really appreciate your input. Here's to a good new year.
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
@dmage @benluddy @bparees I hope I am at the correct place here, found this online for context:
When I have an imagestream in the same namespace as a StatefulSet, I cannot reference the imagestream name and tag in the image field; it does not get resolved.
In contrast, when I reference the imagestream from a DeploymentConfig, a Pod, or a Deployment, the image name and digest are resolved correctly, as they should be.
Should this not have been fixed in the admission controller or something like that a long time ago, for all kinds of objects?
See https://bugzilla.redhat.com/show_bug.cgi?id=2000216 and https://access.redhat.com/solutions/6455191
In any case, the expected result would be: image stream tags get resolved in created and edited StatefulSets.