openshift / image-registry

OpenShift cluster image registry
Apache License 2.0

IR-489: export storage read-only errors as metrics #412

Closed: flavianmissi closed this 3 weeks ago

flavianmissi commented 1 month ago

Before this change, "read-only file system" errors showed up in the metrics as "UNKNOWN".

This commit inspects the enclosed storage path error and reports a metric for the failed operation.


Testing this change

The steps below were tested against 4.18.0-0.test-2024-09-26-122820-ci-ln-0y1k6pk-latest on AWS.

Set storage to read-only

The easiest way I could find to achieve this is to configure the registry to use a PVC, then edit the image-registry deployment to mount the PVC with readOnly: true.

Set project to openshift-image-registry

oc project openshift-image-registry

Create the PVC

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: image-registry-test-claim
  namespace: openshift-image-registry
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: gp3-csi
  volumeMode: Filesystem
EOF

Configure storage to pvc

oc edit configs.imageregistry/cluster

Update the following fields (fields not listed below should not change):

.spec
  defaultRoute: true
  replicas: 1
  rolloutStrategy: Recreate
  storage:
    pvc:
      claim: image-registry-test-claim

Check PVC status

Wait until the PVC status changes to "Bound", which means the image registry pod has successfully mounted it.

oc get pvc image-registry-test-claim -w

Set the registry to unmanaged, and update PVC mount to readOnly

oc patch configs.imageregistry cluster --type=merge -p '{"spec":{"managementState":"Unmanaged"}}'

Edit the image registry deployment:

oc edit deployment image-registry 

Find the image-registry-test-claim volume in .spec.template.spec.volumes and change it to:

      - name: registry-storage
        persistentVolumeClaim:
          claimName: image-registry-test-claim
          readOnly: true

Wait for a new pod, then check that the volume is mounted read-only:

oc get pods <image-registry-pod-name> -ojsonpath='{.spec.volumes[?(@.name=="registry-storage")]}'

Sample output:

{"name":"registry-storage","persistentVolumeClaim":{"claimName":"image-registry-test-claim","readOnly":true}}
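For anyone scripting this check, here is a minimal Go sketch that parses output shaped like the sample above and confirms the claim is mounted read-only. The `volume` type and the `isReadOnlyClaim` helper are made up for illustration; only the JSON field names come from the sample output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// volume mirrors only the fields of the sample output that matter here.
type volume struct {
	Name                  string `json:"name"`
	PersistentVolumeClaim struct {
		ClaimName string `json:"claimName"`
		ReadOnly  bool   `json:"readOnly"`
	} `json:"persistentVolumeClaim"`
}

// isReadOnlyClaim reports whether raw describes a volume that is backed by
// the given claim and mounted read-only. Hypothetical helper.
func isReadOnlyClaim(raw, claim string) (bool, error) {
	var v volume
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return false, err
	}
	pvc := v.PersistentVolumeClaim
	return pvc.ClaimName == claim && pvc.ReadOnly, nil
}

func main() {
	sample := `{"name":"registry-storage","persistentVolumeClaim":{"claimName":"image-registry-test-claim","readOnly":true}}`
	ok, err := isReadOnlyClaim(sample, "image-registry-test-claim")
	fmt.Println(ok, err) // true <nil>
}
```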

Import an image

oc import-image --reference-policy=local hello-world:latest --from=docker.io/library/hello-world:latest --confirm

Pull the image

REGISTRY=$(oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}')
podman login --tls-verify=false $REGISTRY -u unused -p $(oc whoami -t)
podman pull --tls-verify=false ${REGISTRY}/openshift-image-registry/hello-world

Verify the logs for a failure to mirror

While the import and pull will work just fine, writing to local storage will fail. This can be verified by grepping the image-registry logs:

oc logs deploy/image-registry | grep 'Background mirroring'

This should output a log entry similar to the following:

time="2024-09-18T13:16:49.077558148Z" level=error msg="Background mirroring failed: error committing to storage: filesystem: mkdir /registry/docker: read-only file system" go.version="go1.22.5 (Red Hat 1.22.5-1.el9) X:strictfipsruntime" http.request.host=default-route-openshift-image-registry.apps.ci-ln-m1stck2-76ef8.origin-ci-int-aws.dev.rhcloud.com http.request.id=06ee4c04-b721-4b7b-9977-e7e99ac3c847 http.request.method=GET http.request.remoteaddr=78.72.22.80 http.request.uri="/v2/openshift-image-registry/hello-world/blobs/sha256:d2c94e258dcb3c5ac2798d32e1249e42ef01cba4841c2234249495f87264ac5a" http.request.useragent="containers/5.31.1 (github.com/containers/image)" openshift.auth.user="kube:admin" vars.digest="sha256:d2c94e258dcb3c5ac2798d32e1249e42ef01cba4841c2234249495f87264ac5a" vars.name=openshift-image-registry/hello-world

Verify metric in web console

Open the OpenShift web console, choose Observe -> Metrics on the left-side menu, then filter the metrics by:

imageregistry_storage_errors_total

That should return at least one metric with code READ_ONLY_FILESYSTEM, similar to the below:

[image]

flavianmissi commented 1 month ago

/retitle WIP: IR-489: export storage read-only errors as metrics

openshift-ci-robot commented 1 month ago

@flavianmissi: This pull request references IR-489 which is a valid jira issue.

flavianmissi commented 1 month ago

/retitle IR-489: export storage read-only errors as metrics

flavianmissi commented 1 month ago

Test failures are unrelated to the changes.

/retest

ardaguclu commented 1 month ago

My comment above is just a suggestion; feel free to unhold when you are ready.

/lgtm

openshift-ci[bot] commented 1 month ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ardaguclu, flavianmissi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/openshift/image-registry/blob/master/OWNERS)~~ [flavianmissi]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

ardaguclu commented 1 month ago

/hold

flavianmissi commented 1 month ago

Removing the hold, though this still needs a few other approvals (qe, docs, px) before merging.

/hold cancel

flavianmissi commented 1 month ago

/test e2e-hypershift

openshift-ci[bot] commented 1 month ago

@flavianmissi: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
xiuwang commented 3 weeks ago

@flavianmissi I have tested this following your steps, and I could see the READ_ONLY_FILESYSTEM metric.

flavianmissi commented 3 weeks ago

Thank you @xiuwang! Could you give the qe-approved label if you feel ready?

xiuwang commented 3 weeks ago

/label qe-approved

stevsmit commented 3 weeks ago

/label docs-approved

sferich888 commented 3 weeks ago

/label px-approved

openshift-bot commented 3 weeks ago

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-registry

This PR has been included in build openshift-enterprise-registry-container-v4.18.0-202410181609.p0.g0d4541d.assembly.stream.el9. All builds following this will include this PR.