kube-burner / kube-burner-ocp

OpenShift integrations and workloads for kube-burner
https://kube-burner.github.io/kube-burner-ocp/
Apache License 2.0

[BUG] cdv2 test failing to create build object on baremetal deployment #79

Open venkataanil opened 3 weeks ago

venkataanil commented 3 weeks ago

Bug Description

On an OCP 4.14.19 bare-metal environment, cluster-density-v2 is failing to create the build object:

    NAME                                            TYPE     FROM         STATUS                         STARTED   DURATION
    build.build.openshift.io/cluster-density-v2-1   Docker   Dockerfile   New (InvalidOutputReference)

From our internal discussions 2 months ago, editing "configs.imageregistry.operator.openshift.io cluster" like below should fix the issue:

    storage:
      emptyDir: {}
    managementState: Managed

However this didn't completely fix the issue.

We have seen 2 builds in a failed state:

    [root@e16-h12-b01-fc640 ~]# oc get build -n cluster-density-v2-1
    NAME                   TYPE     FROM         STATUS     STARTED          DURATION
    cluster-density-v2-1   Docker   Dockerfile   Complete   24 minutes ago   18s
    [root@e16-h12-b01-fc640 ~]# oc get build -n cluster-density-v2-2
    NAME                   TYPE     FROM         STATUS                         STARTED   DURATION
    cluster-density-v2-1   Docker   Dockerfile   New (InvalidOutputReference)
    [root@e16-h12-b01-fc640 ~]# oc get build -n cluster-density-v2-3
    NAME                   TYPE     FROM         STATUS                         STARTED   DURATION
    cluster-density-v2-1   Docker   Dockerfile   New (InvalidOutputReference)
    [root@e16-h12-b01-fc640 ~]# oc get build -n cluster-density-v2-4
    NAME                   TYPE     FROM         STATUS     STARTED          DURATION
    cluster-density-v2-1   Docker   Dockerfile   Complete   24 minutes ago   17s
    [root@e16-h12-b01-fc640 ~]# oc get build -n cluster-density-v2-0
    NAME                   TYPE     FROM         STATUS     STARTED          DURATION
    cluster-density-v2-1   Docker   Dockerfile   Complete   24 minutes ago   18s

I tried a hack of adding the iteration number to the build name, i.e.

    name: {{.JobName}}-{{.Iteration}}-{{.Replica}}

which worked for the time being. I could see all build objects successfully created.

    [root@e16-h12-b01-fc640 ~]# oc get build -n cluster-density-v2-1
    NAME                     TYPE     FROM         STATUS     STARTED         DURATION
    cluster-density-v2-1-1   Docker   Dockerfile   Complete   4 minutes ago   15s
    [root@e16-h12-b01-fc640 ~]# oc get build -n cluster-density-v2-2
    NAME                     TYPE     FROM         STATUS     STARTED         DURATION
    cluster-density-v2-2-1   Docker   Dockerfile   Complete   4 minutes ago   15s
    [root@e16-h12-b01-fc640 ~]# oc get build -n cluster-density-v2-3
    NAME                     TYPE     FROM         STATUS     STARTED         DURATION
    cluster-density-v2-3-1   Docker   Dockerfile   Complete   3 minutes ago   6s
    [root@e16-h12-b01-fc640 ~]# oc get build -n cluster-density-v2-4
    NAME                     TYPE     FROM         STATUS     STARTED          DURATION
    cluster-density-v2-4-1   Docker   Dockerfile   Complete   38 seconds ago   7s
    [root@e16-h12-b01-fc640 ~]# oc get build -n cluster-density-v2-0
    NAME                     TYPE     FROM         STATUS     STARTED         DURATION
    cluster-density-v2-0-1   Docker   Dockerfile   Complete   4 minutes ago   15s
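For anyone reproducing the naming change, here is a minimal Go sketch of how the two name templates expand per iteration. It only shows the resulting names and does not explain the InvalidOutputReference failures. The original template `{{.JobName}}-{{.Replica}}` is inferred from the build names in the output above, and the `nameVars` struct and `renderName` helper are hypothetical, not kube-burner code.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// nameVars holds the template variables used in the issue.
// The struct is hypothetical and exists only for this sketch.
type nameVars struct {
	JobName   string
	Iteration int
	Replica   int
}

// renderName expands a Go template string with the given variables.
func renderName(tmpl string, v nameVars) (string, error) {
	t, err := template.New("name").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, v); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// Assumption: the original template, inferred from the build names above.
	original := "{{.JobName}}-{{.Replica}}"
	// The template from the workaround described in this issue.
	patched := "{{.JobName}}-{{.Iteration}}-{{.Replica}}"

	for i := 0; i < 5; i++ {
		v := nameVars{JobName: "cluster-density-v2", Iteration: i, Replica: 1}
		before, err := renderName(original, v)
		if err != nil {
			panic(err)
		}
		after, err := renderName(patched, v)
		if err != nil {
			panic(err)
		}
		fmt.Printf("iteration %d: %s -> %s\n", i, before, after)
	}
}
```

With the inferred original template every iteration renders the same build name, while the patched template yields a distinct name per iteration, which matches the two listings above.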

This is a temporary hack we tried on the environment to unblock our team member. We should investigate why object creation is failing for images in some namespaces.

We used kube-burner-ocp version V1.2.8 during this testing.

jtaleric commented 3 weeks ago

$ oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"managementState":"Managed"}}'

$ oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '{"spec":{"storage":{"emptyDir":{}}}}'

These should get the registry up and running; if not, ping me on Slack.
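The two patches can also be sent as one JSON merge patch, since they touch different keys under `spec`. Below is a small Go sketch that builds that combined patch body and prints the matching `oc patch` command. The `registryPatch` helper is hypothetical and not part of kube-burner-ocp.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// registryPatch builds a single JSON merge patch that sets both fields
// from the two oc patch commands above. Hypothetical helper for illustration.
func registryPatch() ([]byte, error) {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{
			"managementState": "Managed",
			"storage": map[string]interface{}{
				// An empty object selects emptyDir storage (non-production only).
				"emptyDir": map[string]interface{}{},
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := registryPatch()
	if err != nil {
		panic(err)
	}
	// Print the equivalent CLI invocation rather than calling the API.
	fmt.Printf("oc patch configs.imageregistry.operator.openshift.io cluster --type merge --patch '%s'\n", body)
}
```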

venkataanil commented 3 weeks ago

We tried that and we could see 2 builds successful and 3 failed for a 5-iteration run, as explained in the bug description.

jtaleric commented 3 weeks ago

> We tried that and we could see 2 builds successful and 3 failed for a 5-iteration run, as explained in the bug description.

ah, i see it in the updated description now.

Looking more, do we have a must-gather or some logs?

jtaleric commented 3 weeks ago

> We tried that and we could see 2 builds successful and 3 failed for a 5-iteration run, as explained in the bug description.

I'm also concerned because the docs clearly state that this image registry setup is for non-production testing. See https://docs.openshift.com/container-platform/4.15/registry/configuring_registry_storage/configuring-registry-storage-baremetal.html#installation-registry-storage-non-production_configuring-registry-storage-baremetal

So, it could simply be due to us overwhelming the registry, but we should investigate more.

afcollins commented 3 weeks ago

I've run into this before. A bare-metal install doesn't create the registry by default. emptyDir is sufficient; we only need a registry so that the test will run.

Assuming a registry is installed is outside the scope of kube-burner-ocp. Might we add this to the documentation? Or expand the 'cluster health checks' to also check for the registry?
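As a rough illustration of what such a registry check could look like, here is a Go sketch that inspects the JSON returned by `oc get configs.imageregistry.operator.openshift.io cluster -o json`. It is not kube-burner-ocp's actual health check. The `registryConfig` type and `checkRegistry` function are hypothetical, and treating "Managed plus a non-empty storage stanza" as healthy is an assumption that ignores the operator's status conditions.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// registryConfig models only the fields this sketch inspects.
type registryConfig struct {
	Spec struct {
		ManagementState string                     `json:"managementState"`
		Storage         map[string]json.RawMessage `json:"storage"`
	} `json:"spec"`
}

// checkRegistry returns an error when the registry config does not look usable.
// Assumption: "Managed" with a non-empty storage stanza is good enough for the test.
func checkRegistry(raw []byte) error {
	var cfg registryConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("parsing registry config: %w", err)
	}
	if cfg.Spec.ManagementState != "Managed" {
		return fmt.Errorf("image registry managementState is %q, want \"Managed\"", cfg.Spec.ManagementState)
	}
	if len(cfg.Spec.Storage) == 0 {
		return errors.New("image registry has no storage configured")
	}
	return nil
}

func main() {
	// Sample input resembling a bare-metal cluster where the registry was never enabled.
	sample := []byte(`{"spec":{"managementState":"Removed","storage":{}}}`)
	if err := checkRegistry(sample); err != nil {
		fmt.Println("registry check failed:", err)
		return
	}
	fmt.Println("registry looks configured")
}
```

A real check would probably also wait for the registry deployment to become available, since a freshly patched config does not mean the registry pods are already serving.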

venkataanil commented 2 weeks ago

Both the documentation and the 'cluster health checks' are needed, as the requirement comes from cdv2, which is a main workload.

rsevilla87 commented 1 week ago

Hey @venkataanil! Registry availability should be verified if you pass the --check-health flag to the cluster-density-v2 workload.