openshift / svt

Apache License 2.0

automate for baremetal nfs provisioner #771

Closed qiliRedHat closed 11 months ago

qiliRedHat commented 11 months ago

https://issues.redhat.com/browse/OCPQE-17247

qiliRedHat commented 11 months ago

Tested for nfs-provider: https://redhat-internal.slack.com/archives/C0266JJ4XM5/p1696937695939719

Tested for non nfs-provider: https://redhat-internal.slack.com/archives/C0266JJ4XM5/p1697526172467559

qiliRedHat commented 11 months ago

@liqcui PTAL

liqcui commented 11 months ago

@qiliRedHat

The code looks good to me, but I checked the test results for both non nfs-provider and nfs-provider, and there are 503 errors and a server-unreachable issue. Is that expected?


[2023-10-14 10:07:09 UTC] [bc429089-f9e6-443b-966d-c352557caa7d] @Qiujie Li - qili , :boom: new_app: visit 'rails-pgsql-persistent-testuser-5-1.apps.qili-r05.qe-lrc.devcluster.openshift.com' failed after 35 minutes. status_code: 503


[2023-10-16 19:52:22 UTC] [6b27a912-6406-4ee6-bdf0-5c133cf56da6] @Qiujie Li - qili , :boom: cmd: oc --kubeconfig /root/reliability/reliability-v2-414-r02/kubeconfigs/kubeconfig_testuser-21 login -u testuser-21 -p xxxx failed. Result: error: dial tcp: lookup api.qili-r02.qe.devcluster.openshift.com: no such host - verify you have provided the correct host and port and that the server is currently running.

qiliRedHat commented 11 months ago

@liqcui Thanks for the review. For the nfs-provider run, new_app failures are usually caused by the application service not being accessible within the 35-minute timeout set by the reliability test. That can result from a source-to-image build failure or another deployment failure. A NotReady node can also cause this failure until the node is ready again. If the total failure rate at the end of the test is within an acceptable percentage of the rate in previous tests, it's OK. This PR mainly concerns the initialization phase of the testing; nothing is changed that affects the test running phase.
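
As a rough illustration of the acceptance rule described above (not the reliability tool's actual code), the check could be sketched as follows. The function names, the baseline value, and the 5-percentage-point tolerance are all hypothetical; the thread does not state the real threshold.

```python
def failure_rate(failed: int, total: int) -> float:
    """Return the fraction of failed tasks out of all executed tasks."""
    if total <= 0:
        raise ValueError("total must be positive")
    if failed < 0 or failed > total:
        raise ValueError("failed must be between 0 and total")
    return failed / total


def within_tolerance(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Accept the run if its failure rate does not exceed the baseline
    by more than `tolerance` (an assumed value, expressed as a fraction)."""
    return current <= baseline + tolerance


# Hypothetical numbers: 3 failed new_app tasks out of 100,
# compared with a 2% failure rate from an earlier run.
current = failure_rate(failed=3, total=100)
acceptable = within_tolerance(current, baseline=0.02)
```

Under this sketch, a handful of 503s from slow builds or a temporarily NotReady node would not fail the run, while a large jump over the previous rate would.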

For the non nfs-provider run, yesterday's test environment was killed, which caused the login failures. I updated the link; the new test has no issues.

liqcui commented 11 months ago

/lgtm