cjschaef closed this pull request 10 months ago.
/retest
These test failures appear to be consistent across multiple components' CI tests, and aren't related to these PR changes. https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-api-provider-ibmcloud/27/pull-ci-openshift-machine-api-provider-ibmcloud-main-e2e-ibmcloud/1717541136786001920
This PR should be ready for review.
/retitle OCPCLOUD-2263: IBMCloud: Add boot volume key to config spec
@cjschaef: This pull request references OCPCLOUD-2263 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.
/retest-required
/retest
/retitle OCPCLOUD-2264: IBMCloud: Add boot volume key support
@cjschaef: This pull request references OCPCLOUD-2264 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.
/hold
The changes look OK to me. @cjschaef, what's going on with the failed IBM test?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: elmiko
The full list of commands accepted by this bot can be found here.
The pull request process is described here
These results look similar to other test flakes I saw two weeks or so ago, so hopefully things are more stable now
the server is currently unable to handle the request
/retest e2e-ibmcloud
@cjschaef: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:
/test e2e-aws
/test generate
/test goimports
/test golint
/test govet
/test images
/test unit
The following commands are available to trigger optional jobs:
/test e2e-ibmcloud
Use /test all to run all jobs.
/test e2e-ibmcloud
/test e2e-ibmcloud
Looks like some VPC resources have leaked; I'll have to investigate that further.
Creating a new VPC will put the user over quota. Allocated: 20, Requested: 1, Quota: 20
/lgtm
I'll try to fix and get updated ASAP.
/test e2e-ibmcloud
I think we have another issue with the MAPI release version popping up in the e2e-ibmcloud test, preventing MAPI from running. MAPI logs:
panic: semver: Parse(0.0.0-0565398): Numeric PreRelease version must not contain leading zeroes "0565398"
@cjschaef I have a feeling that we are hitting a weird corner case in the build, based on the use of git describe in the Makefile for the VERSION variable. I created some cards to address this; see https://issues.redhat.com/browse/OCPCLOUD-2227 for more details.
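For context, the semver spec forbids leading zeroes in purely numeric pre-release identifiers, and an all-digit abbreviated commit hash like 0565398 trips exactly that rule. A minimal sketch of the check (prereleaseIDValid is a hypothetical helper for illustration, not the actual parser code):

```go
package main

import (
	"fmt"
	"regexp"
)

// allDigitsLeadingZero matches pre-release identifiers that are purely
// numeric and start with "0" (e.g. "0565398"), which semver forbids.
var allDigitsLeadingZero = regexp.MustCompile(`^0[0-9]+$`)

// prereleaseIDValid reports whether a single pre-release identifier passes
// the semver leading-zero rule. Hypothetical helper for illustration only.
func prereleaseIDValid(id string) bool {
	return !allDigitsLeadingZero.MatchString(id)
}

func main() {
	fmt.Println(prereleaseIDValid("0565398"))  // all-digit hash with a leading zero: invalid
	fmt.Println(prereleaseIDValid("g0565398")) // git's usual "g" prefix makes it alphanumeric: valid
}
```

git describe normally prefixes abbreviated hashes with "g" (e.g. v1.2.3-4-g0565398), so a VERSION assembled without that prefix can end up as a bare numeric identifier; that appears to be the corner case the cards above track.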
@elmiko are we able to merge this PR in spite of the issue you noted?
@jeffnowicki it depends on whether this issue is just a flake, but it should be fairly quick to fix the Makefile.
Hopefully, https://github.com/openshift/machine-api-provider-ibmcloud/pull/29 will resolve the version panics. Will rebase after that merges.
e2e-ibmcloud got past the version flake from before. The OCP Conformance failures look unique; going to re-run since I am pretty sure they are unrelated to these MAPI changes.
/retest
/retest
While most look like flakes or commonly failing tests (the same as in other repos) compared with the last run, I am a little less sure about "events should not repeat pathologically for ns/openshift-oauth-apiserver". Going to retest and also compare results with e2e-ibmcloud from another repo.
/retest
A test name overlap occurred:
level=error msg=Error: An A, AAAA, or CNAME record with that host already exists. For more details, refer to <https://developers.ibm.com/dns/manage-dns-records/troubleshooting/records-with-same-name/>
Going to try again.
/retest
Wonder if this is another clash? I'm not familiar with these error messages.
Let me take a look at the CI account to see if I can find out more info.
Looks like DNS Records have leaked in the CI. I'll see about cleaning them up; I'll have to follow up to determine what is allowing that to happen.
I completed some cleanup of DNS Records from prior to today (will check tomorrow's), but I think the majority likely leaked due to a bug in my cleanup automation (used to clean up after CI failures during IPI deployments, likely from Infrastructure).
I have a fix to resolve that internally, but hopefully now the chances of a duplicate infraID will be low (around 25 records remaining from today).
/retest
Since I see the same initial failure:
level=error msg=Error: An A, AAAA, or CNAME record with that host already exists. For more details, refer to <https://developers.ibm.com/dns/manage-dns-records/troubleshooting/records-with-same-name/>.
level=error
level=error msg= with module.cis.ibm_cis_dns_record.kubernetes_api_internal[0],
which is followed by a second attempt, which fails because the cleanup of the first likely didn't complete fast enough:
level=error msg=Error: BucketAlreadyExists: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.
level=error msg= status code: 409, request id: 26e62cbd-30fe-4ab5-a4c5-849b554e28d8, host id:
level=error
level=error msg= with module.image.ibm_cos_bucket.images,
These have nothing to do with MAPI. I suspect that IBM Cloud CIS is having intermittent issues, as I do not see an existing DNS Record related to this failure, and the artifacts appear to be for install attempt 2, so I don't have much more detail on what error occurred. I can retrigger, hoping CIS works, as I don't see any notifications for CIS currently. Tomorrow I can try running some local testing to confirm whether CIS is normal or not.
/retest
I think these latest results look more to what I'd expect, with the monitor/poller failures being common, and the other flakes (disruption tests) popping up on this round.
Thanks for the confirmation @cjschaef, I think the latest results look more like a flake as well. I'm happy to label this, but I'd like to run the tests again to see if we can get a good result.
/lgtm
/test e2e-ibmcloud
Is this (machine-controller logs) due to an issue with the image build? Something we may need to update in the Dockerfile (base image)?
/machine-controller-manager: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /machine-controller-manager)
/machine-controller-manager: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /machine-controller-manager)
Rebasing, as changes were made to Dockerfile https://github.com/openshift/machine-api-provider-ibmcloud/commit/f7acd33fd76f28a6eadeac4025e8a1151036aa72
Same result; will have to wait for the images to be fixed:
/machine-controller-manager: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /machine-controller-manager)
/machine-controller-manager: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /machine-controller-manager)
/test e2e-ibmcloud
/retest
Still waiting on the fix to the base image:
/machine-controller-manager: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /machine-controller-manager)
/machine-controller-manager: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /machine-controller-manager)
/retest
/retest
/retest
@jeffnowicki It seems like we are past the image failures, but it looks like there is some quota or permission problem on the IBM infra. Is there anything to be concerned about with that?
Something may be up with IBM Cloud COS; going to check. The other failures in the build are because it retries creating the COS instance/bucket, but it already exists. So it sounds like something happened or is down with that service, or perhaps IAM, etc. https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-api-provider-ibmcloud/27/pull-ci-openshift-machine-api-provider-ibmcloud-main-e2e-ibmcloud/1731667523503394816
level=error msg=Error: AccessDenied: Access Denied
level=error msg= status code: 403, request id: ca5fcbba-27fe-4da6-885a-02cc01c6300a, host id:
level=error
level=error msg= with module.image.ibm_cos_bucket.images,
level=error msg= on image/main.tf line 10, in resource "ibm_cos_bucket" "images":
level=error msg= 10: resource "ibm_cos_bucket" "images" {
level=error
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failure applying terraform for "network" stage: error applying Terraform configs: failed to apply Terraform: exit status 1
Will see if it was a small blip or an outage we need to wait on.
Things appear to be ongoing, I will have to monitor and wait to see when they resolve.
Things may be better now, will retrigger.
/retest
storage 4.15.0-0.ci.test-2023-12-05-184424-ci-op-7zl3z8fc-latest False True False 64m IBMVPCBlockCSIDriverOperatorCRAvailable: IBMBlockDriverControllerServiceControllerAvailable: Waiting for Deployment...
openshift-cluster-csi-drivers ibm-vpc-block-csi-controller-79f7bc8c49-74chn 0/6 CrashLoopBackOff 73 (3m43s ago) 64m 10.128.2.7 ci-op-7zl3z8fc-6383f-ndcz7-worker-1-t8lnb <none> <none>
openshift-cluster-csi-drivers ibm-vpc-block-csi-driver-operator-87965ffc4-5gwvc 1/1 Running 1 (57m ago) 64m 10.129.0.8 ci-op-7zl3z8fc-6383f-ndcz7-master-1 <none> <none>
openshift-cluster-csi-drivers ibm-vpc-block-csi-node-6d6wz 0/3 CrashLoopBackOff 37 (2m8s ago) 52m 10.129.2.6 ci-op-7zl3z8fc-6383f-ndcz7-worker-2-txmtf <none> <none>
openshift-cluster-csi-drivers ibm-vpc-block-csi-node-nz72r 0/3 CrashLoopBackOff 37 (2m45s ago) 52m 10.128.2.4 ci-op-7zl3z8fc-6383f-ndcz7-worker-1-t8lnb <none> <none>
openshift-cluster-csi-drivers ibm-vpc-block-csi-node-rsvkp 0/3 CrashLoopBackOff 37 (2m25s ago) 52m 10.131.0.5 ci-op-7zl3z8fc-6383f-ndcz7-worker-3-54hgx <none> <none>
openshift-cluster-csi-drivers ibm-vpc-block-csi-node-t8hdj 0/3 CrashLoopBackOff 46 (74s ago) 64m 10.130.0.41 ci-op-7zl3z8fc-6383f-ndcz7-master-2 <none> <none>
openshift-cluster-csi-drivers ibm-vpc-block-csi-node-vh5pz 0/3 CrashLoopBackOff 46 (61s ago) 64m 10.128.0.13 ci-op-7zl3z8fc-6383f-ndcz7-master-0 <none> <none>
openshift-cluster-csi-drivers ibm-vpc-block-csi-node-vvphj 0/3 CrashLoopBackOff 44 (11s ago) 64m 10.129.0.13 ci-op-7zl3z8fc-6383f-ndcz7-master-1 <none> <none>
Looks like a storage container is hitting the image error (the csi-driver container in the ibm-vpc-block-csi-controller pod):
/bin/ibm-vpc-block-csi-driver: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /bin/ibm-vpc-block-csi-driver)
/bin/ibm-vpc-block-csi-driver: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /bin/ibm-vpc-block-csi-driver)
I can retry to see whether the Storage image failure was a blip or will need attention too; not that I know what will be required at this time.
/retest
Added a boot volume field with an encryption key to the IBMCloudMachineProviderSpec, allowing machines to specify a boot volume encryption key, and added support for specifying the boot volume encryption key during machine creation.
Related: https://issues.redhat.com/browse/OCPCLOUD-2263
Related: https://issues.redhat.com/browse/OCPCLOUD-2264
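For readers landing here from the Jira cards, a sketch of how the new field might appear in a MachineSet providerSpec. The bootVolume/encryptionKey naming is inferred from the PR description, and the apiVersion, profile, and key CRN below are placeholders, so check the merged API types for the exact schema:

```yaml
# Hypothetical example only; field names assumed from the PR description.
providerSpec:
  value:
    apiVersion: ibmcloudproviderconfig.openshift.io/v1
    kind: IBMCloudMachineProviderSpec
    profile: bx2-4x16
    bootVolume:
      # CRN of the customer-managed encryption key to use for the boot volume.
      encryptionKey: "crn:v1:bluemix:public:kms:us-south:a/<account>:<instance>:key:<key-id>"
```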