openstack-k8s-operators / openstack-operator

Meta Operator for OpenStack
https://openstack-k8s-operators.github.io/openstack-operator/
Apache License 2.0
27 stars 76 forks source link

Use non static name for the Cinder and Glance CRs #847

Closed Akrog closed 3 months ago

Akrog commented 3 months ago

When the openstack operator creates the cinder CR it always uses the "cinder" name and there is no way to change it.

This means that the names of the pods, and therefore the reported hostname inside them, will always be the same for the backup service (cinder-backup-0) and for the volume service in 2 deployments if they have the same key in the volumes map.

This may be fine for some deployments, but it creates problems in others, because many cinder drivers expect the hostnames of the hosts attaching volumes to be unique.

Because they expect them to be unique they use the hostname as a key to store information on the storage array such as the iSCSI initiator name or the HBAs addresses.

If they are not unique there can be unexpected behavior like hosts being unable to attach.

This is very common in CI systems, but is not limited to them as it can also happen in production deployments using the same storage arrays.

A similar thing would happen when Glance is using Cinder as a backend.

With this PR the cinder and glance sections have a new field called randomPodNames each so the openstack-operator can generate a non static name for the Cinder and Glance CRs.

This new name is the same as before but with a suffix of a dash (-) followed by the first 5 characters of the OpenStackControlPlane UID.

The is still a small chance to have duplicated pod names, but the chance should be small enough to be safe to ignore.

Related PRs:

Jira: OSPRH-7396

softwarefactory-project-zuul[bot] commented 3 months ago

Build failed (check pipeline). Post recheck (without leading slash) to rerun all jobs. Make sure the failure cause has been resolved before you rerun jobs.

https://review.rdoproject.org/zuul/buildset/47fb1586d07f4c87b0ec85e618efe4d9

:heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 1h 47m 48s :x: podified-multinode-edpm-deployment-crc FAILURE in 1h 02m 48s :x: cifmw-crc-podified-edpm-baremetal FAILURE in 1h 12m 08s :x: openstack-operator-tempest-multinode FAILURE in 1h 05m 01s

Akrog commented 3 months ago

/retest

Akrog commented 3 months ago

/retest

jpodivin commented 3 months ago

Build Docs failed due to rate limit.

fmount commented 3 months ago

I'm ok with the current patch, and I don't have a strong option on using uniquePodNames instead of randomPodNames, so we can move forward with this unless there are strong concerns to address.

fmount commented 3 months ago

As discussed, let's have an additional iteration to move randomPodNames into uniquePodNames and we'll review again.

fmount commented 3 months ago

@stuggi @abays not sure you can give another pass to this and land to solve the nfs related issue. I think the main point was about the CRD parameter name and it has been added under Cinder and Glance templates. Just looking at kuttl we can see the old behavior is preserved [1], so I think it's good to go.

[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openstack-k8s-operators_openstack-operator/847/pull-ci-openstack-k8s-operators-openstack-operator-main-openstack-operator-build-deploy-kuttl/1805679081946091520/artifacts/openstack-operator-build-deploy-kuttl/openstack-k8s-operators-gather/artifacts/must-gather/quay-io-openstack-k8s-operators-openstack-must-gather-sha256-239f9b57d856c30ba157bcdfd75edb27270a8ba8613199defb1cfecf726f30c5/namespaces/openstack/crs/glances.glance.openstack.org/glance.yaml

stuggi commented 3 months ago

@stuggi @abays not sure you can give another pass to this and land to solve the nfs related issue. I think the main point was about the CRD parameter name and it has been added under Cinder and Glance templates. Just looking at kuttl we can see the old behavior is preserved [1], so I think it's good to go.

[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openstack-k8s-operators_openstack-operator/847/pull-ci-openstack-k8s-operators-openstack-operator-main-openstack-operator-build-deploy-kuttl/1805679081946091520/artifacts/openstack-operator-build-deploy-kuttl/openstack-k8s-operators-gather/artifacts/must-gather/quay-io-openstack-k8s-operators-openstack-must-gather-sha256-239f9b57d856c30ba157bcdfd75edb27270a8ba8613199defb1cfecf726f30c5/namespaces/openstack/crs/glances.glance.openstack.org/glance.yaml

I am ok to land this just wondering if it would be better to keep this in the component api as a parameter to have it transparent for the openstack-operator. like the openstack-operator always creates with the static name for a cinder or glance cr in the namespace, but the service operator could still use unique names for the specific deployments which need it (c-v, c-b,.). iiuc c-a and c-s won't need this.

fmount commented 3 months ago

@stuggi @abays not sure you can give another pass to this and land to solve the nfs related issue. I think the main point was about the CRD parameter name and it has been added under Cinder and Glance templates. Just looking at kuttl we can see the old behavior is preserved [1], so I think it's good to go. [1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openstack-k8s-operators_openstack-operator/847/pull-ci-openstack-k8s-operators-openstack-operator-main-openstack-operator-build-deploy-kuttl/1805679081946091520/artifacts/openstack-operator-build-deploy-kuttl/openstack-k8s-operators-gather/artifacts/must-gather/quay-io-openstack-k8s-operators-openstack-must-gather-sha256-239f9b57d856c30ba157bcdfd75edb27270a8ba8613199defb1cfecf726f30c5/namespaces/openstack/crs/glances.glance.openstack.org/glance.yaml

I am ok to land this just wondering if it would be better to keep this in the component api as a parameter to have it transparent for the openstack-operator. like the openstack-operator always creates with the static name for a cinder or glance cr in the namespace, but the service operator could still use unique names for the specific deployments which need it (c-v, c-b,.). iiuc c-a and c-s won't need this.

I think the main goal is the opposite: the service operator can accept whatever is passed (if we build a CR with a name != glance or cinder, it will work), but considering the main CR is generated by the openstack-operator, this kind of features applies only when the deployment is coordinated from a CR where both glance and cinder are defined in this context. The service operator will work, but we can keep it generic enough to work with and without this logic. In addition, having this code living here makes any potential maintenance easy (because we can keep the behavior aligned for multiple operators from the same place). Not sure if you want to discuss and dig more into this problem, but if you think it's problematic we can discuss about changing it (after GA?)

stuggi commented 3 months ago

@stuggi @abays not sure you can give another pass to this and land to solve the nfs related issue. I think the main point was about the CRD parameter name and it has been added under Cinder and Glance templates. Just looking at kuttl we can see the old behavior is preserved [1], so I think it's good to go. [1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openstack-k8s-operators_openstack-operator/847/pull-ci-openstack-k8s-operators-openstack-operator-main-openstack-operator-build-deploy-kuttl/1805679081946091520/artifacts/openstack-operator-build-deploy-kuttl/openstack-k8s-operators-gather/artifacts/must-gather/quay-io-openstack-k8s-operators-openstack-must-gather-sha256-239f9b57d856c30ba157bcdfd75edb27270a8ba8613199defb1cfecf726f30c5/namespaces/openstack/crs/glances.glance.openstack.org/glance.yaml

I am ok to land this just wondering if it would be better to keep this in the component api as a parameter to have it transparent for the openstack-operator. like the openstack-operator always creates with the static name for a cinder or glance cr in the namespace, but the service operator could still use unique names for the specific deployments which need it (c-v, c-b,.). iiuc c-a and c-s won't need this.

I think the main goal is the opposite: the service operator can accept whatever is passed (if we build a CR with a name != glance or cinder, it will work), but considering the main CR is generated by the openstack-operator, this kind of features applies only when the deployment is coordinated from a CR where both glance and cinder are defined in this context. The service operator will work, but we can keep it generic enough to work with and without this logic. In addition, having this code living here makes any potential maintenance easy (because we can keep the behavior aligned for multiple operators from the same place). Not sure if you want to discuss and dig more into this problem, but if you think it's problematic we can discuss about changing it (after GA?)

thanks for the additional details, from my side we can keep it as is. Lets wait for @abays review and get it in

openshift-ci[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abays, Akrog, fmount

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/openstack-k8s-operators/openstack-operator/blob/main/OWNERS)~~ [abays] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
abays commented 3 months ago
error: build error: Failed to push image: trying to reuse blob sha256:e394ea8406c7ed85ddc862f882ec77dcfd3863de3cb90fd8006238d521d22d44 at destination: pinging container registry quay.rdoproject.org: Get "https://quay.rdoproject.org/v2/": dial tcp 38.129.56.158:443: i/o timeout

/test openstack-operator-build-deploy-kuttl