openstack-k8s-operators / edpm-ansible

External Dataplane Management Ansible Playbooks
https://openstack-k8s-operators.github.io/edpm-ansible/
Apache License 2.0

Use private variable for sshd_options #666

Closed bshephar closed 3 months ago

bshephar commented 3 months ago

Use a private variable when rendering the sshd server options. This avoids clobbering user-provided settings when we re-use the same variable name.
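As a rough illustration of the clobbering described above, here is a minimal Python sketch. It is an analogy, not the role's actual Ansible code: the names `ROLE_DEFAULTS`, `sshd_options`, `_sshd_options` and both helper functions are hypothetical, and the option values are made up for the example.

```python
# Hypothetical role defaults; these are NOT the real edpm-ansible values.
ROLE_DEFAULTS = {"PasswordAuthentication": "no", "X11Forwarding": "no"}


def render_reusing_name(host_vars):
    """Problem pattern: the merged result overwrites the user's own variable."""
    merged = {**ROLE_DEFAULTS, **host_vars.get("sshd_options", {})}
    host_vars["sshd_options"] = merged  # user input is replaced by the merge
    return merged


def render_with_private(host_vars):
    """Safer pattern: keep the merged result under a private name."""
    merged = {**ROLE_DEFAULTS, **host_vars.get("sshd_options", {})}
    host_vars["_sshd_options"] = merged  # user input stays untouched
    return merged
```

With the private name, whatever the user originally set remains readable as they wrote it, and only the template consumes the merged copy.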

softwarefactory-project-zuul[bot] commented 3 months ago

Build failed (check pipeline). Post recheck (without leading slash) to rerun all jobs. Make sure the failure cause has been resolved before you rerun jobs.

https://review.rdoproject.org/zuul/buildset/8265e7bb2db548d5b166923b4decce69

- :heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 2h 29m 40s
- :x: podified-multinode-edpm-deployment-crc FAILURE in 2h 13m 52s
- :heavy_check_mark: cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 43m 29s
- :heavy_check_mark: edpm-ansible-molecule-edpm_bootstrap SUCCESS in 6m 16s
- :heavy_check_mark: edpm-ansible-molecule-edpm_podman SUCCESS in 5m 11s
- :heavy_check_mark: edpm-ansible-molecule-edpm_module_load SUCCESS in 4m 39s
- :heavy_check_mark: edpm-ansible-molecule-edpm_kernel SUCCESS in 9m 25s
- :heavy_check_mark: edpm-ansible-molecule-edpm_libvirt SUCCESS in 8m 05s
- :heavy_check_mark: edpm-ansible-molecule-edpm_nova SUCCESS in 9m 09s
- :heavy_check_mark: edpm-ansible-molecule-edpm_frr SUCCESS in 8m 12s
- :heavy_check_mark: edpm-ansible-molecule-edpm_iscsid SUCCESS in 5m 10s
- :heavy_check_mark: edpm-ansible-molecule-edpm_ovn_bgp_agent SUCCESS in 9m 12s
- :heavy_check_mark: edpm-ansible-molecule-edpm_ovs SUCCESS in 6m 32s
- :heavy_check_mark: edpm-ansible-molecule-edpm_tripleo_cleanup SUCCESS in 4m 32s

openshift-ci[bot] commented 3 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bshephar, rebtoor, slagle

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/openstack-k8s-operators/edpm-ansible/blob/main/OWNERS)~~ [bshephar,rebtoor,slagle]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

bshephar commented 3 months ago

recheck

bshephar commented 3 months ago

Is this a consistent thing? Or a timing thing?

2024-06-03T07:35:17Z    ERROR   Reconciler error    {"controller": "openstackdataplanenodeset", "controllerGroup": "dataplane.openstack.org", "controllerKind": "OpenStackDataPlaneNodeSet", "OpenStackDataPlaneNodeSet": {"name":"openstack-edpm-ipam","namespace":"openstack"}, "namespace": "openstack", "name": "openstack-edpm-ipam", "reconcileID": "f6d35d44-c878-45d3-94a8-480872d7a7b2", "error": "Found multiple OpenStackVersions when at most 1 should exist"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.6/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.6/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.6/pkg/internal/controller/controller.go:227

Rechecking to see if it's repeatable.
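To compare the failure across rechecks, a small sketch like the following could pull the structured fields out of the first line of an operator log entry. It assumes the layout seen above (timestamp, level, message, then a JSON object on the same line); the helper name `parse_reconciler_line` is hypothetical, and stack-trace lines are expected to be passed separately or dropped.

```python
import json


def parse_reconciler_line(line):
    """Split one log line into its text prefix and the trailing JSON object."""
    start = line.index("{")          # the JSON payload starts at the first brace
    prefix = line[:start].split()    # timestamp, level, then the message words
    fields = json.loads(line[start:])
    return {
        "timestamp": prefix[0],
        "level": prefix[1],
        "message": " ".join(prefix[2:]),
        "fields": fields,
    }
```

Comparing `fields["error"]` and `fields["name"]` between two runs is then enough to tell whether a recheck hit the same reconcile failure.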

softwarefactory-project-zuul[bot] commented 3 months ago

Build failed (check pipeline). Post recheck (without leading slash) to rerun all jobs. Make sure the failure cause has been resolved before you rerun jobs.

https://review.rdoproject.org/zuul/buildset/536204f29f1848519ea2fb4dcf5ae246

- :heavy_check_mark: openstack-k8s-operators-content-provider SUCCESS in 3h 32m 32s
- :x: podified-multinode-edpm-deployment-crc FAILURE in 2h 16m 10s
- :heavy_check_mark: cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 56m 28s
- :heavy_check_mark: edpm-ansible-molecule-edpm_bootstrap SUCCESS in 6m 00s
- :heavy_check_mark: edpm-ansible-molecule-edpm_podman SUCCESS in 5m 10s
- :heavy_check_mark: edpm-ansible-molecule-edpm_module_load SUCCESS in 4m 37s
- :heavy_check_mark: edpm-ansible-molecule-edpm_kernel SUCCESS in 9m 26s
- :heavy_check_mark: edpm-ansible-molecule-edpm_libvirt SUCCESS in 8m 05s
- :heavy_check_mark: edpm-ansible-molecule-edpm_nova SUCCESS in 8m 30s
- :heavy_check_mark: edpm-ansible-molecule-edpm_frr SUCCESS in 6m 23s
- :heavy_check_mark: edpm-ansible-molecule-edpm_iscsid SUCCESS in 4m 37s
- :heavy_check_mark: edpm-ansible-molecule-edpm_ovn_bgp_agent SUCCESS in 6m 45s
- :heavy_check_mark: edpm-ansible-molecule-edpm_ovs SUCCESS in 5m 01s
- :heavy_check_mark: edpm-ansible-molecule-edpm_tripleo_cleanup SUCCESS in 4m 01s

bshephar commented 3 months ago

Yeah, same issue:

2024-06-03T15:27:44Z    ERROR   Controllers.OpenStackDataPlaneNodeSet   Found multiple OpenStackVersions when at most 1 should exist    {"controller": "openstackdataplanenodeset", "controllerGroup": "dataplane.openstack.org", "controllerKind": "OpenStackDataPlaneNodeSet", "OpenStackDataPlaneNodeSet": {"name":"openstack-edpm-ipam","namespace":"openstack"}, "namespace": "openstack", "name": "openstack-edpm-ipam", "reconcileID": "77642441-16c9-468e-9971-63e83a330123", "error": "Found multiple OpenStackVersions when at most 1 should exist"}
github.com/openstack-k8s-operators/dataplane-operator/pkg/util.GetVersion
    /remote-source/pkg/util/version.go:44
github.com/openstack-k8s-operators/dataplane-operator/controllers.(*OpenStackDataPlaneNodeSetReconciler).Reconcile
    /remote-source/controllers/openstackdataplanenodeset_controller.go:358
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.6/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.6/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.6/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.6/pkg/internal/controller/controller.go:227
2024-06-03T15:27:44Z    ERROR   Reconciler error    {"controller": "openstackdataplanenodeset", "controllerGroup": "dataplane.openstack.org", "controllerKind": "OpenStackDataPlaneNodeSet", "OpenStackDataPlaneNodeSet": {"name":"openstack-edpm-ipam","namespace":"openstack"}, "namespace": "openstack", "name": "openstack-edpm-ipam", "reconcileID": "77642441-16c9-468e-9971-63e83a330123", "error": "Found multiple OpenStackVersions when at most 1 should exist"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.6/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.6/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.6/pkg/internal/controller/controller.go:227

It's unrelated to this change though, so we'll need to look into what's happening there. There should only be one OpenStackVersions resource. Currently not sure where this second one is coming from.

We have one called "controlplane":

❯ curl -s "https://logserver.rdoproject.org/66/666/b941b4a13787996bc32abcb9929bdbc8e14a095c/github-check/podified-multinode-edpm-deployment-crc/036dfe0/controller/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/namespaces/openstack/crs/openstackversions.core.openstack.org/controlplane.yaml" | yq .metadata.name
controlplane

And one called "openstack-galera-network-isolation":

❯ curl -s "https://logserver.rdoproject.org/66/666/b941b4a13787996bc32abcb9929bdbc8e14a095c/github-check/podified-multinode-edpm-deployment-crc/036dfe0/controller/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/namespaces/openstack/crs/openstackversions.core.openstack.org/openstack-galera-network-isolation.yaml" | yq .metadata.name
openstack-galera-network-isolation

The second one, however, has ownerReferences, so it seems to be the one that should exist:

❯ curl -s "https://logserver.rdoproject.org/66/666/b941b4a13787996bc32abcb9929bdbc8e14a095c/github-check/podified-multinode-edpm-deployment-crc/036dfe0/controller/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/namespaces/openstack/crs/openstackversions.core.openstack.org/openstack-galera-network-isolation.yaml" | yq ".metadata | (.name,.ownerReferences)"

openstack-galera-network-isolation
- apiVersion: core.openstack.org/v1beta1
  blockOwnerDeletion: true
  controller: true
  kind: OpenStackControlPlane
  name: openstack-galera-network-isolation
  uid: bd24bd62-2d01-4d0a-a143-9e7581ca7e06

Whereas the first one has no ownerReferences. So I'm not sure what's creating it:

❯ curl -s "https://logserver.rdoproject.org/66/666/b941b4a13787996bc32abcb9929bdbc8e14a095c/github-check/podified-multinode-edpm-deployment-crc/036dfe0/controller/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/namespaces/openstack/crs/openstackversions.core.openstack.org/controlplane.yaml" | yq ".metadata | (.name,.ownerReferences)"
controlplane
null

So we need to identify whatever is creating this controlplane OpenStackVersions resource. I assume it is something in the ci-framework. Let's open a separate Jira if it's not already tracked by an existing one, and move any additional troubleshooting to that new issue.
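The ownerReferences check done by hand above could be sketched in Python roughly as follows. This is only an illustration: `find_unowned` is a hypothetical helper, and the input is assumed to be the must-gather YAML already parsed into dictionaries, trimmed here to the fields the yq queries printed.

```python
def find_unowned(resources):
    """Return the names of resources whose metadata has no ownerReferences."""
    unowned = []
    for resource in resources:
        metadata = resource.get("metadata", {})
        # A missing key, null, or an empty list all count as "no owner".
        if not metadata.get("ownerReferences"):
            unowned.append(metadata.get("name"))
    return unowned


# Metadata trimmed to the fields shown in the yq output above.
openstack_versions = [
    {"metadata": {"name": "controlplane", "ownerReferences": None}},
    {
        "metadata": {
            "name": "openstack-galera-network-isolation",
            "ownerReferences": [
                {
                    "apiVersion": "core.openstack.org/v1beta1",
                    "blockOwnerDeletion": True,
                    "controller": True,
                    "kind": "OpenStackControlPlane",
                    "name": "openstack-galera-network-isolation",
                    "uid": "bd24bd62-2d01-4d0a-a143-9e7581ca7e06",
                }
            ],
        }
    },
]

print(find_unowned(openstack_versions))  # prints ['controlplane']
```

Any name this returns was created outside the OpenStackControlPlane controller, which is why the controlplane resource is the one to trace back to its creator.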

bshephar commented 3 months ago

We need this fix: https://github.com/openstack-k8s-operators/ci-framework/pull/1815

bshephar commented 3 months ago

recheck