rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

mon: fix mon scaledown when mons are portable #14106

Closed subhamkrai closed 3 weeks ago

subhamkrai commented 3 weeks ago

In the case of portable mons, mon scaledown was skipped by the code below:

    if mon.NodeName == "" {
        logger.Debugf("mon %q is not scheduled to a specific host", mon.DaemonName)
        continue
    }

This check skips mon removal whenever the mon's nodeName is empty. However, when mons are portable and the mon count is scaled down, we still want to remove the extra mons so the deployment matches the CephCluster configuration.
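
A minimal sketch of the kind of adjustment this implies, not the exact change in this PR: only skip mons without a node assignment when mons are actually pinned to nodes, so portable mons still reach the removal logic. The `arePortable` flag below is a hypothetical value assumed to be derived from the CephCluster mon spec.

    // Hypothetical sketch (not the actual Rook code): skip the node-name check
    // only for non-portable mons, so extra portable mons can still be removed
    // on scaledown. `arePortable` is an assumed flag, e.g. mons backed by PVCs
    // rather than pinned to specific nodes.
    if !arePortable && mon.NodeName == "" {
        logger.Debugf("mon %q is not scheduled to a specific host", mon.DaemonName)
        continue
    }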

Checklist:

subhamkrai commented 3 weeks ago

Testing results

  1. Pod count when the mon count is updated from 5 to 3

    srai@fedora ~ (fix-portable-mon-scaledwon) $ kc get pods
    NAME                                                              READY   STATUS      RESTARTS   AGE
    csi-cephfsplugin-62jtm                                            2/2     Running     0          5m7s
    csi-cephfsplugin-provisioner-6c7c779595-c78b7                     5/5     Running     0          5m7s
    csi-cephfsplugin-provisioner-6c7c779595-rmlfv                     5/5     Running     0          5m7s
    csi-cephfsplugin-v65td                                            2/2     Running     0          5m7s
    csi-cephfsplugin-z7ffw                                            2/2     Running     0          5m7s
    csi-rbdplugin-4d8dw                                               2/2     Running     0          5m7s
    csi-rbdplugin-gwxv2                                               2/2     Running     0          5m7s
    csi-rbdplugin-mckck                                               2/2     Running     0          5m7s
    csi-rbdplugin-provisioner-7f6c74b7cb-9h7tv                        5/5     Running     0          5m7s
    csi-rbdplugin-provisioner-7f6c74b7cb-9td4z                        5/5     Running     0          5m7s
    rook-ceph-crashcollector-ip-10-0-110-7.ec2.internal-d74d77mvwpd   1/1     Running     0          2m19s
    rook-ceph-crashcollector-ip-10-0-18-124.ec2.internal-757c4xq95q   1/1     Running     0          2m42s
    rook-ceph-crashcollector-ip-10-0-8-235.ec2.internal-587c88qq8tf   1/1     Running     0          2m41s
    rook-ceph-exporter-ip-10-0-110-7.ec2.internal-7f746cbd68-d4z5t    1/1     Running     0          2m19s
    rook-ceph-exporter-ip-10-0-18-124.ec2.internal-79555b4c57-xv8ck   1/1     Running     0          2m42s
    rook-ceph-exporter-ip-10-0-8-235.ec2.internal-694fbb45bc-p8l7v    1/1     Running     0          2m41s
    rook-ceph-mgr-a-5c44fbfff8-j82r4                                  3/3     Running     0          2m42s
    rook-ceph-mgr-b-8454db658c-49kwf                                  3/3     Running     0          2m41s
    rook-ceph-mon-a-5d6b754f97-xsfjq                                  2/2     Running     0          4m56s
    rook-ceph-mon-b-6c889c49fc-bsw8l                                  2/2     Running     0          4m21s
    rook-ceph-mon-c-699c849577-jrxq4                                  2/2     Running     0          3m49s
    rook-ceph-mon-d-7c77bd9f5f-7vfkt                                  2/2     Running     0          3m19s
    rook-ceph-mon-e-84f5bfdb57-7tlsf                                  2/2     Running     0          2m57s
    rook-ceph-operator-696bb979c9-cfpt8                               1/1     Running     0          9m43s
    rook-ceph-osd-0-5d494f5ff8-tg84r                                  2/2     Running     0          2m1s
    rook-ceph-osd-prepare-set1-data-0btc9f-xhg82                      0/1     Completed   0          2m19s
    ~/go/src/github.com/rook/deploy/examples
    srai@fedora ~ (fix-portable-mon-scaledwon) $ kwp
    ~/go/src/github.com/rook/deploy/examples
    srai@fedora ~ (fix-portable-mon-scaledwon) $ kc get pods
    NAME                                                              READY   STATUS      RESTARTS   AGE
    csi-cephfsplugin-62jtm                                            2/2     Running     0          9m16s
    csi-cephfsplugin-provisioner-6c7c779595-c78b7                     5/5     Running     0          9m16s
    csi-cephfsplugin-provisioner-6c7c779595-rmlfv                     5/5     Running     0          9m16s
    csi-cephfsplugin-v65td                                            2/2     Running     0          9m16s
    csi-cephfsplugin-z7ffw                                            2/2     Running     0          9m16s
    csi-rbdplugin-4d8dw                                               2/2     Running     0          9m16s
    csi-rbdplugin-gwxv2                                               2/2     Running     0          9m16s
    csi-rbdplugin-mckck                                               2/2     Running     0          9m16s
    csi-rbdplugin-provisioner-7f6c74b7cb-9h7tv                        5/5     Running     0          9m16s
    csi-rbdplugin-provisioner-7f6c74b7cb-9td4z                        5/5     Running     0          9m16s
    rook-ceph-crashcollector-ip-10-0-18-124.ec2.internal-757c4xq95q   1/1     Running     0          6m51s
    rook-ceph-crashcollector-ip-10-0-8-235.ec2.internal-587c88qq8tf   1/1     Running     0          6m50s
    rook-ceph-exporter-ip-10-0-18-124.ec2.internal-79555b4c57-xv8ck   1/1     Running     0          6m51s
    rook-ceph-exporter-ip-10-0-8-235.ec2.internal-694fbb45bc-p8l7v    1/1     Running     0          6m50s
    rook-ceph-mgr-a-5c44fbfff8-j82r4                                  3/3     Running     0          6m51s
    rook-ceph-mgr-b-8454db658c-49kwf                                  3/3     Running     0          6m50s
    rook-ceph-mon-b-6c889c49fc-bsw8l                                  2/2     Running     0          8m30s
    rook-ceph-mon-d-7c77bd9f5f-7vfkt                                  2/2     Running     0          7m28s
    rook-ceph-mon-e-84f5bfdb57-7tlsf                                  2/2     Running     0          7m6s
    rook-ceph-operator-696bb979c9-cfpt8                               1/1     Running     0          13m
    rook-ceph-osd-0-5d494f5ff8-tg84r                                  2/2     Running     0          6m10s
    rook-ceph-osd-prepare-set1-data-0btc9f-xhg82                      0/1     Completed   0          6m28s
    ~/go/src/github.com/rook/deploy/examples
  2. Logs

    2024-04-24 06:47:14.232867 I | op-mon: removing an extra mon. currently 5 are in quorum and only 3 are desired
    2024-04-24 06:47:14.232913 I | op-mon: removing arbitrary extra mon "a"
    2024-04-24 06:47:14.232919 I | op-mon: ensuring removal of unhealthy monitor a
    ...
    ...
    ...
    2024-04-24 06:48:17.388591 I | op-mon: removing an extra mon. currently 4 are in quorum and only 3 are desired
    2024-04-24 06:48:17.388634 I | op-mon: removing arbitrary extra mon "c"
    2024-04-24 06:48:17.388639 I | op-mon: ensuring removal of unhealthy monitor c
    2024-04-24 06:48:17.779721 I | op-mon: removed monitor c
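
The log sequence above (one arbitrary extra mon removed per pass until quorum matches the desired count) can be illustrated with the simplified, hypothetical sketch below; it only mimics the operator's log messages and is not the actual Rook implementation.

    package main

    import "fmt"

    // Simplified illustration of the scaledown behavior seen in the logs:
    // while more mons are in quorum than desired, remove an arbitrary extra
    // mon, one per pass.
    func main() {
        quorum := []string{"a", "b", "c", "d", "e"} // mons currently in quorum
        desired := 3                                // spec.mon.count after the update

        for len(quorum) > desired {
            extra := quorum[0] // arbitrary choice, mirroring "removing arbitrary extra mon"
            fmt.Printf("removing an extra mon. currently %d are in quorum and only %d are desired\n",
                len(quorum), desired)
            fmt.Printf("removing arbitrary extra mon %q\n", extra)
            quorum = quorum[1:]
        }
        fmt.Println("remaining mons in quorum:", quorum)
    }
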
travisn commented 3 weeks ago

@subhamkrai Thanks for the full validation!

travisn commented 3 weeks ago

@subhamkrai For some reason the DCO action is not running, can you force push to retry?

subhamkrai commented 3 weeks ago

> @subhamkrai For some reason the DCO action is not running, can you force push to retry?

done