rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

switch host networks #12940

Closed wanghui-devops closed 5 months ago

wanghui-devops commented 8 months ago

rook-ceph version: v1.7. I'm trying to switch a single-node Ceph cluster that uses multus over to host networking. Is there any guidance or documentation to help me do this? Alternatively, if I rebuild on top of the original cluster, can the data stay intact after reconstruction?

sp98 commented 8 months ago

It's possible to switch to the host network on a running cluster. The CephCluster CR needs to be updated to set spec.network.hostNetwork: true or spec.network.provider: host.

Once that change is made, all of the Ceph daemons (except the mons) will restart and use the host network. The mons need to be failed over manually; you can refer to these steps for mon failover.
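For example, a minimal sketch of that edit on this cluster (the CephCluster here is named rook-ceph in the rook-ceph namespace, as shown in the CRs later in this thread); the multus selectors block can be dropped at the same time:

```yaml
# kubectl -n rook-ceph edit cephcluster rook-ceph
spec:
  network:
    provider: host   # previously "multus"; remove the multus "selectors" when switching
```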

There is a PR in review to automate mon failover when switching to the host network, but that will be in 1.12 and I don't think it will be backported to 1.7, so I suggest using the latest Rook Ceph version.

wanghui-devops commented 8 months ago

In my test cluster I configured a single mon. When I execute kubectl scale deployment rook-ceph-mon-a --replicas=0 -n rook-ceph and set the timeout to 0, no mon is restored. Have you ever encountered this situation?

sp98 commented 8 months ago

Can you share the following details:

wanghui-devops commented 8 months ago

before modifying:

apiVersion: v1
items:
- apiVersion: ceph.rook.io/v1
  kind: CephCluster
  metadata:
    creationTimestamp: "2023-09-22T04:17:47Z"
    finalizers:
    - cephcluster.ceph.rook.io
    generation: 2
    managedFields:
    - apiVersion: ceph.rook.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:cephVersion:
            .: {}
            f:image: {}
          f:cleanupPolicy:
            .: {}
            f:sanitizeDisks:
              .: {}
              f:dataSource: {}
              f:iteration: {}
              f:method: {}
          f:crashCollector: {}
          f:dashboard:
            .: {}
            f:enabled: {}
            f:ssl: {}
          f:dataDirHostPath: {}
          f:disruptionManagement:
            .: {}
            f:machineDisruptionBudgetNamespace: {}
            f:managePodBudgets: {}
            f:osdMaintenanceTimeout: {}
          f:healthCheck:
            .: {}
            f:daemonHealth:
              .: {}
              f:mon:
                .: {}
                f:interval: {}
              f:osd: {}
              f:status: {}
            f:livenessProbe:
              .: {}
              f:mgr: {}
              f:mon: {}
              f:osd: {}
            f:startupProbe:
              .: {}
              f:mgr: {}
              f:mon: {}
              f:osd: {}
          f:mgr:
            .: {}
            f:count: {}
            f:modules: {}
          f:mon:
            .: {}
            f:count: {}
          f:monitoring: {}
          f:network:
            .: {}
            f:connections:
              .: {}
              f:compression: {}
              f:encryption: {}
            f:provider: {}
            f:selectors:
              .: {}
              f:cluster: {}
              f:public: {}
          f:placement:
            .: {}
            f:all:
              .: {}
              f:nodeAffinity:
                .: {}
                f:requiredDuringSchedulingIgnoredDuringExecution:
                  .: {}
                  f:nodeSelectorTerms: {}
            f:mgr:
              .: {}
              f:tolerations: {}
            f:mon:
              .: {}
              f:tolerations: {}
            f:osd:
              .: {}
              f:tolerations: {}
          f:priorityClassNames:
            .: {}
            f:mgr: {}
            f:mon: {}
            f:osd: {}
          f:removeOSDsIfOutAndSafeToRemove: {}
          f:storage:
            .: {}
            f:config:
              .: {}
              f:storeType: {}
            f:useAllDevices: {}
          f:waitTimeoutForHealthyOSDInMinutes: {}
      manager: kubectl-create
      operation: Update
      time: "2023-09-22T04:17:47Z"
    - apiVersion: ceph.rook.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers: {}
        f:spec:
          f:external: {}
          f:healthCheck:
            f:daemonHealth:
              f:osd:
                f:interval: {}
              f:status:
                f:interval: {}
          f:logCollector: {}
          f:security:
            .: {}
            f:kms: {}
          f:storage:
            f:nodes: {}
        f:status:
          .: {}
          f:ceph:
            .: {}
            f:capacity:
              .: {}
              f:bytesAvailable: {}
              f:bytesTotal: {}
              f:bytesUsed: {}
              f:lastUpdated: {}
            f:fsid: {}
            f:health: {}
            f:lastChanged: {}
            f:lastChecked: {}
            f:previousHealth: {}
            f:versions:
              .: {}
              f:mgr:
                .: {}
                f:ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): {}
              f:mon:
                .: {}
                f:ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): {}
              f:osd:
                .: {}
                f:ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): {}
              f:overall:
                .: {}
                f:ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): {}
          f:conditions: {}
          f:message: {}
          f:observedGeneration: {}
          f:phase: {}
          f:state: {}
          f:storage:
            .: {}
            f:deviceClasses: {}
          f:version:
            .: {}
            f:image: {}
            f:version: {}
      manager: rook
      operation: Update
      time: "2023-09-22T04:34:24Z"
    name: rook-ceph
    namespace: rook-ceph
    resourceVersion: "3916613"
    selfLink: /apis/ceph.rook.io/v1/namespaces/rook-ceph/cephclusters/rook-ceph
    uid: 17dd4990-fb78-499f-a77c-7a577ac52376
  spec:
    cephVersion:
      image: quay.io/ceph/ceph:v17.2.0
    cleanupPolicy:
      sanitizeDisks:
        dataSource: zero
        iteration: 1
        method: quick
    crashCollector: {}
    dashboard:
      enabled: true
      ssl: true
    dataDirHostPath: /var/lib/rook
    disruptionManagement:
      machineDisruptionBudgetNamespace: openshift-machine-api
      managePodBudgets: true
      osdMaintenanceTimeout: 30
    external: {}
    healthCheck:
      daemonHealth:
        mon:
          interval: 45s
        osd:
          interval: 1m0s
        status:
          interval: 1m0s
      livenessProbe:
        mgr: {}
        mon: {}
        osd: {}
      startupProbe:
        mgr: {}
        mon: {}
        osd: {}
    logCollector: {}
    mgr:
      count: 1
      modules:
      - enabled: true
        name: pg_autoscaler
    mon:
      count: 1
    monitoring: {}
    network:
      connections:
        compression: {}
        encryption: {}
      provider: multus
      selectors:
        cluster: rook-ceph/rook-cluster-nad
        public: rook-ceph/rook-public-nad
    placement:
      all:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: role
                operator: In
                values:
                - storage-node
      mgr:
        tolerations:
        - key: storage-node
          operator: Exists
      mon:
        tolerations:
        - key: storage-node
          operator: Exists
      osd:
        tolerations:
        - key: storage-node
          operator: Exists
    priorityClassNames:
      mgr: system-cluster-critical
      mon: system-node-critical
      osd: system-node-critical
    removeOSDsIfOutAndSafeToRemove: true
    security:
      kms: {}
    storage:
      config:
        storeType: bluestore
      nodes:
      - config:
          metadataDevice: sda
        devices:
        - name: sdb
        - name: sdc
        - name: sdd
        - name: sde
        - name: sdf
        - name: sdg
        - name: sdh
        - name: sdi
        - name: sdj
        - name: sdk
        - name: sdm
        - name: sdl
        name: dell-34
        resources: {}
      useAllDevices: false
    waitTimeoutForHealthyOSDInMinutes: 10
  status:
    ceph:
      capacity:
        bytesAvailable: 192010751717376
        bytesTotal: 208011641389056
        bytesUsed: 16000889671680
        lastUpdated: "2023-09-22T04:35:24Z"
      fsid: 0fd74bc0-1e01-4c48-8400-fd05eda63089
      health: HEALTH_OK
      lastChanged: "2023-09-22T04:34:24Z"
      lastChecked: "2023-09-22T04:35:24Z"
      previousHealth: HEALTH_WARN
      versions:
        mgr:
          ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): 1
        mon:
          ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): 1
        osd:
          ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): 12
        overall:
          ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): 14
    conditions:
    - lastHeartbeatTime: "2023-09-22T04:35:25Z"
      lastTransitionTime: "2023-09-22T04:33:22Z"
      message: Cluster created successfully
      reason: ClusterCreated
      status: "True"
      type: Ready
    message: Cluster created successfully
    observedGeneration: 2
    phase: Ready
    state: Created
    storage:
      deviceClasses:
      - name: hdd
    version:
      image: quay.io/ceph/ceph:v17.2.0
      version: 17.2.0-0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


after modifying:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  creationTimestamp: "2023-09-22T04:17:47Z"
  finalizers:
  - cephcluster.ceph.rook.io
  generation: 3
  managedFields:
  - apiVersion: ceph.rook.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:cephVersion:
          .: {}
          f:image: {}
        f:cleanupPolicy:
          .: {}
          f:sanitizeDisks:
            .: {}
            f:dataSource: {}
            f:iteration: {}
            f:method: {}
        f:crashCollector: {}
        f:dashboard:
          .: {}
          f:enabled: {}
          f:ssl: {}
        f:dataDirHostPath: {}
        f:disruptionManagement:
          .: {}
          f:machineDisruptionBudgetNamespace: {}
          f:managePodBudgets: {}
          f:osdMaintenanceTimeout: {}
        f:healthCheck:
          .: {}
          f:daemonHealth:
            .: {}
            f:mon:
              .: {}
              f:interval: {}
            f:osd: {}
            f:status: {}
          f:livenessProbe:
            .: {}
            f:mgr: {}
            f:mon: {}
            f:osd: {}
          f:startupProbe:
            .: {}
            f:mgr: {}
            f:mon: {}
            f:osd: {}
        f:mgr:
          .: {}
          f:count: {}
          f:modules: {}
        f:mon:
          .: {}
          f:count: {}
        f:monitoring: {}
        f:network:
          .: {}
          f:connections:
            .: {}
            f:compression: {}
            f:encryption: {}
        f:placement:
          .: {}
          f:all:
            .: {}
            f:nodeAffinity:
              .: {}
              f:requiredDuringSchedulingIgnoredDuringExecution:
                .: {}
                f:nodeSelectorTerms: {}
          f:mgr:
            .: {}
            f:tolerations: {}
          f:mon:
            .: {}
            f:tolerations: {}
          f:osd:
            .: {}
            f:tolerations: {}
        f:priorityClassNames:
          .: {}
          f:mgr: {}
          f:mon: {}
          f:osd: {}
        f:removeOSDsIfOutAndSafeToRemove: {}
        f:storage:
          .: {}
          f:config:
            .: {}
            f:storeType: {}
          f:useAllDevices: {}
        f:waitTimeoutForHealthyOSDInMinutes: {}
    manager: kubectl-create
    operation: Update
    time: "2023-09-22T04:17:47Z"
  - apiVersion: ceph.rook.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers: {}
      f:spec:
        f:external: {}
        f:healthCheck:
          f:daemonHealth:
            f:osd:
              f:interval: {}
            f:status:
              f:interval: {}
        f:logCollector: {}
        f:security:
          .: {}
          f:kms: {}
        f:storage:
          f:nodes: {}
      f:status:
        .: {}
        f:ceph:
          .: {}
          f:capacity:
            .: {}
            f:bytesAvailable: {}
            f:bytesTotal: {}
            f:bytesUsed: {}
            f:lastUpdated: {}
          f:fsid: {}
          f:health: {}
          f:lastChanged: {}
          f:lastChecked: {}
          f:previousHealth: {}
          f:versions:
            .: {}
            f:mgr:
              .: {}
              f:ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): {}
            f:mon:
              .: {}
              f:ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): {}
            f:osd:
              .: {}
              f:ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): {}
            f:overall:
              .: {}
              f:ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): {}
        f:conditions: {}
        f:message: {}
        f:observedGeneration: {}
        f:phase: {}
        f:state: {}
        f:storage:
          .: {}
          f:deviceClasses: {}
        f:version:
          .: {}
          f:image: {}
          f:version: {}
    manager: rook
    operation: Update
    time: "2023-09-22T04:34:24Z"
  - apiVersion: ceph.rook.io/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:healthCheck:
          f:daemonHealth:
            f:mon:
              f:timeout: {}
        f:network:
          f:provider: {}
    manager: kubectl-edit
    operation: Update
    time: "2023-09-22T04:39:59Z"
  name: rook-ceph
  namespace: rook-ceph
  resourceVersion: "3917817"
  selfLink: /apis/ceph.rook.io/v1/namespaces/rook-ceph/cephclusters/rook-ceph
  uid: 17dd4990-fb78-499f-a77c-7a577ac52376
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v17.2.0
  cleanupPolicy:
    sanitizeDisks:
      dataSource: zero
      iteration: 1
      method: quick
  crashCollector: {}
  dashboard:
    enabled: true
    ssl: true
  dataDirHostPath: /var/lib/rook
  disruptionManagement:
    machineDisruptionBudgetNamespace: openshift-machine-api
    managePodBudgets: true
    osdMaintenanceTimeout: 30
  external: {}
  healthCheck:
    daemonHealth:
      mon:
        interval: 45s
        timeout: 0s
      osd:
        interval: 1m0s
      status:
        interval: 1m0s
    livenessProbe:
      mgr: {}
      mon: {}
      osd: {}
    startupProbe:
      mgr: {}
      mon: {}
      osd: {}
  logCollector: {}
  mgr:
    count: 1
    modules:
    - enabled: true
      name: pg_autoscaler
  mon:
    count: 1
  monitoring: {}
  network:
    connections:
      compression: {}
      encryption: {}
    provider: host
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: role
              operator: In
              values:
              - storage-node
    mgr:
      tolerations:
      - key: storage-node
        operator: Exists
    mon:
      tolerations:
      - key: storage-node
        operator: Exists
    osd:
      tolerations:
      - key: storage-node
        operator: Exists
  priorityClassNames:
    mgr: system-cluster-critical
    mon: system-node-critical
    osd: system-node-critical
  removeOSDsIfOutAndSafeToRemove: true
  security:
    kms: {}
  storage:
    config:
      storeType: bluestore
    nodes:
    - config:
        metadataDevice: sda
      devices:
      - name: sdb
      - name: sdc
      - name: sdd
      - name: sde
      - name: sdf
      - name: sdg
      - name: sdh
      - name: sdi
      - name: sdj
      - name: sdk
      - name: sdm
      - name: sdl
      name: dell-34
      resources: {}
    useAllDevices: false
  waitTimeoutForHealthyOSDInMinutes: 10
status:
  ceph:
    capacity:
      bytesAvailable: 192010751717376
      bytesTotal: 208011641389056
      bytesUsed: 16000889671680
      lastUpdated: "2023-09-22T04:40:00Z"
    fsid: 0fd74bc0-1e01-4c48-8400-fd05eda63089
    health: HEALTH_OK
    lastChanged: "2023-09-22T04:34:24Z"
    lastChecked: "2023-09-22T04:40:00Z"
    previousHealth: HEALTH_WARN
    versions:
      mgr:
        ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): 1
      mon:
        ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): 1
      osd:
        ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): 12
      overall:
        ceph version 17.2.0 (43e2e60a7559d3f46c9d53f1ca875fd499a1e35e) quincy (stable): 14
  conditions:
  - lastHeartbeatTime: "2023-09-22T04:40:00Z"
    lastTransitionTime: "2023-09-22T04:33:22Z"
    message: Cluster created successfully
    reason: ClusterCreated
    status: "True"
    type: Ready
  - lastHeartbeatTime: "2023-09-22T04:40:04Z"
    lastTransitionTime: "2023-09-22T04:40:04Z"
    message: Configuring Ceph Mons
    reason: ClusterProgressing
    status: "True"
    type: Progressing
  message: Configuring Ceph Mons
  observedGeneration: 2
  phase: Progressing
  state: Creating
  storage:
    deviceClasses:
    - name: hdd
  version:
    image: quay.io/ceph/ceph:v17.2.0
    version: 17.2.0-0

Executed command: kubectl scale -n rook-ceph deployment --replicas=0 rook-ceph-mon-a

sp98 commented 8 months ago

Can you also share the rook-ceph-operator-* pod logs after the modification?

wanghui-devops commented 8 months ago

(screenshot of the operator pod logs attached)

wanghui-devops commented 8 months ago

Is it because of a single node with a single mon?

sp98 commented 8 months ago

Most likely. Can you try with 3 mons on a single node?

wanghui-devops commented 8 months ago

I tried to start 3 mons on a single node, but even with `allowMultiplePerNode: true` it doesn't work. Logs:

 rook-ceph-cluster-controller  failed to reconcile CephCluster "rook-ceph/rook-ceph". failed to reconcile cluster "rook-ceph": failed to configure local ceph cluster: failed to create cluster: failed to start ceph monitors: refusing to deploy 3 monitors on the same host with host networking and allowMultiplePerNode is true. only one monitor per node is allowed 
sp98 commented 8 months ago

Oh right. My bad. That won't work. With host networking the mons will get the IP of the node, so we can't have more than one mon on a node.

wanghui-devops commented 8 months ago

Do you know how to rebuild a cluster? Can I rebuild a cluster with my old OSDs so that the data stays the same?

sp98 commented 8 months ago

You might want to check the Rook disaster recovery guide to see what the best option is for your cluster.

wanghui-devops commented 8 months ago

I deployed a 3-node environment, but mon-d is still not created automatically, and the failover interval is still 10m even though I configured healthCheck.daemonHealth.mon.timeout.

sp98 commented 8 months ago

Strange. After kubectl scale deployment rook-ceph-mon-a --replicas=0 -n rook-ceph, the operator should fail over mon-a to mon-d after 10 minutes. Can you share the rook operator logs after you scale down mon-a?

BlaineEXE commented 8 months ago

In my test cluster I configured a single mon. When I execute kubectl scale deployment rook-ceph-mon-a --replicas=0 -n rook-ceph and set the timeout to 0, no mon is restored. Have you ever encountered this situation?

In a single-mon cluster, I don't think restoration is possible. At least 51% of mons must be available to restore a cluster.

I tried to start 3 mons on a single node, but even with `allowMultiplePerNode: true` it doesn't work. Logs:

The ability to change from non-host to host networking was added in Rook v1.10.5.

Do you know how to rebuild a cluster? Can I rebuild a cluster with my old OSDs so that the data stays the same?

Rebuilding a cluster is possible but risky, especially with only a single monitor. It would be safest to upgrade from v1.7 -> 1.8 -> 1.9 -> 1.10 at minimum. That would allow Rook to change the networking internally. I would recommend upgrading to at least 1.11, which is currently the lowest version under active upstream support.

Ref: https://github.com/rook/rook/pull/11211
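As a very rough sketch of one hop on that upgrade path (the operator image tag below is only an example; each release upgrade also requires applying that release's updated CRDs and common.yaml as described in the Rook upgrade guide):

```console
# after applying the new release's CRDs/common.yaml, bump the operator image
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.8.10
```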

wanghui-devops commented 8 months ago

I upgraded the 3-node cluster to v1.11.11, then switched the network to “host”. The mgr and osd daemons completed the update automatically, and for the mons I followed the manual “Failing over a Monitor” steps, but it did not work as expected and no new mon was started. Operator log: (screenshot attached)

ceph status: (screenshot attached)

sp98 commented 8 months ago

@wanghui-devops Can you share the following details

wanghui-devops commented 8 months ago

@sp98 my steps:

  1. Ran kubectl edit -n rook-ceph cephclusters.ceph.rook.io rook-ceph to change network.provider;
  2. Waited for the cluster to stabilize (plugin -> mgr -> osd switched to the host network);
  3. Ran kubectl -n rook-ceph scale deployment --replicas=0 rook-ceph-mon-a and waited for a new mon to start ...
  4. The complete rook-ceph-operator logs (operator.log) and ceph status:

  cluster:
    id:     d8233e3f-24d6-46c5-9490-36aa6b73d896
    health: HEALTH_WARN
            1/3 mons down, quorum f,g
            Reduced data availability: 14 pgs inactive

  services:
    mon: 3 daemons, quorum f,g (age 17m), out of quorum: a
    mgr: a(active, since 49m), standbys: b
    osd: 3 osds: 3 up (since 81m), 3 in (since 26h)

  data:
    pools:   2 pools, 33 pgs
    objects: 11 objects, 4.2 MiB
    usage:   15 TiB used, 15 TiB / 29 TiB avail
    pgs:     42.424% pgs unknown
             19 active+clean
             14 unknown
sp98 commented 8 months ago

@sp98 my steps:

  1. Ran kubectl edit -n rook-ceph cephclusters.ceph.rook.io rook-ceph to change network.provider;

Looks like the cluster was not in good shape even before this step was performed. Can you confirm whether you had 14 pgs inactive even before you switched to the host network?

@wanghui-devops

The cluster seems to be stuck at:

2023-09-27 07:02:18.656149 I | op-osd: OSD 1 is not ok-to-stop. will try updating it again later

So there are two possibilities here:

  1. Either the cluster was not healthy even before the host network was added, or
  2. the cluster became unhealthy (14 pgs inactive) after the host network was added.

In either case, the cluster is stuck because it is not able to stop an OSD, and because of that it is not failing over mon-a.

wanghui-devops commented 8 months ago

I'm sure it's the second case, because I checked the cluster state before changing the network. Is this caused by data already existing in the PGs before the change? What should I do when that happens?

sp98 commented 8 months ago

@wanghui-devops Can you share the complete rook-ceph operator logs in a text file? I need to check the logs from the beginning, when the operator was first created.

wanghui-devops commented 8 months ago

@sp98 My complete steps:

  1. Cluster deployment: deployment completed on 2023-09-28 01:21:29.339839
  2. Upgrade Cluster (v1.9.0 -> v1.10.13): 2023-09-28 01:38:44.213941 Upgrade completed
  3. Upgrade Cluster (v1.10.13 -> v1.11.11) : 2023-09-28 01:54:50.014223 Upgrade completed
  4. Create a test pool and some data; Check ceph status: HEALTH_OK
  5. change the network
  6. After a while, "op-osd: OSD is not ok-to-stop" appeared and ceph reported "14 pgs inactive"; then ran kubectl -n rook-ceph scale deployment --replicas=0 rook-ceph-mon-a; complete log: operator.txt
sp98 commented 8 months ago

Thanks @wanghui-devops. Looks like Ceph is complaining that the OSD is not ok-to-stop after updating the CR to use the host network. Is this a test cluster? If yes, can you update the CephCluster CR to add skipUpgradeChecks: true? This will skip any upgrade checks for now. After that you should see mon-a failing over and a new mon being created with the host network. The logs might look something like:


2023-09-28 07:09:25.825051 W | cephclient: skipping adding mon "a" to config file, detected out of quorum
2023-09-28 07:09:25.829594 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2023-09-28 07:09:25.830345 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2023-09-28 07:09:25.852498 W | op-mon: mon "a" not found in quorum, waiting for timeout (599 seconds left) before failover
2023-09-28 07:10:11.371954 W | op-mon: mon "a" not found in quorum, waiting for timeout (554 seconds left) before failover
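For reference, a minimal sketch of where that flag lives in the CephCluster CR (same cluster as above):

```yaml
# kubectl -n rook-ceph edit cephcluster rook-ceph
spec:
  skipUpgradeChecks: true   # skip the ok-to-stop checks so daemons can be restarted during reconcile
```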
wanghui-devops commented 8 months ago

@sp98 Do I have to manually run kubectl -n rook-ceph scale deployment --replicas=0 rook-ceph-mon-a? The operator logs now show:

2023-09-28 08:54:47.751942 W | op-mon: mon "a" NOT found in quorum and health timeout is 0, mon will never fail over
2023-09-28 08:54:47.751973 W | op-mon: monitor failover is disabled
2023-09-28 08:55:33.104082 W | op-mon: mon "a" NOT found in quorum and health timeout is 0, mon will never fail over
2023-09-28 08:55:33.104103 W | op-mon: monitor failover is disabled
2023-09-28 08:56:18.466629 W | op-mon: mon "a" NOT found in quorum and health timeout is 0, mon will never fail over
2023-09-28 08:56:18.466676 W | op-mon: monitor failover is disabled
2023-09-28 08:57:03.808799 W | op-mon: mon "a" NOT found in quorum and health timeout is 0, mon will never fail over
2023-09-28 08:57:03.808820 W | op-mon: monitor failover is disabled
2023-09-28 08:57:49.148884 W | op-mon: mon "a" NOT found in quorum and health timeout is 0, mon will never fail over
2023-09-28 08:57:49.148904 W | op-mon: monitor failover is disabled
sp98 commented 8 months ago

@wanghui-devops You should not change the mon health-check timeout to 0s. A timeout of 0 means we don't want mons to fail over. The default value is 10 minutes. You can reduce that value to less than 10 minutes, say 1m, but don't set it to 0s.

Basically you need to revert this change that you made:

                Disabled: false,
                Interval: &{Duration: s"45s"},
-               Timeout:  "",
+               Timeout:  "0s",

If you leave the timeout empty, it will take 10 minutes for mons to fail over. Zero seconds means no failover.
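For reference, a minimal sketch of the corresponding CephCluster excerpt (same fields as in the CRs above); leave timeout unset to keep the 10-minute default, or use a shorter non-zero value such as 1m:

```yaml
spec:
  healthCheck:
    daemonHealth:
      mon:
        interval: 45s
        timeout: 1m   # any non-zero value works; omitting it keeps the 10-minute default
```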

wanghui-devops commented 8 months ago

@sp98 The good news is that the network switch succeeded and the mon failover completed. The bad news is that my test PV cannot be mounted. Do I need to reboot the host?

Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               6m57s                default-scheduler        Successfully assigned default/busybox-rbd-pool-1 to dell-34
  Normal   SuccessfulAttachVolume  6m57s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-3dcb9f4d-51d8-4076-a474-72260ef9a71d"
  Warning  FailedMount             4m55s                kubelet                  MountVolume.MountDevice failed for volume "pvc-3dcb9f4d-51d8-4076-a474-72260ef9a71d" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount             45s (x9 over 4m55s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-3dcb9f4d-51d8-4076-a474-72260ef9a71d" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-62c32932-33e0-405b-90c1-056a4cb06d67 already exists
  Warning  FailedMount             20s (x3 over 4m54s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[my-pv-volume], unattached volumes=[my-pv-volume default-token-9vnkn]: timed out waiting for the condition
sp98 commented 8 months ago

Can you share the kubectl get pods -n rook-ceph -o wide output?

wanghui-devops commented 8 months ago

root@dell-34:~# kubectl get pod -n rook-ceph -o wide
NAME                                                READY   STATUS      RESTARTS   AGE   IP              NODE      NOMINATED NODE   READINESS GATES
csi-cephfsplugin-9k4d5                              2/2     Running     0          80m   192.168.60.42   dell-42   <none>           <none>
csi-cephfsplugin-ctckr                              2/2     Running     0          79m   192.168.60.41   dell-41   <none>           <none>
csi-cephfsplugin-holder-rook-ceph-6pxjp             1/1     Running     0          8h    10.244.0.78     dell-34   <none>           <none>
csi-cephfsplugin-holder-rook-ceph-lwgfv             1/1     Running     0          8h    10.244.2.132    dell-42   <none>           <none>
csi-cephfsplugin-holder-rook-ceph-ttwxg             1/1     Running     0          8h    10.244.1.113    dell-41   <none>           <none>
csi-cephfsplugin-lkcp7                              2/2     Running     0          79m   192.168.60.34   dell-34   <none>           <none>
csi-cephfsplugin-provisioner-58948fc785-64nrw       5/5     Running     0          80m   10.244.1.175    dell-41   <none>           <none>
csi-cephfsplugin-provisioner-58948fc785-cnx5m       5/5     Running     0          80m   10.244.0.254    dell-34   <none>           <none>
csi-rbdplugin-92rvv                                 2/2     Running     0          79m   192.168.60.34   dell-34   <none>           <none>
csi-rbdplugin-holder-rook-ceph-5stln                1/1     Running     0          8h    10.244.1.114    dell-41   <none>           <none>
csi-rbdplugin-holder-rook-ceph-6hbr4                1/1     Running     0          8h    10.244.0.77     dell-34   <none>           <none>
csi-rbdplugin-holder-rook-ceph-h9n2t                1/1     Running     0          8h    10.244.2.133    dell-42   <none>           <none>
csi-rbdplugin-kfbbm                                 2/2     Running     0          79m   192.168.60.41   dell-41   <none>           <none>
csi-rbdplugin-provisioner-5486f64f-n29bw            5/5     Running     0          80m   10.244.0.253    dell-34   <none>           <none>
csi-rbdplugin-provisioner-5486f64f-vhnrb            5/5     Running     0          80m   10.244.1.174    dell-41   <none>           <none>
csi-rbdplugin-rwzsp                                 2/2     Running     0          80m   192.168.60.42   dell-42   <none>           <none>
rook-ceph-crashcollector-dell-34-dcb8f9b64-gfws7    1/1     Running     0          80m   192.168.60.34   dell-34   <none>           <none>
rook-ceph-crashcollector-dell-41-89858dfd4-4j6vw    1/1     Running     0          80m   192.168.60.41   dell-41   <none>           <none>
rook-ceph-crashcollector-dell-42-5859d9f6df-l4sfq   1/1     Running     0          80m   192.168.60.42   dell-42   <none>           <none>
rook-ceph-mgr-a-548cd8479b-dtmhp                    2/2     Running     1          74m   192.168.60.34   dell-34   <none>           <none>
rook-ceph-mgr-b-8c6b55bdd-6kpsf                     2/2     Running     1          72m   192.168.60.41   dell-41   <none>           <none>
rook-ceph-mon-g-6ff656cbf4-lhc52                    1/1     Running     0          42m   192.168.60.34   dell-34   <none>           <none>
rook-ceph-mon-h-568f9b87f7-s6ff6                    1/1     Running     0          37m   192.168.60.42   dell-42   <none>           <none>
rook-ceph-mon-i-bf66847bb-krqrl                     1/1     Running     0          26m   192.168.60.41   dell-41   <none>           <none>
rook-ceph-operator-647948646c-892gk                 1/1     Running     0          8h    10.244.0.124    dell-34   <none>           <none>
rook-ceph-osd-0-656fcc879b-h5q9b                    1/1     Running     0          71m   192.168.60.41   dell-41   <none>           <none>
rook-ceph-osd-1-b8888465f-lvsxh                     1/1     Running     0          69m   192.168.60.34   dell-34   <none>           <none>
rook-ceph-osd-2-786d49fc44-fcqf4                    1/1     Running     0          68m   192.168.60.42   dell-42   <none>           <none>
rook-ceph-osd-prepare-dell-34-gj46v                 0/1     Completed   0          25m   192.168.60.34   dell-34   <none>           <none>
rook-ceph-osd-prepare-dell-41-nc7k6                 0/1     Completed   0          25m   192.168.60.41   dell-41   <none>           <none>
rook-ceph-osd-prepare-dell-42-9pgsx                 0/1     Completed   0          24m   192.168.60.42   dell-42   <none>           <none>
rook-ceph-tools-555c879675-pk597                    1/1     Running     0          8h    10.244.1.112    dell-41   <none>           <none>
wanghui-devops commented 8 months ago

ceph status :

root@dell-34:~# kubectl exec -it -n rook-ceph rook-ceph-tools-555c879675-pk597 -- ceph -s
  cluster:
    id:     fd15bc40-b2ff-446d-a033-cf28ee416873
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum g,h,i (age 29m)
    mgr: a(active, since 28m), standbys: b
    osd: 3 osds: 3 up (since 70m), 3 in (since 8h)

  data:
    pools:   4 pools, 97 pgs
    objects: 10.42k objects, 36 GiB
    usage:   29 TiB used, 29 TiB / 58 TiB avail
    pgs:     97 active+clean
sp98 commented 8 months ago

@wanghui-devops You can try rebooting the nodes. That should fix the PV mount issue.

wanghui-devops commented 8 months ago

A Velero backup task is in progress, so I can't restart the node yet. I'll try after the backup is completed.

sp98 commented 8 months ago

@wanghui-devops were you able to get this working?

wanghui-devops commented 8 months ago

@sp98 After the reboot, it's still the same error.

Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               9m25s                default-scheduler        Successfully assigned default/busybox-rbd-pool-1 to dell-34
  Normal   SuccessfulAttachVolume  9m25s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-2150e55e-4b06-4e96-aba8-0ebad8d8726c"
  Warning  FailedMount             7m9s                 kubelet                  MountVolume.MountDevice failed for volume "pvc-2150e55e-4b06-4e96-aba8-0ebad8d8726c" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount             5m8s                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[my-pv-volume], unattached volumes=[default-token-9vnkn my-pv-volume]: timed out waiting for the condition
  Warning  FailedMount             57s (x10 over 7m9s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-2150e55e-4b06-4e96-aba8-0ebad8d8726c" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000008-b81422a1-1535-4777-ac61-3bcb7eef0900 already exists
  Warning  FailedMount             34s (x3 over 7m22s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[my-pv-volume], unattached volumes=[my-pv-volume default-token-9vnkn]: timed out waiting for the condition

The dmesg at this point:

[Sun Oct  8 15:11:28 2023] libceph: mon2 192.168.60.41:6789 session established
[Sun Oct  8 15:11:28 2023] libceph: mon2 192.168.60.41:6789 socket closed (con state OPEN)
[Sun Oct  8 15:11:28 2023] libceph: mon2 192.168.60.41:6789 session lost, hunting for new mon
[Sun Oct  8 15:11:28 2023] libceph: mon1 192.168.60.41:6789 session established
[Sun Oct  8 15:11:28 2023] libceph: client994145 fsid fd15bc40-b2ff-446d-a033-cf28ee416873
[Sun Oct  8 15:11:30 2023] libceph: osd1 192.168.60.34:6801 socket closed (con state CONNECTING)
[Sun Oct  8 15:11:33 2023] libceph: osd1 192.168.60.34:6801 socket closed (con state CONNECTING)
[Sun Oct  8 15:11:36 2023] libceph: osd1 192.168.60.34:6801 socket closed (con state CONNECTING)
[Sun Oct  8 15:11:39 2023] libceph: osd1 192.168.60.34:6801 socket closed (con state CONNECTING)
[Sun Oct  8 15:11:47 2023] libceph: osd1 192.168.60.34:6801 socket closed (con state CONNECTING)
[Sun Oct  8 15:11:58 2023] libceph: osd1 192.168.60.34:6801 socket closed (con state CONNECTING)
[Sun Oct  8 15:12:16 2023] libceph: osd1 192.168.60.34:6801 socket closed (con state CONNECTING)
[Sun Oct  8 15:12:53 2023] libceph: osd1 192.168.60.34:6801 socket closed (con state CONNECTING)
[Sun Oct  8 15:14:01 2023] libceph: osd1 192.168.60.34:6801 socket closed (con state CONNECTING)
[Sun Oct  8 15:14:24 2023] INFO: task mapper:5375 blocked for more than 120 seconds.
[Sun Oct  8 15:14:24 2023]       Tainted: G        W        4.15.0-175-generic #184-Ubuntu
[Sun Oct  8 15:14:24 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
microyahoo commented 7 months ago

MountVolume.MountDevice failed for volume "pvc-2150e55e-4b06-4e96-aba8-0ebad8d8726c" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000008-b81422a1-1535-4777-ac61-3bcb7eef0900 already exists

Hi @wanghui-devops, cluster errors or network problems can cause some commands to hang, which is when this message is reported. You can try the following steps:

  1. Make sure the cluster health is OK.
  2. Make sure the network connection between the cluster and your test host is OK.
  3. Or try to map and mount the RBD volume manually (a rough sketch follows below).
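A rough sketch of such a manual check, run from a node that can reach the mons over the host network; the pool and image names below are placeholders (the real ones can be read from the PV's volumeAttributes), and it assumes a ceph.conf and admin keyring are available to the rbd CLI:

```console
# list images in the pool backing the PVC (pool name is an assumption)
rbd ls replicapool

# map the image; if this hangs, the node cannot reach the mons/OSDs on the new host network
rbd map replicapool/<image-name> --id admin

# the mapped device appears as /dev/rbdX; try mounting it
mount /dev/rbd0 /mnt
```

If the manual rbd map hangs the same way the kubelet mount does, the problem is node-to-OSD connectivity on the new host network rather than CSI itself.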
github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 5 months ago

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.