m3db / m3

M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform
https://m3db.io/
Apache License 2.0

Cluster Unavailability During Node Removal #4260

Open · angelala00 opened 7 months ago

angelala00 commented 7 months ago

Hello M3DB Community,

I am encountering an issue with my M3DB cluster during a node removal operation. My cluster initially had 5 nodes, and I needed to scale down to 4 nodes. To do this, I used the following command:

curl -X DELETE <M3_COORDINATOR_HOST_NAME>:<M3_COORDINATOR_PORT(default 7201)>/api/v1/services/m3db/placement/<NODE_ID>

After executing this command, the cluster began rebalancing the shard data as expected. However, I ran into an issue: the cluster became unavailable during this process. Here are some details:

- Cluster Size Before Removal: 5 nodes
- Node Removal Process: using the above curl command
- Observed Issue: the cluster became unavailable during shard rebalancing

I followed the operational guidelines in the M3DB Operational Guide, but I am unsure what might have gone wrong. My expectation was that the cluster should remain available during a scale-down operation.
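For context, one way to watch how far the rebalance has progressed is to count shard states in the placement. This is only a sketch: it assumes the coordinator is reachable on localhost:7201 (as in the commands later in this thread) and that jq is installed.

```sh
# Sketch: summarize shard states across all instances in the placement.
# Assumes a coordinator at localhost:7201 and jq; adjust to your deployment.
curl -s http://localhost:7201/api/v1/services/m3db/placement \
  | jq '[.placement.instances[].shards[].state] | group_by(.) | map({(.[0]): length}) | add'
# A rebalance in progress shows a mix of AVAILABLE, INITIALIZING and LEAVING shards;
# it is finished once every shard is back to AVAILABLE.
```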

Could you please help me understand the following:

1. What are the common causes of a cluster becoming unavailable during a node removal process?
2. Are there any specific configurations or precautions that need to be taken to ensure cluster availability during such operations?
3. Is there any known issue or limitation with the version of M3DB that might affect the node removal process?

Any insights or guidance would be greatly appreciated. I am happy to provide more details if needed.

Thank you in advance for your help!

angelala00 commented 7 months ago

@robskillington @cw9 @vdarulis

vdarulis commented 7 months ago

It's hard to tell without the current/previous placement (e.g. debug bundles) and graphs from the time of the issue. Since a 4-node cluster can only tolerate a single node failure, another node bootstrapping or restarting at the time of the delete would cause write and/or query failures.

Was the cluster created with RF=3?
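A minimal sketch of how that evidence could be captured, assuming a coordinator on localhost:7201; the /debug/dump path is an assumption and may not be available in older builds:

```sh
# Sketch: save the placement before and after the removal so shard movements can be compared.
curl -s http://localhost:7201/api/v1/services/m3db/placement > placement-before.json
# ... run the DELETE against /api/v1/services/m3db/placement/<NODE_ID> ...
curl -s http://localhost:7201/api/v1/services/m3db/placement > placement-after.json

# If the coordinator build exposes a debug endpoint (assumption; depends on version/config),
# a bundle can be fetched in one go:
curl -s http://localhost:7201/debug/dump > coordinator-debug.zip
```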

angelala00 commented 7 months ago

> It's hard to tell without the current/previous placement (e.g. debug bundles) and graphs from the time of the issue. Since a 4-node cluster can only tolerate a single node failure, another node bootstrapping or restarting at the time of the delete would cause write and/or query failures.
>
> Was the cluster created with RF=3?

Thank you very much for your response!

My cluster had 5 nodes, with a placement like this:

{
  "placement": {
    "instances": {
      "hostname1": {
        "id": "hostname1",
        "isolationGroup": "group1",
        "zone": "embedded",
        "weight": 100,
        "endpoint": "ip1:9000",
        "shards": [
          { "id": 1, "state": "AVAILABLE", "sourceId": "", "cutoverNanos": "0", "cutoffNanos": "0" },
          { "id": 2, "state": "AVAILABLE", "sourceId": "", "cutoverNanos": "0", "cutoffNanos": "0" },
          ......
        ],
        "shardSetId": 0,
        "hostname": "ip1",
        "port": 9000,
        "metadata": { "debugPort": 0 }
      },
      "hostname2": {
        "id": "hostname2",
        "isolationGroup": "group2",
        "zone": "embedded",
        "weight": 100,
        "endpoint": "ip2:9000",
        "shards": [
          { "id": 0, "state": "AVAILABLE", "sourceId": "", "cutoverNanos": "0", "cutoffNanos": "0" },
          { "id": 2, "state": "AVAILABLE", "sourceId": "", "cutoverNanos": "0", "cutoffNanos": "0" },
          ......
        ],
        "shardSetId": 0,
        "hostname": "ip1",
        "port": 9000,
        "metadata": { "debugPort": 0 }
      },
      ......
    },
    "replicaFactor": 3,
    "numShards": 64,
    "isSharded": true,
    "cutoverTime": "0",
    "isMirrored": false,
    "maxShardSetId": 0
  },
  "version": 33
}

When I run this command:

curl -X DELETE http://localhost:7201/api/v1/services/m3db/placement/hostname1

then the placement is like this:

{
  "placement": {
    "instances": {
      "hostname1": {
        "id": "hostname1",
        "isolationGroup": "group1",
        "zone": "embedded",
        "weight": 100,
        "endpoint": "ip1:9000",
        "shards": [
          { "id": 1, "state": "LEAVING", "sourceId": "", "cutoverNanos": "0", "cutoffNanos": "0" },
          { "id": 2, "state": "LEAVING", "sourceId": "", "cutoverNanos": "0", "cutoffNanos": "0" },
          ......
        ],
        "shardSetId": 0,
        "hostname": "ip1",
        "port": 9000,
        "metadata": { "debugPort": 0 }
      },
      "hostname2": {
        "id": "hostname2",
        "isolationGroup": "group2",
        "zone": "embedded",
        "weight": 100,
        "endpoint": "ip2:9000",
        "shards": [
          { "id": 0, "state": "AVAILABLE", "sourceId": "", "cutoverNanos": "0", "cutoffNanos": "0" },
          { "id": 1, "state": "INITIALIZING", "sourceId": "hostname1", "cutoverNanos": "0", "cutoffNanos": "0" },
          { "id": 2, "state": "AVAILABLE", "sourceId": "", "cutoverNanos": "0", "cutoffNanos": "0" },
          ......
        ],
        "shardSetId": 0,
        "hostname": "ip1",
        "port": 9000,
        "metadata": { "debugPort": 0 }
      },
      ......
    },
    "replicaFactor": 3,
    "numShards": 64,
    "isSharded": true,
    "cutoverTime": "0",
    "isMirrored": false,
    "maxShardSetId": 0
  },
  "version": 34
}

And then, when I call the query interface host:7201/api/v1/query, it shows this error:

{"status":"error","error":"unable to satisfy consistency requirements: shards=22, err=[error fetching tagged from host hostname2: Error({Type:INTERNAL_ERROR Message:index is not yet bootstrapped to read})]"}

So, what more information do we need?
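Since the error is "index is not yet bootstrapped to read", one useful data point is each dbnode's own bootstrap status. A sketch, assuming the default HTTP health listener on port 9002 and using ip1..ip5 as placeholders for the node addresses:

```sh
# Sketch: check whether every dbnode reports itself as bootstrapped.
# Assumes the default httpNodeListenAddress port 9002; ip1..ip5 are placeholders.
for host in ip1 ip2 ip3 ip4 ip5; do
  printf '%s: ' "$host"
  curl -s "http://$host:9002/health"
  echo
done
# Each node should report something like "bootstrapped": true once it is fully serving reads.
```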

angelala00 commented 7 months ago

"Is this a bug? Or is it intended to be designed this way? Or did I do something wrong?" please help! @vdarulis

vdarulis commented 7 months ago

What state is shard 22 in across nodes? What's the query consistency level? It might be that a single shard isn't bootstrapped across all replicas. There are no known issues with this sort of configuration and node operations. Without a full dump of the configs for the coordinator and dbnodes, as well as the full placement status, there's not enough for anyone to go on. If you're on k8s, using m3db-operator is preferred.
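A sketch of how those two questions could be answered from the command line, assuming jq and the same localhost:7201 coordinator used elsewhere in this thread; the config key name is the usual one in M3 client config, but double-check it against your file:

```sh
# Sketch: show shard 22's state on every instance in the placement.
curl -s http://localhost:7201/api/v1/services/m3db/placement \
  | jq '.placement.instances | to_entries[] | {node: .key, shard22: [.value.shards[] | select(.id == 22)]}'

# Sketch: find the read consistency level in the node/coordinator config
# (commonly under the client section, e.g. readConsistencyLevel).
grep -n -i "consistencyLevel" m3dbnode.yml
```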

angelala00 commented 7 months ago

> What state is shard 22 in across nodes? What's the query consistency level? It might be that a single shard isn't bootstrapped across all replicas. There are no known issues with this sort of configuration and node operations. Without a full dump of the configs for the coordinator and dbnodes, as well as the full placement status, there's not enough for anyone to go on. If you're on k8s, using m3db-operator is preferred.

Thanks again! Here is the config file m3dbnode.yml:

[screenshots of m3dbnode.yml attached]

I deploy the cluster with binaries and my m3db version is 0.15.14.

According to curl http://localhost:7201/api/v1/services/m3db/placement, the status of shard 22 on all three replicas is AVAILABLE:

{ "id": 22, "state": "AVAILABLE", "sourceId": "", "cutoverNanos": "0", "cutoffNanos": "0" }
{ "id": 22, "state": "AVAILABLE", "sourceId": "", "cutoverNanos": "0", "cutoffNanos": "0" }
{ "id": 22, "state": "AVAILABLE", "sourceId": "", "cutoverNanos": "0", "cutoffNanos": "0" }

So, can we go on? @vdarulis

vdarulis commented 7 months ago

I don't see anything wrong with the config; the bootstrappers/consistency level are correct for this setup.

> I deploy the cluster with binaries and my m3db version is 0.15.14.

This is roughly 1,100 commits behind the latest release - definitely try upgrading first.

angelala00 commented 7 months ago

> I don't see anything wrong with the config; the bootstrappers/consistency level are correct for this setup.
>
> > I deploy the cluster with binaries and my m3db version is 0.15.14.
>
> This is roughly 1,100 commits behind the latest release - definitely try upgrading first.

Okay, then I'll upgrade the version and try. Which stable version is recommended?