vesoft-inc / nebula-operator

Operation utilities for Nebula Graph
https://vesoft-inc.github.io/nebula-operator
Apache License 2.0

Operator should stop the current load balancer job and start a new one if retrying the job constantly fails #512

Status: Open. kevinliu24 opened this issue 4 months ago.


Introduction

Currently, when a data balance job fails, the operator retries it repeatedly until it succeeds. Sometimes success is not possible (e.g. if the job was removed manually), so the operator keeps retrying forever. We should detect the job's status and, after a failure, retry only a limited number of times. If the job still fails, we should stop the current data balance job and start a new one.

Contents

If a load balance job fails after a few retries, stop the current job and start a new one.
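The proposed policy can be sketched in Go as a single reconcile step. This is a minimal sketch under stated assumptions, not the operator's implementation: `JobStatus`, `BalanceOps`, `balanceState`, `reconcileBalance`, `startNewJob`, and the limit `maxRetries = 3` are all hypothetical names (the issue only says "a few" retries), and the `BalanceOps` hooks are placeholders for whatever Meta-service calls the operator actually makes. A failed job is retried up to the limit, then stopped and replaced with a fresh job; a job that no longer exists is replaced immediately, since retrying it can never succeed.

```go
package main

import "fmt"

// JobStatus is a simplified, hypothetical view of a balance job's state as
// observed by the operator; it is not the Meta service's actual type.
type JobStatus int

const (
	JobRunning JobStatus = iota
	JobSucceeded
	JobFailed
	JobNotFound // the job no longer exists, e.g. it was removed manually
)

// BalanceOps holds hypothetical hooks standing in for whatever calls the
// operator really makes to the Meta service.
type BalanceOps struct {
	Status  func(jobID int64) (JobStatus, error) // query the job's state
	Recover func(jobID int64) error              // retry the existing job
	Stop    func(jobID int64) error              // stop the existing job
	Submit  func() (int64, error)                // start a fresh balance job
}

// balanceState is what the reconciler remembers between passes.
type balanceState struct {
	JobID      int64
	RetryCount int
}

// maxRetries is an assumed limit; the issue only asks for "a few" retries.
const maxRetries = 3

// reconcileBalance runs once per reconcile pass. It returns true only when
// the current job has succeeded.
func reconcileBalance(ops BalanceOps, s *balanceState) (bool, error) {
	status, err := ops.Status(s.JobID)
	if err != nil {
		return false, err
	}
	switch status {
	case JobSucceeded:
		return true, nil
	case JobRunning:
		return false, nil // still in progress; check again on the next pass
	case JobFailed:
		if s.RetryCount < maxRetries {
			s.RetryCount++ // count the attempt even if Recover itself errors
			return false, ops.Recover(s.JobID)
		}
		// Retries exhausted: stop the current job and start a new one.
		if err := ops.Stop(s.JobID); err != nil {
			return false, err
		}
		return false, startNewJob(ops, s)
	case JobNotFound:
		// Retrying a removed job can never succeed, so replace it right away.
		return false, startNewJob(ops, s)
	default:
		return false, fmt.Errorf("unknown balance job status %d", status)
	}
}

// startNewJob submits a fresh balance job and resets the retry counter.
func startNewJob(ops BalanceOps, s *balanceState) error {
	id, err := ops.Submit()
	if err != nil {
		return err
	}
	s.JobID = id
	s.RetryCount = 0
	return nil
}
```

In a real operator the retry counter would probably need to live somewhere durable, such as the custom resource's status, so that it survives operator restarts; keeping it in memory here is a simplification of this sketch.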

Related work