pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0

PD configuration updates do not work #487

Closed aylei closed 5 years ago

aylei commented 5 years ago

Bug Report

Kubernetes: 1.12.6
tidb-operator: latest

What did you do?

  1. Change the value of pd.maxReplicas from 3 to 5 in values.yaml.
  2. Run helm upgrade.
  3. Wait for the rolling update to complete.
  4. Get the PD config with curl <host>:2379/pd/api/v1/config.

What did you expect to see? The replication.max-replicas is updated to 5.

What did you see instead? The replication.max-replicas is still 3.
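The check in step 4 can be scripted so the mismatch is easy to reproduce. This is a minimal sketch, assuming the endpoint returns JSON with a `replication` object that contains `max-replicas`; the helper names `fetch_pd_config`, `max_replicas`, and `check_max_replicas` are hypothetical and not part of tidb-operator or PD.

```python
import json
import urllib.request


def fetch_pd_config(host: str, port: int = 2379, timeout: float = 5.0) -> dict:
    """Fetch the live PD configuration as a dict (performs a network call)."""
    url = f"http://{host}:{port}/pd/api/v1/config"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def max_replicas(pd_config: dict) -> int:
    """Return replication.max-replicas from a PD config payload.

    Assumption: the payload nests the value under "replication".
    """
    return int(pd_config["replication"]["max-replicas"])


def check_max_replicas(pd_config: dict, expected: int) -> bool:
    """True when the live value matches what values.yaml asked for."""
    return max_replicas(pd_config) == expected
```

With a running cluster, `check_max_replicas(fetch_pd_config("<host>"), 5)` would return `False` in the situation described here, since PD still reports 3.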

According to @nolouch, PD does not change its configuration once the config file has been persisted. We may have to:

After investigating the code at https://github.com/pingcap/pd/blob/master/server/leader.go#L398-L411, I found that the schedule and replication configurations are persisted in etcd and cannot be updated through the config file.

zyguan commented 5 years ago

I thought this was a known problem. helm upgrade only updates the config files for some value changes; there is currently no reload (or restart) action, so all components (tidb, tikv) have related problems.

We also need to figure out which values (not only the pd config) won't take effect after an upgrade. For pd, the RESTful API (or directly reusing the pd-ctl code) might be a solution. For tikv, I'm not sure whether there is any simple way to hot-reload the config without restarting the process.
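For pd, the RESTful API route could look roughly like the sketch below. It only builds the HTTP request and does not send it. The assumptions here are that PD accepts a POST to /pd/api/v1/config with a JSON object of setting names and values (the same shape pd-ctl's `config set` is believed to send), and the helper name `build_pd_config_update` is hypothetical. Verify both against the PD version in use before relying on it.

```python
import json
import urllib.request


def build_pd_config_update(host: str, settings: dict, port: int = 2379) -> urllib.request.Request:
    """Build (but do not send) a request that updates PD settings online.

    Assumption: PD accepts POST /pd/api/v1/config with a JSON body such as
    {"max-replicas": 5}; this is not confirmed by the issue itself.
    """
    url = f"http://{host}:{port}/pd/api/v1/config"
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Sending it would be `urllib.request.urlopen(build_pd_config_update("<host>", {"max-replicas": 5}))`. Keeping the build step separate from the send makes the payload easy to inspect before it touches a live cluster.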

aylei commented 5 years ago

#479 introduces rolling updates of PD/TiKV/TiDB nodes on configuration update, so a restart is the intended behavior.

The problem is that the scheduler and replication configurations are persisted in etcd and are not updated by a rolling update.

zyguan commented 5 years ago

Ah, I see. I prefer documenting it too.