pingcap / tidb-operator

TiDB operator creates and manages TiDB clusters running in Kubernetes.
https://docs.pingcap.com/tidb-in-kubernetes/
Apache License 2.0
1.22k stars 489 forks source link

Make CRD parse failures in tidb-controller-manager more friendly #2534

Open kolbe opened 4 years ago

kolbe commented 4 years ago

Feature Request

Is your feature request related to a problem? Please describe:

If I have applied a malformed TidbMonitor CRD to the cluster, tidb-controller-manager does not handle it very well:

E0522 17:51:18.048163       1 runtime.go:78] Observed a panic: &errors.errorString{s:"cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'"} (cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$')
goroutine 439 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x18d0620, 0xc0003bec30)
        k8s.io/apimachinery@v0.0.0/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        k8s.io/apimachinery@v0.0.0/pkg/util/runtime/runtime.go:48 +0x82
panic(0x18d0620, 0xc0003bec30)
        runtime/panic.go:679 +0x1b2
k8s.io/apimachinery/pkg/api/resource.MustParse(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        k8s.io/apimachinery@v0.0.0/pkg/api/resource/quantity.go:127 +0x1c8
github.com/pingcap/tidb-operator/pkg/monitor/monitor.getMonitorPVC(0xc00093e600, 0x44985f)
        github.com/pingcap/tidb-operator@/pkg/monitor/monitor/util.go:769 +0x261
github.com/pingcap/tidb-operator/pkg/monitor/monitor.(*MonitorManager).syncTidbMonitorPVC(0xc00088ac80, 0xc00093e600, 0xc0003bec10, 0x0, 0x8)
        github.com/pingcap/tidb-operator@/pkg/monitor/monitor/monitor_manager.go:140 +0x40
github.com/pingcap/tidb-operator/pkg/monitor/monitor.(*MonitorManager).Sync(0xc00088ac80, 0xc00093e600, 0xc0008a5080, 0xc0006f14e0)
        github.com/pingcap/tidb-operator@/pkg/monitor/monitor/monitor_manager.go:91 +0x938
github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*defaultTidbMonitorControl).reconcileTidbMonitor(...)
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_control.go:51
github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*defaultTidbMonitorControl).ReconcileTidbMonitor(0xc0008f5590, 0xc00093e600, 0x0, 0x0)
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_control.go:43 +0x42
github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*Controller).sync(0xc0007d9600, 0xc0006f14e0, 0x1f, 0x0, 0x0)
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_controller.go:146 +0x20a
github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*Controller).processNextWorkItem(0xc0007d9600, 0x0)
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_controller.go:114 +0x100
github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*Controller).worker(0xc0007d9600)
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_controller.go:101 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000676a60)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000676a60, 0x3b9aca00, 0x0, 0x1, 0xc000158420)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc000676a60, 0x3b9aca00, 0xc000158420)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/wait.go:88 +0x4d
created by github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*Controller).Run
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_controller.go:94 +0x1d9
panic: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$' [recovered]
        panic: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

goroutine 439 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
        k8s.io/apimachinery@v0.0.0/pkg/util/runtime/runtime.go:55 +0x105
panic(0x18d0620, 0xc0003bec30)
        runtime/panic.go:679 +0x1b2
k8s.io/apimachinery/pkg/api/resource.MustParse(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        k8s.io/apimachinery@v0.0.0/pkg/api/resource/quantity.go:127 +0x1c8
github.com/pingcap/tidb-operator/pkg/monitor/monitor.getMonitorPVC(0xc00093e600, 0x44985f)
        github.com/pingcap/tidb-operator@/pkg/monitor/monitor/util.go:769 +0x261
github.com/pingcap/tidb-operator/pkg/monitor/monitor.(*MonitorManager).syncTidbMonitorPVC(0xc00088ac80, 0xc00093e600, 0xc0003bec10, 0x0, 0x8)
        github.com/pingcap/tidb-operator@/pkg/monitor/monitor/monitor_manager.go:140 +0x40
github.com/pingcap/tidb-operator/pkg/monitor/monitor.(*MonitorManager).Sync(0xc00088ac80, 0xc00093e600, 0xc0008a5080, 0xc0006f14e0)
        github.com/pingcap/tidb-operator@/pkg/monitor/monitor/monitor_manager.go:91 +0x938
github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*defaultTidbMonitorControl).reconcileTidbMonitor(...)
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_control.go:51
github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*defaultTidbMonitorControl).ReconcileTidbMonitor(0xc0008f5590, 0xc00093e600, 0x0, 0x0)
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_control.go:43 +0x42
github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*Controller).sync(0xc0007d9600, 0xc0006f14e0, 0x1f, 0x0, 0x0)
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_controller.go:146 +0x20a
github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*Controller).processNextWorkItem(0xc0007d9600, 0x0)
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_controller.go:114 +0x100
github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*Controller).worker(0xc0007d9600)
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_controller.go:101 +0x2b
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000676a60)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/wait.go:152 +0x5e
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000676a60, 0x3b9aca00, 0x0, 0x1, 0xc000158420)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/wait.go:153 +0xf8
k8s.io/apimachinery/pkg/util/wait.Until(0xc000676a60, 0x3b9aca00, 0xc000158420)
        k8s.io/apimachinery@v0.0.0/pkg/util/wait/wait.go:88 +0x4d
created by github.com/pingcap/tidb-operator/pkg/controller/tidbmonitor.(*Controller).Run
        github.com/pingcap/tidb-operator@/pkg/controller/tidbmonitor/tidb_monitor_controller.go:94 +0x1d9

Describe the feature you'd like:

I'd like more information about where in the CRD the problem is, and I'd like the problem to be handled more cleanly rather than causing a crash!

Describe alternatives you've considered:

Better validate CRD before applying it to the cluster.

Teachability, Documentation, Adoption, Migration Strategy:

Yisaer commented 4 years ago

Better validate CRD before applying it to the cluster.

That's a good idea.

zjj2wry commented 4 years ago

@Yisaer

we can do semantic validation in the controller or ValidatingAdmission. The validation code can be reused, and it is better to do it in the ValidatingAdmission, because cr will not be saved to etcd, if we do it in the controller, we need to report warnning event. wdyt?

Yisaer commented 4 years ago

ValidationAdmission is a good idea. ValidationAdmission do help us to reduce the panic before saving in the etcd, we still need to handle the situation when we meet panic in controller-manager. @zjj2wry

In fact, we do have a validation webhook for pingcap resources (tidbcluster), but now it covered little range for them.

zjj2wry commented 4 years ago

we still need to handle the situation when we meet panic in controller-manager

agreed, because ValidationAdmission is an optional component, I think it is also needed in the controller manager

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 15 days

mianhk commented 2 years ago

/assign