tikv / pd

Placement driver for TiKV
Apache License 2.0

PD is corrupting `etcd` database on restart #8547

Open faelau opened 3 months ago

faelau commented 3 months ago

Bug Report

If you restart a PD pod, you receive the following panic:

[2024/08/19 15:16:25.624 +00:00] [WARN] [server.go:297] ["exceeded recommended request limit"] [max-request-bytes=157286400] [max-request-size="157 MB"] [recommended-request-bytes=10485760] [recommended-request-size="10 MB"]
2024-08-19 15:16:25.624904 W | pkg/fileutil: check file permission: directory "/var/lib/pd" exist, but the permission is "drwxr-xr-x". The recommended permission is "-rwx------" to prevent possible unprivileged access to the data.
[2024/08/19 15:16:25.636 +00:00] [PANIC] [backend.go:173] ["failed to open database"] [path=/var/lib/pd/member/snap/db] [error="invalid database"]
panic: failed to open database
goroutine 251 [running]:
go.uber.org/zap/zapcore.CheckWriteAction.OnWrite(0x2?, 0x2?, {0x0?, 0x0?, 0xc0001364a0?})
    /root/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:196 +0x54
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0006f52b0, {0xc0012b9980, 0x2, 0x2})
    /root/go/pkg/mod/go.uber.org/zap@v1.27.0/zapcore/entry.go:262 +0x3ec
go.uber.org/zap.(*Logger).Panic(0xc001299f80?, {0x304f490?, 0x16?}, {0xc0012b9980, 0x2, 0x2})
    /root/go/pkg/mod/go.uber.org/zap@v1.27.0/logger.go:285 +0x51
go.etcd.io/etcd/mvcc/backend.newBackend({{0xc001299f80, 0x1a}, 0x5f5e100, 0x2710, {0x30191e2, 0x5}, 0x233333333, 0xc000053980, 0x0})
    /root/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20240320135013-950cd5fbe6ca/mvcc/backend/backend.go:173 +0x35c
go.etcd.io/etcd/mvcc/backend.New(...)
    /root/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20240320135013-950cd5fbe6ca/mvcc/backend/backend.go:151
go.etcd.io/etcd/etcdserver.newBackend({{0x7ffd58f397d8, 0xe}, {0x0, 0x0}, {0x0, 0x0}, {0xc0003086c0, 0x1, 0x1}, {0xc000308480, ...}, ...})
    /root/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20240320135013-950cd5fbe6ca/etcdserver/backend.go:53 +0x3b0
go.etcd.io/etcd/etcdserver.openBackend.func1()
    /root/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20240320135013-950cd5fbe6ca/etcdserver/backend.go:74 +0x45
created by go.etcd.io/etcd/etcdserver.openBackend in goroutine 1
    /root/go/pkg/mod/go.etcd.io/etcd@v0.5.0-alpha.5.0.20240320135013-950cd5fbe6ca/etcdserver/backend.go:73 +0x106

The PVC uses the cephfs.csi.ceph.com provisioner. The cluster is running on microk8s.

Digging a bit deeper and checking the etcd database with bbolt yields the following error:

$ ./go/bin/bbolt page --all --format-value=redacted db
cannot read number of pages: the Meta Page has wrong (unexpected) magic
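The "wrong magic" message means the first meta page no longer starts with bbolt's magic number (0xED0CDAED). Assuming a 64-bit bbolt build, a 16-byte page header precedes the meta block, so an intact database carries the little-endian magic bytes `ad da 0c ed` at offset 16. A quick raw check, as a sketch (the db path is the one from the panic):

```shell
# Dump 4 bytes at offset 16 of the bolt file: the meta-page magic.
# An intact database prints "ad da 0c ed" (0xED0CDAED, little-endian);
# anything else means the meta page was overwritten or truncated.
od -An -tx1 -j16 -N4 /var/lib/pd/member/snap/db
```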

What did you do?

  1. Create a new TidbCluster:
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: surrealdb
spec:
  version: v8.2.0
  timezone: UTC
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  configUpdateStrategy: RollingUpdate
  discovery: {}
  helper:
    image: alpine:3.16.0
  pd:
    baseImage: pingcap/pd
    replicas: 1
    maxFailoverCount: 0
    mountClusterClientSecret: true
    storageClassName: csi-cephfs-sc
    requests:
      storage: "16Gi"
    config: {}
  tikv:
    baseImage: pingcap/tikv
    maxFailoverCount: 0
    evictLeaderTimeout: 1m
    replicas: 3
    storageClassName: csi-cephfs-sc
    requests:
      storage: "16Gi"
    config:
      storage:
        reserve-space: "0MB"
      rocksdb:
        max-open-files: 256
      raftdb:
        max-open-files: 256
  tidb:
    baseImage: pingcap/tidb
    maxFailoverCount: 0
    replicas: 5
    service:
      type: ClusterIP
    config: {}
  2. Restart the PD pod (e.g. when draining a node during a Kubernetes update)
  3. Observe the panic

What did you expect to see?

PD not corrupting the etcd database.

What did you see instead?

A panic of the PD container because the etcd database is corrupted.

What version of PD are you using (pd-server -V)?

[root@surrealdb-pd-0 /]# ./pd-server -V
Release Version: v8.2.0
Edition: Community
Git Commit Hash: c0ee2cd6c2eea7ad9372cc5bd00f6774abad6834
Git Branch: HEAD
UTC Build Time:  2024-07-04 09:39:38
faelau commented 3 weeks ago

Having some news on this.

The panic only happens when the PVCs are mounted with the Ceph kernel driver. If the PVCs are mounted with the FUSE driver, the panic does not occur.
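For anyone who wants to try the FUSE mount as a workaround: the ceph-csi CephFS driver lets you select the mounter per StorageClass via the `mounter` parameter. A minimal sketch (names and IDs are placeholders, and it assumes a ceph-csi based StorageClass; secrets and other parameters are omitted):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cephfs-sc-fuse   # hypothetical name
provisioner: cephfs.csi.ceph.com
parameters:
  clusterID: <cluster-id>    # placeholder
  fsName: <cephfs-name>      # placeholder
  # ceph-csi: mount with the ceph-fuse client instead of the kernel driver
  mounter: fuse
reclaimPolicy: Retain
```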

This also happened a few days ago with another piece of software using BoltDB, so it seems to be some kind of upstream issue in boltdb/etcd?

Maybe an issue should also be opened there?