kubewharf / kubebrain

A High Performance Metadata System for Kubernetes
Apache License 2.0
764 stars 79 forks

Large numbers of delete and update failures during updates #36

Open wqsgi opened 4 months ago

wqsgi commented 4 months ago

What happened?

During updates, I am seeing a large number of delete failures and update failures:

I0428 17:48:58.883419 3007632 kv.go:139] "txn failed" op="update" key="/registry/leases/kube-node-lease/kwok-node-3901"

I0426 18:29:10.242165 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766090 actualRev=449345926506766094
I0426 18:29:50.393991 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766236 actualRev=449345926506766239
I0426 18:30:30.558691 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766380 actualRev=449345926506766383
I0426 18:31:10.721504 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766524 actualRev=449345926506766527
I0426 18:31:50.889791 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766662 actualRev=449345926506766666
I0426 18:32:31.060937 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766807 actualRev=449345926506766810
I0426 18:33:11.225357 3007632 txn.go:164] "delete cas failed" expectRev=449345926506766950 actualRev=449345926506766953
I0426 18:33:51.393198 3007632 txn.go:164] "delete cas failed" expectRev=449345926506767095 actualRev=449345926506767098
I0426 18:34:31.549499 3007632 txn.go:164] "delete cas failed" expectRev=449345926506767238 actualRev=449345926506767240
I0426 18:35:11.705058 3007632 txn.go:164] "delete cas failed" expectRev=449345926506767380 actualRev=449345926506767384

Deleting pods is very slow. What is the cause?

What did you expect to happen?

How can this be resolved?

How can we reproduce it (as minimally and precisely as possible)?

It is very easy to reproduce.

Software version

```console
$ version
# paste output here
```
wqsgi commented 4 months ago

The Kubernetes version is 1.29.

divanodestiny commented 4 months ago
  1. KubeBrain does not work well with apiserver versions above 1.25 due to a known issue: https://github.com/kubewharf/kubebrain/issues/29
  2. The apiserver may not have fetched the latest object in time, so it sends the delete request with a stale revision, which causes the CAS (compare-and-swap) to fail.

I suggest testing again with apiserver <= 1.25.