kubewharf / kubebrain

A High Performance Metadata System for Kubernetes
Apache License 2.0
771 stars 77 forks source link

[Bug] Delete resource error #29

Open fly2F opened 1 year ago

fly2F commented 1 year ago

What happened?

Commands:

kubectl create deployment nginx --image=nginx
kubectl delete deployment nginx

Error:

Error from server: invalid DeleteRange response: [response_range:<header:<revision:84 > kvs:<key:"/registry/deployments/default/nginx" mod_revision:81 value:"k8s\000\n\025\n\007apps/v1\022\nDeployment\022\374\007\n\371\005\n\005nginx\022\000\032\007default\"\000*$c876d2c2-0701-464a-83a9-942161d4e3a52\0008\001B\010\010\264\371\345\247\006\020\000Z\014\n\003app\022\005nginx\212\001\237\005\n\016kubectl-create\022\006Update\032\007apps/v1\"\010\010\264\371\345\247\006\020\0002\010FieldsV1:\345\004\n\342\004{\"f:metadata\":{\"f:labels\":{\".\":{},\"f:app\":{}}},\"f:spec\":{\"f:progressDeadlineSeconds\":{},\"f:replicas\":{},\"f:revisionHistoryLimit\":{},\"f:selector\":{},\"f:strategy\":{\"f:rollingUpdate\":{\".\":{},\"f:maxSurge\":{},\"f:maxUnavailable\":{}},\"f:type\":{}},\"f:template\":{\"f:metadata\":{\"f:labels\":{\".\":{},\"f:app\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"nginx\\\"}\":{\".\":{},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:resources\":{},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{}}},\"f:dnsPolicy\":{},\"f:restartPolicy\":{},\"f:schedulerName\":{},\"f:securityContext\":{},\"f:terminationGracePeriodSeconds\":{}}}}}B\000\022\357\001\010\001\022\016\n\014\n\003app\022\005nginx\032\250\001\n\036\n\000\022\000\032\000\"\000*\0002\0008\000B\000Z\014\n\003app\022\005nginx\022\205\001\022@\n\005nginx\022\005nginx*\000B\000j\024/dev/termination-logr\006Always\200\001\000\210\001\000\220\001\000\242\001\004File\032\006Always \0362\014ClusterFirstB\000J\000R\000X\000`\000h\000r\000\202\001\000\212\001\000\232\001\021default-scheduler\302\001\000\"'\n\rRollingUpdate\022\026\n\t\010\001\020\000\032\00325%\022\t\010\001\020\000\032\00325%(\0000\n8\000H\330\004\032\014\010\000\020\000\030\000 \000(\0008\000\032\000\"\000" > > ]

What did you expect to happen?

Delete resources without error

How can we reproduce it (as minimally and precisely as possible)?

STEP1: build kube-brain make badger, start kube-brain ./bin/kube-brain --compatible-with-etcd STEP2: start kube-apiserver

# build kube-apiserver, in kubernetes project path
go build ./cmd/kube-apiserver

# gen cert and start kube-apiserver, the '--etcd-servers' parameter points to kube-brain
./kube-apiserver --etcd-servers=http://127.0.0.1:2379 \
    --service-account-signing-key-file=cert/service-account-key.pem \
    --service-account-issuer=https://127.0.0.1:6443 \
    --service-account-key-file=cert/ca-key.pem \
    --token-auth-file=cert/token.csv \
    --tls-cert-file=cert/kubernetes.pem \
    --tls-private-key-file=cert/kubernetes-key.pem \
    --client-ca-file=cert/ca.pem

STEP3: delete rsource

kubectl create deployment nginx --image=nginx
kubectl delete deployment nginx

Error from server: invalid DeleteRange response: [response_range:<header:<revision:675 > kvs:<key:"/registry/deployments/default/nginx" mod_revision:674 value:"k8s\000\n\025\n\007apps/v1\022\nDeployment\022\374\007\n\371\005\n\005nginx\022\000\032\007default\"\000*$431b9682-ad70-40cc-b921-636e7a99c43b2\0008\001B\010\010\212\375\345\247\006\020\000Z\014\n\003app\022\005nginx\212\001\237\005\n\016kubectl-create\022\006Update\032\007apps/v1\"\010\010\212\375\345\247\006\020\0002\010FieldsV1:\345\004\n\342\004{\"f:metadata\":{\"f:labels\":{\".\":{},\"f:app\":{}}},\"f:spec\":{\"f:progressDeadlineSeconds\":{},\"f:replicas\":{},\"f:revisionHistoryLimit\":{},\"f:selector\":{},\"f:strategy\":{\"f:rollingUpdate\":{\".\":{},\"f:maxSurge\":{},\"f:maxUnavailable\":{}},\"f:type\":{}},\"f:template\":{\"f:metadata\":{\"f:labels\":{\".\":{},\"f:app\":{}}},\"f:spec\":{\"f:containers\":{\"k:{\\\"name\\\":\\\"nginx\\\"}\":{\".\":{},\"f:image\":{},\"f:imagePullPolicy\":{},\"f:name\":{},\"f:resources\":{},\"f:terminationMessagePath\":{},\"f:terminationMessagePolicy\":{}}},\"f:dnsPolicy\":{},\"f:restartPolicy\":{},\"f:schedulerName\":{},\"f:securityContext\":{},\"f:terminationGracePeriodSeconds\":{}}}}}B\000\022\357\001\010\001\022\016\n\014\n\003app\022\005nginx\032\250\001\n\036\n\000\022\000\032\000\"\000*\0002\0008\000B\000Z\014\n\003app\022\005nginx\022\205\001\022@\n\005nginx\022\005nginx*\000B\000j\024/dev/termination-logr\006Always\200\001\000\210\001\000\220\001\000\242\001\004File\032\006Always \0362\014ClusterFirstB\000J\000R\000X\000`\000h\000r\000\202\001\000\212\001\000\232\001\021default-scheduler\302\001\000\"'\n\rRollingUpdate\022\026\n\t\010\001\020\000\032\00325%\022\t\010\001\020\000\032\00325%(\0000\n8\000H\330\004\032\014\010\000\020\000\030\000 \000(\0008\000\032\000\"\000" > > ]

Software version

```console branch: main commit ID: 0cc3be740589fd51db3367c7037669602814e120 ```
fly2F commented 1 year ago

I traced the code and found that the wrong data structure was returned when delete.

diff --git a/pkg/server/etcd/backendshim.go b/pkg/server/etcd/backendshim.go
index 1b56007..00525f8 100644
--- a/pkg/server/etcd/backendshim.go
+++ b/pkg/server/etcd/backendshim.go
@@ -145,10 +145,11 @@ func (b *backendShim) Delete(ctx context.Context, key []byte, revision int64) (*
                Succeeded: response.Succeeded,
                Responses: []*etcdserverpb.ResponseOp{
                        {
-                               Response: &etcdserverpb.ResponseOp_ResponseRange{
-                                       ResponseRange: &etcdserverpb.RangeResponse{
-                                               Header: txnHeader(int64(response.Header.Revision)),
-                                               Kvs:    kvs,
+                               Response: &etcdserverpb.ResponseOp_ResponseDeleteRange{
+                                       ResponseDeleteRange: &etcdserverpb.DeleteRangeResponse{
+                                               Header:  txnHeader(int64(response.Header.Revision)),
+                                               PrevKvs: kvs,
+                                               Deleted: int64(len(kvs)),
                                        },
                                },
                        },
(END)
fly2F commented 1 year ago

Is this an experimental project? Has it been used in an internal production environment?

divanodestiny commented 1 year ago

@fly2F sorry for the late response. which version of k8s do you used?

divanodestiny commented 1 year ago

@fly2F i find the problem. https://github.com/kubernetes/kubernetes/pull/113369 fix the related issue which make k8s depend on this field. I think current response can match the requirement of k8s <=1.25. it will be great if you have time to fix it ~ or i will try to fix the problem when i am free.

fly2F commented 1 year ago

@divanodestiny Thank you very much for your reply. I used v1.24.0 code for testing and the deletion function is normal.

liangyuanpeng commented 4 months ago

Is this an experimental project? Has it been used in an internal production environment?

cc @divanodestiny it would be great if we have some ansower.

divanodestiny commented 4 months ago

Is this an experimental project? Has it been used in an internal production environment?

cc @divanodestiny it would be great if we have some ansower.

we use it in our product env. the largest k8s has 20k+ nodes.