spidernet-io / spiderpool

Underlay and RDMA network solution of the Kubernetes, for bare metal, VM and any public cloud
https://spidernet-io.github.io/spiderpool/
Apache License 2.0

The kdoctor agent crashed, causing the test to fail. #4283

Open github-actions[bot] opened 1 week ago

github-actions[bot] commented 1 week ago

action url: https://github.com/spidernet-io/spiderpool/actions/runs/11824947909

ty-dc commented 1 week ago

The test timed out waiting for the kdoctor task to finish. The task's runtime never left the `creating` state:

  Successfully obtained the latest status of kdoctor task: {ExpectedRound:<nil> DoneRound:<nil> Finish:false FinishTime:<nil> LastRoundStatus:<nil> History:[] Resource:&TaskResource{RuntimeName:kdoctor-netreach-one-macvlan-standalone-5216-679663260RuntimeType:DaemonSetServiceNameV4:nilServiceNameV6:*kdoctor-netreach-one-macvlan-standalone-5216-679663260-ipv6RuntimeStatus:creating}} 
  Successfully obtained the latest status of kdoctor task: {ExpectedRound:<nil> DoneRound:<nil> Finish:false FinishTime:<nil> LastRoundStatus:<nil> History:[] Resource:&TaskResource{RuntimeName:kdoctor-netreach-one-macvlan-standalone-5216-679663260RuntimeType:DaemonSetServiceNameV4:nilServiceNameV6:*kdoctor-netreach-one-macvlan-standalone-5216-679663260-ipv6RuntimeStatus:creating}} 
  Automatically polling progress:
    MacvlanUnderlayOne In underlay mode, verify single CNI network kdoctor connectivity should be succeed (Spec Runtime: 10m0.057s)
      /home/runner/work/spiderpool/spiderpool/test/e2e/coordinator/macvlan-underlay-one/macvlan_underlay_one_test.go:60
      In [It] (Node Runtime: 10m0.034s)
        /home/runner/work/spiderpool/spiderpool/test/e2e/coordinator/macvlan-underlay-one/macvlan_underlay_one_test.go:60

      Spec Goroutine
      goroutine 35 [sleep]
        time.Sleep(0x2540be400)
          /opt/hostedtoolcache/go/1.23.2/x64/src/runtime/time.go:315
      > github.com/spidernet-io/spiderpool/test/e2e/coordinator/macvlan-underlay-one_test.init.func2.1.2()
          /home/runner/work/spiderpool/spiderpool/test/e2e/coordinator/macvlan-underlay-one/macvlan_underlay_one_test.go:192
            |           }
            |       }
            >       time.Sleep(10 * time.Second)
            |   }
            | }
        github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3({0x0?, 0x0?})
          /home/runner/work/spiderpool/spiderpool/vendor/github.com/onsi/ginkgo/v2/internal/node.go:472
        github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
          /home/runner/work/spiderpool/spiderpool/vendor/github.com/onsi/ginkgo/v2/internal/suite.go:894
        github.com/onsi/ginkgo/v2/internal.(*Suite).runNode in goroutine 7
          /home/runner/work/spiderpool/spiderpool/vendor/github.com/onsi/ginkgo/v2/internal/suite.go:881
  [FAILED] timeout waiting for kdoctor task to finish
  In [It] at: /home/runner/work/spiderpool/spiderpool/test/e2e/coordinator/macvlan-underlay-one/macvlan_underlay_one_test.go:152

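Crashing pods: both kdoctor-netreach pods are in CrashLoopBackOff. The events below are from kdoctor-netreach-one-macvlan-standalone-5216-679663260-tfltn.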

kube-system          kdoctor-agent-kq2nr                                            1/1     Running            1 (41m ago)      42m   fc00:f853:ccd:e793:f::9    spiderpool1113201542-control-plane   <none>           <none>            app.kubernetes.io/component=kdoctor-agent,app.kubernetes.io/instance=kdoctor,app.kubernetes.io/name=kdoctor-agent,controller-revision-hash=7567c59cd5,pod-template-generation=1
kube-system          kdoctor-agent-ln7w9                                            1/1     Running            1 (41m ago)      42m   fc00:f853:ccd:e793:f::8    spiderpool1113201542-worker          <none>           <none>            app.kubernetes.io/component=kdoctor-agent,app.kubernetes.io/instance=kdoctor,app.kubernetes.io/name=kdoctor-agent,controller-revision-hash=7567c59cd5,pod-template-generation=1
kube-system          kdoctor-controller-747496fdc7-jnlbt                            1/1     Running            1 (41m ago)      42m   fc00:f853:ccd:e793:f::7    spiderpool1113201542-worker          <none>           <none>            app.kubernetes.io/component=kdoctor-controller,app.kubernetes.io/instance=kdoctor,app.kubernetes.io/name=kdoctor-controller,pod-template-hash=747496fdc7
kube-system          kdoctor-netreach-one-macvlan-standalone-5216-679663260-jb7jd   0/1     CrashLoopBackOff   6 (3m43s ago)    10m   fc00:f853:ccd:e793:f::2    spiderpool1113201542-worker          <none>           <none>            app.kubernetes.io/component=kdoctor-agent,app.kubernetes.io/instance=kdoctor,app.kubernetes.io/name=kdoctor-netreach-one-macvlan-standalone-5216-679663260,controller-revision-hash=6866547f76,pod-template-generation=1
kube-system          kdoctor-netreach-one-macvlan-standalone-5216-679663260-tfltn   0/1     CrashLoopBackOff   5 (3s ago)       10m   fc00:f853:ccd:e793:f::4    spiderpool1113201542-control-plane   <none>           <none>            app.kubernetes.io/component=kdoctor-agent,app.kubernetes.io/instance=kdoctor,app.kubernetes.io/name=kdoctor-netreach-one-macvlan-standalone-5216-679663260,controller-revision-hash=6866547f76,pod-template-generation=1
Events:
  Type     Reason          Age                  From               Message
  ----     ------          ----                 ----               -------
  Normal   Scheduled       10m                  default-scheduler  Successfully assigned kube-system/kdoctor-netreach-one-macvlan-standalone-5216-679663260-tfltn to spiderpool1113201542-control-plane
  Normal   AddedInterface  10m                  multus             Add eth0 [fc00:f853:ccd:e793:f::4/64] from kube-system/macvlan-vlan0
  Warning  Unhealthy       9m39s                kubelet            Startup probe failed: Get "http://[fc00:f853:ccd:e793:f::4]:5710/healthy/startup": dial tcp [fc00:f853:ccd:e793:f::4]:5710: i/o timeout (Client.Timeout exceeded while awaiting headers)
  Normal   Pulled          9m36s (x2 over 10m)  kubelet            Container image "ghcr.io/kdoctor-io/kdoctor-agent:v0.2.1" already present on machine
  Normal   Created         9m36s (x2 over 10m)  kubelet            Created container kdoctor-agent
  Normal   Started         9m36s (x2 over 10m)  kubelet            Started container kdoctor-agent
  Warning  Unhealthy       5s (x291 over 10m)   kubelet            Startup probe failed: Get "http://[fc00:f853:ccd:e793:f::4]:5710/healthy/startup": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Detailed log (attached):

e2edebugLog 3.txt