Closed: weizhoublue closed this issue 4 months ago
@bzsuni I looked into this case. I suspect the shutdown-node case collided with a concurrently running create-DaemonSet case: the check requires every DaemonSet pod to be ready, so it reports a timeout. With 3 nodes and one of them shut down, the DaemonSet status stays at 2/3 Ready.
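A minimal sketch of the suspected problem, assuming the test's wait condition compares ready pods against the full desired count (the type and function names here are illustrative, not the project's real code):

```go
package main

import "fmt"

// dsStatus mirrors the two DaemonSet status counters relevant here
// (appsv1.DaemonSetStatus has fields with these names).
type dsStatus struct {
	DesiredNumberScheduled int32
	NumberReady            int32
}

// strictReady is the check the test effectively performs: every desired
// pod must be Ready. With 3 nodes and 1 shut down, the status stays at
// 2/3 Ready, this never becomes true, and the wait times out.
func strictReady(s dsStatus) bool {
	return s.NumberReady == s.DesiredNumberScheduled
}

// tolerantReady is one possible fix: subtract nodes known to be down
// (e.g. powered off by the shutdown-node case) from the desired count.
func tolerantReady(s dsStatus, downNodes int32) bool {
	return s.NumberReady >= s.DesiredNumberScheduled-downNodes
}

func main() {
	s := dsStatus{DesiredNumberScheduled: 3, NumberReady: 2}
	fmt.Println(strictReady(s))      // false: the wait never completes
	fmt.Println(tolerantReady(s, 1)) // true: the downed node is excluded
}
```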
[FAILED] Unexpected error:
<*errors.errorString | 0xc0005a1470>:
create DaemonSet time out
{
s: "create DaemonSet time out",
}
occurred
In [BeforeEach] at: /home/runner/work/egressgateway/egressgateway/test/e2e/reliability/reliability_test.go:70
• [FAILED] [30.964 seconds]
Reliability [Reliability]
/home/runner/work/egressgateway/egressgateway/test/e2e/reliability/reliability_test.go:30
Test the drift of the EIP [BeforeEach]
/home/runner/work/egressgateway/egressgateway/test/e2e/reliability/reliability_test.go:43
restart components [R00007]
/home/runner/work/egressgateway/egressgateway/test/e2e/reliability/reliability_test.go:225
restart kube-controller-manager
/home/runner/work/egressgateway/egressgateway/test/e2e/reliability/reliability_test.go:293
Timeline >>
> Enter [BeforeEach] Test the drift of the EIP - /home/runner/work/egressgateway/egressgateway/test/e2e/reliability/reliability_test.go:43 @ 01/30/24 21:01:18.749
succeeded to create the gateway: egw-1bfce234-0554-4941-a010-f0d9816fc2a1
v4DefaultEip: 172.18.6.3, v6DefaultEip: fc00:f853:ccd:e793::602
Automatically polling progress:
Reliability Test the drift of the EIP restart components restart kube-controller-manager (Spec Runtime: 20.001s)
/home/runner/work/egressgateway/egressgateway/test/e2e/reliability/reliability_test.go:293
In [BeforeEach] (Node Runtime: 20s)
/home/runner/work/egressgateway/egressgateway/test/e2e/reliability/reliability_test.go:43
Spec Goroutine
goroutine 3191 [sleep]
time.Sleep(0x1dcd6500)
/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/time.go:195
github.com/spidernet-io/egressgateway/test/e2e/common.CreateDaemonSet({0x2973030, 0xc0000501b8}, {0x297ac00, 0xc000145e60}, {0xc0009a4e80, 0x33}, {0xc000046246, 0x2e}, 0x0?)
/home/runner/work/egressgateway/egressgateway/test/e2e/common/ds.go:71
> github.com/spidernet-io/egressgateway/test/e2e/reliability_test.glob..func3.1.1()
/home/runner/work/egressgateway/egressgateway/test/e2e/reliability/reliability_test.go:69
|
| // daemonSet
> daemonSet, err = common.CreateDaemonSet(ctx, cli, "ds-reliability-"+uuid.NewString(), config.Image, time.Minute/2)
| Expect(err).NotTo(HaveOccurred())
| GinkgoWriter.Printf("succeeded to create DaemonSet: %s\n", daemonSet.Name)
github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3({0x0, 0x0})
/home/runner/work/egressgateway/egressgateway/vendor/github.com/onsi/ginkgo/v2/internal/node.go:463
github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
/home/runner/work/egressgateway/egressgateway/vendor/github.com/onsi/ginkgo/v2/internal/suite.go:889
github.com/onsi/ginkgo/v2/internal.(*Suite).runNode
/home/runner/work/egressgateway/egressgateway/vendor/github.com/onsi/ginkgo/v2/internal/suite.go:876
[FAILED] Unexpected error:
<*errors.errorString | 0xc0005a1470>:
create DaemonSet time out
{
s: "create DaemonSet time out",
}
occurred
In [BeforeEach] at: /home/runner/work/egressgateway/egressgateway/test/e2e/reliability/reliability_test.go:70 @ 01/30/24 21:01:49.046
Serial is added on the Describe, so all cases under it should run serially.
Also, AfterEach calls PowerOnNodesUntilClusterReady, which powers all nodes back on and waits for every pod to be ready.
This is a bit strange; it needs another look.
In practice, restarting a node is not a reliable approach. Is it possible to just restart the components, such as the api-server?
@lou-lan @bzsuni any update on this ?
It hasn't reappeared recently.
Create DaemonSet timeout: Kwok mock nodes can be NotReady. When the test creates a DaemonSet, it uses a time limit (e.g. 30s) to check whether each node's pod reaches Running within that time.
This part has been fixed; only one issue remains: https://github.com/spidernet-io/egressgateway/issues/1328
action url: https://github.com/spidernet-io/egressgateway/actions/runs/7716750992