scylladb / scylla-operator

The Kubernetes Operator for ScyllaDB
https://operator.docs.scylladb.com/
Apache License 2.0

When a test framework setup fails, no tests are run but the run succeeds #2098

Open tnozicka opened 2 weeks ago

tnozicka commented 2 weeks ago

When any assertion in the Framework fails, no tests are executed but the run succeeds. It can be simulated by modifying the framework constructor like this:

func NewFramework(namePrefix string) *Framework {
    o.Expect(true).To(o.BeFalse()) // deliberately failing assertion injected to simulate a setup failure
    // ... rest of the constructor unchanged

this goes through ginkgo.GinkgoRecover and calls the fail handler correctly, but my suspicion is that when no test gets registered it forgets to check the fail handler. I've tried failing the individual tests directly, before the framework creation, and that correctly failed the run.
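
For contrast, a minimal illustrative sketch (not the actual test code) of a failure that does get reported: an assertion failing inside a subject (It) node runs at spec execution time, so the fail handler is honoured and the suite exits non-zero.

package example

import (
    g "github.com/onsi/ginkgo/v2"
    o "github.com/onsi/gomega"
)

var _ = g.Describe("ScyllaCluster", func() {
    g.It("fails the run when an assertion inside the spec fails", func() {
        // Runs at spec execution time, so the failure is reported and the
        // suite exits with a non-zero status.
        o.Expect(true).To(o.BeFalse())
    })
})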

go run ./cmd/scylla-operator-tests run all --ingress-controller-address=$( kubectl -n haproxy-ingress get svc haproxy-ingress --template='{{ .spec.clusterIP }}' ) --loglevel=2 --parallelism=1 --progress --delete-namespace-policy=OnSuccess --feature-gates='AutomaticTLSCertificates=true' --artifacts-dir=/tmp/so-artifacts --fail-fast --loglevel=5
Flag --delete-namespace-policy has been deprecated, --delete-namespace-policy is deprecated - please use --cleanup-policy instead
I0827 19:39:14.798450  632799 tests/tests.go:74] maxprocs: Leaving GOMAXPROCS=[16]: CPU quota undefined
I0827 19:39:14.799273  632799 tests/tests_run.go:213] "scylla-operator-tests run" version "unknown"
I0827 19:39:14.799288  632799 flag/flags.go:64] FLAG: --artifacts-dir="/tmp/so-artifacts"
I0827 19:39:14.799292  632799 flag/flags.go:64] FLAG: --burst="75"
I0827 19:39:14.799297  632799 flag/flags.go:64] FLAG: --cleanup-policy="OnSuccess"
I0827 19:39:14.799300  632799 flag/flags.go:64] FLAG: --color="true"
I0827 19:39:14.799303  632799 flag/flags.go:64] FLAG: --delete-namespace-policy="OnSuccess"
I0827 19:39:14.799304  632799 flag/flags.go:64] FLAG: --dry-run="false"
I0827 19:39:14.799306  632799 flag/flags.go:64] FLAG: --fail-fast="true"
I0827 19:39:14.799307  632799 flag/flags.go:64] FLAG: --feature-gates="AutomaticTLSCertificates=true"
I0827 19:39:14.799321  632799 flag/flags.go:64] FLAG: --flake-attempts="0"
I0827 19:39:14.799322  632799 flag/flags.go:64] FLAG: --focus="[]"
I0827 19:39:14.799328  632799 flag/flags.go:64] FLAG: --gcs-service-account-key-path=""
I0827 19:39:14.799330  632799 flag/flags.go:64] FLAG: --help="false"
I0827 19:39:14.799332  632799 flag/flags.go:64] FLAG: --ingress-controller-address="10.111.86.222"
I0827 19:39:14.799335  632799 flag/flags.go:64] FLAG: --ingress-controller-custom-annotations="[]"
I0827 19:39:14.799340  632799 flag/flags.go:64] FLAG: --ingress-controller-ingress-class-name=""
I0827 19:39:14.799343  632799 flag/flags.go:64] FLAG: --kubeconfig="[]"
I0827 19:39:14.799347  632799 flag/flags.go:64] FLAG: --label-filter=""
I0827 19:39:14.799349  632799 flag/flags.go:64] FLAG: --loglevel="5"
I0827 19:39:14.799351  632799 flag/flags.go:64] FLAG: --object-storage-bucket=""
I0827 19:39:14.799353  632799 flag/flags.go:64] FLAG: --parallel-loglevel="0"
I0827 19:39:14.799355  632799 flag/flags.go:64] FLAG: --parallel-server-address=""
I0827 19:39:14.799357  632799 flag/flags.go:64] FLAG: --parallel-shard="0"
I0827 19:39:14.799359  632799 flag/flags.go:64] FLAG: --parallelism="1"
I0827 19:39:14.799361  632799 flag/flags.go:64] FLAG: --progress="true"
I0827 19:39:14.799363  632799 flag/flags.go:64] FLAG: --qps="50"
I0827 19:39:14.799367  632799 flag/flags.go:64] FLAG: --quiet="false"
I0827 19:39:14.799369  632799 flag/flags.go:64] FLAG: --random-seed="1724780354"
I0827 19:39:14.799372  632799 flag/flags.go:64] FLAG: --s3-credentials-file-path=""
I0827 19:39:14.799375  632799 flag/flags.go:64] FLAG: --scyllacluster-clients-broadcast-address-type="PodIP"
I0827 19:39:14.799388  632799 flag/flags.go:64] FLAG: --scyllacluster-node-service-type="Headless"
I0827 19:39:14.799390  632799 flag/flags.go:64] FLAG: --scyllacluster-nodes-broadcast-address-type="PodIP"
I0827 19:39:14.799393  632799 flag/flags.go:64] FLAG: --scyllacluster-storageclass-name=""
I0827 19:39:14.799395  632799 flag/flags.go:64] FLAG: --skip="[]"
I0827 19:39:14.799398  632799 flag/flags.go:64] FLAG: --timeout="24h0m0s"
I0827 19:39:14.799401  632799 flag/flags.go:64] FLAG: --v="5"
I0827 19:39:14.799548  632799 tests/tests_run.go:299] "Running specs"
Running Suite: Scylla operator E2E tests - /home/dev/dev/go/src/github.com/scylladb/scylla-operator
===================================================================================================
Random Seed: 1724780354 - will randomize all specs

Will run 0 of 0 specs

Ran 0 of 0 Specs in 0.000 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 0 Skipped
rzetelskik commented 2 weeks ago

When any assertion in the Framework fails, no tests are executed but the run succeeds. It can be simulated by modifying the framework constructor like this:

func NewFramework(namePrefix string) *Framework {
  o.Expect(true).To(o.BeFalse())

this goes through ginkgo.GinkgoRecover and calls the fail handler correctly, but my suspicion is that when no test gets registered it forgets to check the fail handler.

This is by design (as in we're using ginkgo incorrectly) - we run framework initialisation in a container node, so any failing assertions there will fail at the spec tree construction phase, not when running specs, see https://onsi.github.io/ginkgo/#no-assertions-in-container-nodes. Iirc putting GinkgoRecover there only causes it to (unintentionally from our perspective) pass silently on a failed assertion - if you remove it you'll see that the spec tree construction fails. GinkgoRecover shouldn't be called in the container nodes in the first place, see e.g. https://github.com/onsi/ginkgo/issues/931#issuecomment-1040512693.
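
For illustration, a minimal sketch of the mechanism described above; the identifiers are illustrative, and the failing assertion plus the deferred GinkgoRecover stand in for whatever the framework initialisation does when invoked from a container node body:

package example

import (
    g "github.com/onsi/ginkgo/v2"
    o "github.com/onsi/gomega"
)

var _ = g.Describe("ScyllaCluster", func() {
    // Container node body: this code runs while Ginkgo constructs the spec
    // tree, before any spec is executed.
    defer g.GinkgoRecover() // recovers the panic raised by the failing assertion

    // A failing Gomega assertion panics via the Ginkgo fail handler; with the
    // recover above, tree construction continues silently and the run can end
    // up with zero specs, yet report SUCCESS.
    o.Expect(true).To(o.BeFalse())

    g.It("is never registered", func() {
        // Anything defined after the failing assertion is not added to the tree.
    })
})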

Imo to properly fix this we'd have to move all initialisation in the framework from the container nodes to setup nodes. I think I actually tried to do this at some point but it required quite a few changes given how the framework and tests are set up now.
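
A rough sketch of what that refactor could look like (hypothetical names, not a drop-in change to the actual framework): the framework variable is declared in the container node, but everything that can fail moves into a BeforeEach setup node, which runs per spec and therefore fails the spec, and the run, on a broken assertion.

package example

import (
    g "github.com/onsi/ginkgo/v2"
    o "github.com/onsi/gomega"
)

// Framework is a stand-in for the e2e test framework type.
type Framework struct {
    namespace string
}

var _ = g.Describe("ScyllaCluster", func() {
    var f *Framework

    g.BeforeEach(func() {
        // Setup node: runs when each spec starts, so initialisation failures
        // here fail the spec (and the run) instead of silently producing an
        // empty spec tree.
        f = &Framework{namespace: "e2e-test"}
        o.Expect(f.namespace).NotTo(o.BeEmpty())
    })

    g.It("uses the framework initialised in the setup node", func() {
        o.Expect(f).NotTo(o.BeNil())
    })
})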

tnozicka commented 2 weeks ago

yeah, I guess we should put stuff that can fail (init) into a BeforeEach

the weird thing is that it works if not all nodes fail...

rzetelskik commented 2 weeks ago

the weird thing is that it works if not all nodes fail...

Can you give an example? I'm not sure what that means.