grafana / k6-operator

An operator for running distributed k6 tests.
Apache License 2.0

Issues within initializer error handling if script is incorrect #435

Open Phil1602 opened 1 month ago

Phil1602 commented 1 month ago

Brief summary

We realized that our TestRuns get stuck, without any information or logs being printed, if the script itself is incorrect.

Cause

We already had a deeper look and realized that this is likely related to the log message parsing of the k6 inspect command executed within the initializer here: https://github.com/grafana/k6-operator/blob/f75facb321d3c8ca55bbd9ba2f1895173d10bbc7/pkg/resources/jobs/initializer.go#L79

When we execute k6 inspect manually inside the container, we get the following error:

/ # k6 inspect --execution-requirements  /test/<REDACTED>.js
ERRO[0009] could not initialize '/test/<REDACTED>.js': could not load JS test 'file:///test/<REDACTED>.js': unknown executor type 'burst' 

The log parsing mentioned above basically does a | grep 'level=error', which does not match the error message we are facing, since its log format seems to be different.
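The mismatch can be reproduced without k6. The following is a minimal sketch, not the operator's actual command: the first line is an abbreviated copy of the error shown above, and the logfmt-style line is a hypothetical example of the format the grep pattern appears to target.

```shell
# Abbreviated copy of the error line from the manual run above.
tty_line="ERRO[0009] could not initialize '/test/script.js': unknown executor type 'burst'"
# Hypothetical logfmt-style line (an assumption, not captured output).
logfmt_line='time="..." level=error msg="unknown executor type burst"'

# grep -c prints the number of matching lines; "|| true" absorbs the
# non-zero exit status grep returns when nothing matches.
tty_matches=$(printf '%s\n' "$tty_line" | grep -c 'level=error' || true)
logfmt_matches=$(printf '%s\n' "$logfmt_line" | grep -c 'level=error' || true)

echo "matches in ERRO[...] line: $tty_matches"
echo "matches in logfmt line:    $logfmt_matches"
```

Only the logfmt-style line matches, so a check built on this pattern would treat the ERRO[...] output as if no error had occurred.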

Might be related to: https://github.com/grafana/k6-docs/issues/877

k6-operator version or image

ghcr.io/grafana/k6-operator:controller-v0.0.14

Helm chart version (if applicable)

No response

TestRun / PrivateLoadZone YAML

Since we built a custom k6 image to include the script to be used as localfile, it would need some additional effort to make this available.

However, IMO this is not really related to a specific TestRun.

Other environment details (if applicable)

k6 version: k6 v0.51.0 (go1.22.4, linux/amd64)

Steps to reproduce the problem

Expected behaviour

Actual behaviour

Logs of k6-operator

```
2024-07-30T09:11:00Z ERROR controllers.TestRun unable to marshal: `` {"namespace": "loadtesting", "name": "", "reconcileID": "", "error": "unexpected end of JSON input"}
github.com/grafana/k6-operator/controllers.inspectTestRun
    /workspace/controllers/common.go:105
github.com/grafana/k6-operator/controllers.RunValidations
    /workspace/controllers/k6_initialize.go:55
github.com/grafana/k6-operator/controllers.(*TestRunReconciler).reconcile
    /workspace/controllers/testrun_controller.go:137
github.com/grafana/k6-operator/controllers.(*TestRunReconciler).Reconcile
    /workspace/controllers/testrun_controller.go:80
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:227
2024-07-30T09:11:00Z ERROR Reconciler error {"controller": "testrun", "controllerGroup": "k6.io", "controllerKind": "TestRun", "TestRun": {"name":"","namespace":"loadtesting"}, "namespace": "loadtesting", "name": "", "reconcileID": "", "error": "unexpected end of JSON input"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:227
```

frittentheke commented 1 month ago

This loosely relates to https://github.com/grafana/k6-operator/pull/401 which is about treating initializer errors as error state of the whole TestRun CR.

yorugac commented 1 month ago

Hi @Phil1602, as mentioned by @frittentheke, this indeed has been raised and fixed: could you please update k6-operator to the latest version and try again? Thanks!

Also, in general, it is recommended to debug k6 scripts locally before deploying the TestRun :slightly_smiling_face:

Phil1602 commented 4 weeks ago

Hi @yorugac,

In the meantime, we run k6 verification in our pipeline as a step before creating the TestRun. Still, IMO it remains an issue if a broken TestRun is not reported as such.

I will try out the latest release v0.0.16 and verify your assumptions! Thanks for the hints!

frittentheke commented 4 weeks ago

> I will try out the latest release v0.0.16 and verify your assumptions! Thanks for the hints!

@yorugac, https://github.com/grafana/k6-operator/pull/401 does indeed treat an error of the initializer Pod as an error of the TestRun CR (https://github.com/grafana/k6-operator/blob/d9490ded7c3e0cf615e2e9d41e82a842fdae7ac8/controllers/common.go#L59).

However, the cause of the issue @Phil1602 reported here lies with the exit code (which is what would lead to the Pod actually failing). If you look at https://github.com/grafana/k6-operator/blob/d9490ded7c3e0cf615e2e9d41e82a842fdae7ac8/pkg/resources/jobs/initializer.go#L79 you'll notice that there are multiple commands chained and piped together. While && stops the chain at the first command that exits with a non-zero code (and returns that code), the second part, which applies the grep, then masks the result of k6 inspect (the most important bit of this command) - https://github.com/grafana/k6-operator/blob/d9490ded7c3e0cf615e2e9d41e82a842fdae7ac8/pkg/resources/jobs/initializer.go#L79C124-L79C167
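The masking effect can be shown in isolation. This is a minimal sketch under stated assumptions: fake_inspect is a hypothetical stand-in for a failing k6 inspect, and the pipeline is a simplified illustration, not the exact command from initializer.go.

```shell
# Hypothetical stand-in for a failing `k6 inspect`:
# prints an error to stderr and exits with status 1.
fake_inspect() {
  echo "ERRO[0009] unknown executor type 'burst'" >&2
  return 1
}

# Run on its own, the failure is visible in the exit status.
direct_status=0
fake_inspect 2>/dev/null || direct_status=$?

# Piped into grep, the pipeline reports the status of its last command
# (grep), so the failure of fake_inspect is hidden.
piped_status=0
fake_inspect 2>&1 | grep -v 'level=error' >/dev/null || piped_status=$?

echo "direct: $direct_status, piped: $piped_status"
```

With default shell options (no pipefail), the direct call reports 1 while the piped variant reports 0, which is why the Pod would not fail even though the inspection did.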

I went through the initializer logic some more and just pushed PR https://github.com/grafana/k6-operator/pull/450. I know this changes a little more than just fixing this issue. But I strongly believe that reducing the interface width (exit code + termination message) allows the initializer to really thrive and be much more flexible than it is now.

As a user, I can then run any image and any (list of) commands, and the only thing I have to ensure is that a non-zero exit code is returned if there is an issue with the test.
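To illustrate that contract, here is a rough sketch of a wrapper that honors it. It is not taken from the PR: run_check is a hypothetical helper, and the message file is passed as a parameter (in a Pod this would typically be the Kubernetes default termination message path, /dev/termination-log).

```shell
# Hypothetical helper: runs a command, and on failure writes its combined
# output to the given message file and returns the command's exit code.
# $1: message file path; remaining arguments: the command to run.
run_check() {
  msg_file=$1
  shift
  if output=$("$@" 2>&1); then
    status=0
  else
    status=$?
  fi
  if [ "$status" -ne 0 ]; then
    printf '%s\n' "$output" > "$msg_file"
  fi
  return "$status"
}

# Usage sketch with a stand-in command that fails with exit code 3.
msg=$(mktemp)
rc=0
run_check "$msg" sh -c 'echo "script is invalid" >&2; exit 3' || rc=$?
echo "exit code: $rc, message: $(cat "$msg")"
```

Because the command's exit code is passed through unchanged and no output filtering is involved, the check does not depend on any particular log format.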

frittentheke commented 3 weeks ago

I pushed a bugfix PR in https://github.com/grafana/k6-operator/issues/453, just fixing the issue reported by @Phil1602

^^ @yorugac