Phil1602 commented 1 month ago

Brief summary

We noticed that our TestRuns get stuck, without any information or logs being printed, if the script itself is incorrect.

Cause

We took a deeper look and found that this is likely related to the log message parsing of the k6 inspect call executed within the initializer here: https://github.com/grafana/k6-operator/blob/f75facb321d3c8ca55bbd9ba2f1895173d10bbc7/pkg/resources/jobs/initializer.go#L79

When we execute k6 inspect manually inside the container, we get the following error:

/ # k6 inspect --execution-requirements  /test/<REDACTED>.js
ERRO[0009] could not initialize '/test/<REDACTED>.js': could not load JS test 'file:///test/<REDACTED>.js': unknown executor type 'burst'

The log parsing mentioned above basically does a | grep 'level=error', which does not work for the error message we are facing, since the log format seems to be different.
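To make the mismatch concrete, here is a minimal sketch. The first line is the error quoted above; the second is a made-up key=value style line (an assumption for illustration, not output captured from k6). The helper name matches_level_error is hypothetical as well.

```shell
# The error line as printed by `k6 inspect` in the report above.
tty_line="ERRO[0009] could not initialize '/test/<REDACTED>.js': could not load JS test 'file:///test/<REDACTED>.js': unknown executor type 'burst'"

# Hypothetical key=value formatted line, only here for comparison.
kv_line='time="2024-07-30T09:11:00Z" level=error msg="could not initialize"'

# Succeeds only if the given text contains the literal string level=error.
matches_level_error() {
  printf '%s\n' "$1" | grep -q 'level=error'
}

if matches_level_error "$tty_line"; then echo "tty: matched"; else echo "tty: not matched"; fi
# prints: tty: not matched

if matches_level_error "$kv_line"; then echo "kv: matched"; else echo "kv: not matched"; fi
# prints: kv: matched
```

A filter keyed on level=error therefore lets the ERRO[...] line pass unnoticed, which fits the silent Initializer Pod described in this report.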

Might be related to: https://github.com/grafana/k6-docs/issues/877

k6-operator version or image

ghcr.io/grafana/k6-operator:controller-v0.0.14

Helm chart version (if applicable)

No response

TestRun / PrivateLoadZone YAML

Since we built a custom k6 image to include the script to be used as localfile, it would need some additional effort to make this available.

However, IMO this is not really related to a specific TestRun.

Other environment details (if applicable)

k6 version: k6 v0.51.0 (go1.22.4, linux/amd64)

Steps to reproduce the problem

Deploy any TestRun resource with a broken JS script

Expected behaviour

Initializer Pod fails, or at least logs the issues encountered during k6 inspect
TestRun resource is in Failed state

Actual behaviour

Initializer Pod runs but does not print out anything
The k6-operator-controller-manager logs errors (see below) about unparsable JSON without any further information
TestRun resource is stuck in the initialization phase

Logs of k6-operator

```
2024-07-30T09:11:00Z ERROR controllers.TestRun unable to marshal: `` {"namespace": "loadtesting", "name": "", "reconcileID": "", "error": "unexpected end of JSON input"}
github.com/grafana/k6-operator/controllers.inspectTestRun
    /workspace/controllers/common.go:105
github.com/grafana/k6-operator/controllers.RunValidations
    /workspace/controllers/k6_initialize.go:55
github.com/grafana/k6-operator/controllers.(*TestRunReconciler).reconcile
    /workspace/controllers/testrun_controller.go:137
github.com/grafana/k6-operator/controllers.(*TestRunReconciler).Reconcile
    /workspace/controllers/testrun_controller.go:80
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:227
2024-07-30T09:11:00Z ERROR Reconciler error {"controller": "testrun", "controllerGroup": "k6.io", "controllerKind": "TestRun", "TestRun": {"name":"","namespace":"loadtesting"}, "namespace": "loadtesting", "name": "", "reconcileID": "", "error": "unexpected end of JSON input"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.2/pkg/internal/controller/controller.go:227
```

frittentheke commented 1 month ago

This loosely relates to https://github.com/grafana/k6-operator/pull/401 which is about treating initializer errors as error state of the whole TestRun CR.

yorugac commented 1 month ago

Hi @Phil1602, as mentioned by @frittentheke, this indeed has been raised and fixed: could you please update k6-operator to the latest version and try again? Thanks!

Also, in general, it is recommended to debug k6 scripts locally before deploying the TestRun :slightly_smiling_face:

Phil1602 commented 4 weeks ago

Hi @yorugac,

In the meantime, we run k6 verification in our pipeline as a step before creating the TestRun. Still, IMO it is an issue if a broken TestRun is not reported as such.

I will try out the latest release v0.0.16 and verify your assumptions! Thanks for the hints!

frittentheke commented 4 weeks ago

> I will try out the latest release v0.0.16 and verify your assumptions! Thanks for the hints!

@yorugac, https://github.com/grafana/k6-operator/pull/401 does indeed treat an error of the Initializer Pod as an error of the TestRun CR (https://github.com/grafana/k6-operator/blob/d9490ded7c3e0cf615e2e9d41e82a842fdae7ac8/controllers/common.go#L59).

The cause of the issue @Phil1602 reported here lies with the exit code (which is what leads to the Pod actually failing), though. If you look at https://github.com/grafana/k6-operator/blob/d9490ded7c3e0cf615e2e9d41e82a842fdae7ac8/pkg/resources/jobs/initializer.go#L79 you'll notice that multiple commands are chained and piped together there. While && stops the chain at the first command with a non-zero exit code (and returns that code), the second part, which applies the grep, then masks the exit code of k6 inspect (the most important bit of this command): https://github.com/grafana/k6-operator/blob/d9490ded7c3e0cf615e2e9d41e82a842fdae7ac8/pkg/resources/jobs/initializer.go#L79C124-L79C167
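The masking effect can be reproduced without k6. The sketch below is not the command from initializer.go: fake_inspect is a hypothetical stand-in for a failing k6 inspect, and the grep -v filter is purely illustrative. It relies only on the standard shell rule that a pipeline's exit status is that of its last command.

```shell
# Hypothetical stand-in for a failing `k6 inspect`: logs an error, exits 1.
fake_inspect() {
  echo "ERRO[0009] unknown executor type 'burst'" >&2
  return 1
}

# Piped: the status reported is the one from grep (the last command in the
# pipeline), so the failure of fake_inspect is lost.
masked_status() {
  fake_inspect 2>&1 | grep -v 'level=error' >/dev/null
  echo $?
}

# Unmasked: capture the output first, remember the exit code, filter after.
preserved_status() {
  status=0
  out=$(fake_inspect 2>&1) || status=$?
  printf '%s\n' "$out" | grep -v 'level=error' >/dev/null
  echo "$status"
}

masked_status      # prints: 0
preserved_status   # prints: 1
```

Shells that support set -o pipefail offer another way to surface the failure, but capturing the exit code explicitly works in any POSIX shell.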

I went through the initializer logic some more and just pushed PR https://github.com/grafana/k6-operator/pull/450. I know this changes a little more than just fixing this issue. But I strongly believe that reducing the interface width (exit code + termination message) allows the Initializer to really thrive and be much more flexible than it is now.

I as a user can then run any image and any (list of) commands, and the only thing I have to ensure is that a non-zero exit code is used if there is an issue with the test.
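As a rough sketch of what such a narrow contract could look like from the user's side (this is not the code from the PR): a wrapper runs an arbitrary check command, passes its exit code through, and on failure writes the command's stderr to the container's termination message file. The names run_check and TERMINATION_LOG are hypothetical; /dev/termination-log is the Kubernetes default termination message path.

```shell
# Hypothetical wrapper: run any check command; on failure, store its stderr
# as the termination message and return the command's own exit code.
run_check() {
  # TERMINATION_LOG is an assumed override for testing outside a Pod.
  msg_file=${TERMINATION_LOG:-/dev/termination-log}
  status=0
  # Capture stderr only; stdout of the check is discarded here.
  err=$("$@" 2>&1 >/dev/null) || status=$?
  if [ "$status" -ne 0 ]; then
    printf '%s\n' "$err" > "$msg_file"
  fi
  return "$status"
}

# Example use inside an initializer container (script path is illustrative):
#   run_check k6 inspect --execution-requirements /test/script.js
```

With this shape, the operator only has to look at the exit code and the termination message, regardless of which image or command produced them.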

frittentheke commented 3 weeks ago

I pushed a bugfix PR in https://github.com/grafana/k6-operator/issues/453, just fixing the issue reported by @Phil1602

^^ @yorugac

grafana / k6-operator

Issues within initalizer error handling if script is incorrect #435