grafana / k6-operator

An operator for running distributed k6 tests.
Apache License 2.0
589 stars 166 forks source link

Make reading logs by initializer a more reliable operation #193

Open yorugac opened 1 year ago

yorugac commented 1 year ago

Envoy is not usable with cloud output when logging is enabled. See https://github.com/grafana/k6-operator/issues/179. It’d be good to figure out some alternative in order not to depend on logs in the initializer so much.

Context. Previously, adding k6 dependency to the operator to calculate MaxVUs and Duration was deemed as too heavy to add directly - that's how initializer job appeared. Check if that changed in recent refactoring of k6.

yorugac commented 1 year ago

Additionally, k6 Operator is not usable in some specific cases of k6 scripts. It was reported in #207 and see explanation here.

k6 docs about k6 archive, k6 inspect and k6 cloud behaving differently than k6 run around env vars: https://k6.io/docs/using-k6/environment-variables/

It's a bit of mess of inconsistent behaviour. It'd be good to have a generalized solution to this, in order not to stumble over k6 exceptions.

yorugac commented 10 months ago

The same problem can manifest even without cloud output but in any scenario when there is another process writing to the output of the initializer Pod. Which sounds a bit strange at first glance but it does happen, case in study: https://community.grafana.com/t/k6-operator-error-not-spawning-starter-and-basic-resources/110778/19

Renaming the issue to reflect new reality.

yorugac commented 3 months ago

After some additional thinking and feedback :slightly_smiling_face:

Initializer pod is currently used for two reasons:

  1. cloud output tests require duration and max VUs values in order to be started: those are retrieved with k6 inspect command.
  2. prior validation of a script. k6 Operator test runs can have lots of runners and before scheduling them, it makes sense to at least confirm that the script is valid.

If there is a way to implement this without running a pod but by calling k6 Go library, we could remove initializer pod completely! That would simplify running tests for users (because of initializer-specific errors mentioned above), simplify TestRun CRD definition and simplify logic of TestRun controller. Now to figure out if it can be done with k6 Go code...

yorugac commented 2 months ago

Another important reason to solve this: regular issues with configuring env vars for k6 scripts. The error appears because of k6 quirks and inconsistencies between different commands, and people regularly stumble on this. Some sample issues:

yorugac commented 2 months ago

A few times, we've received reports on reading logs not being possible, for example:

yorugac commented 3 weeks ago

Another case when the problem can manifest is if one writes an extension and logs something (even fmt.Println) in the init function of the extension.

frittentheke commented 2 weeks ago

@yorugac would you consider a bigger rework of how the initializer is integrated? See my comment at: https://github.com/grafana/k6-operator/pull/453#issuecomment-2431121129

yorugac commented 1 week ago

Hi @frittentheke, I've finally grokked both your PRs and commented there accordingly 🙌

would you consider a bigger rework of how the initializer is integrated?

Well, this issue, #193, does exist, so that can be taken as "certainly" 😄

I actually hope to find the solution that would allow to remove initializer pod completely. But that kind of solution obviously takes awhile. So in general, we definitely could do with additional improvements of the current flow. On a condition that they don't break existing use cases, of course.