Open pablochacin opened 1 year ago
specialized executor that allows only one iteration in one VU and have no default
maxDuration
Not having a maxDuration
is not an option if we want it to run in the cloud. We need scenarios to have a maximum duration because otherwise we can't bill people or know how to allocate resources to run the tests. It is also probably a good thing to have for the upcoming native distributed execution.
When you know how long the disruption will be, why is having a maxDuration
of the scenario a problem? And how will this chaos executor be different from a one-VU one-iteration executor?
Or, to put it another way, why isn't this problem solvable with JS code and needs changes in k6 fundamentals? For example, a small JS helper can generate both the scenarios
config (for use in options) and return the exec
callback, right?
When you know how long the disruption will be, why is having a maxDuration of the scenario a problem?
Maybe I did not explain myself correctly. The problem is not having maxDuration
, but the confusion in having the duration defined both in the scenario and passed to method that injects the fault and what happens if those two don't match. The real objective is not have this redundancy.
Or, to put it another way, why isn't this problem solvable with JS code and needs changes in k6 fundamentals? For example, a small JS helper can generate both the scenarios config (for use in options) and return the exec callback, right?
Given that every scenario that runs a chaos will have this same issues described above, delegating it to a helper function that must be imported from a js library along with the xk6-disruptor doesn't seem the best developer experience, in my opinion. Not all what is possible is convenient.
However, if I could inject this scenario from the extension itself, that could be a good option, but I'm not sure if that is possible.
That is, if I could call the inject disruption method in the init code and this method creates the necessary configuration of scenarios that would be a much better option. One issue I see is how to pass the function to be executed by the scenario, for instance.setup
function
And how will this chaos executor be different from a one-VU one-iteration executor?
This executor will basically be a restricted version of the shared iterations executor that does not allow unsafe parameter combinations by removing those parameters. I don't see it as a "change in k6 fundamentals!.
I updated the proposal to clarify that the requirement is not to remove the maxDuration
attribute but to make the duration
attribute explicit.
I updated the proposal for providing more context on the use case that raised this requirement.
I understand that the primary goal the proposal for adding the chaos
executor is to provide better defaults for the 1 VU, 1 iteration scenario that must today be specified using the shared-iterations
executor.
I think the quote "Not all what is possible is convenient." captures the intention well.
A question about the assumptions:
I'm not a fan of helper functions for creating scenarios because they feel a little hacky, but to illustrate it, this is how it could be implemented today without making core changes.
import { chaosScenario } from 'https://jslib.k6.io/chaosUtils/1.0.0/index.js';
export function disrupt() {
// delay traffic from one random replica of the deployment
const fault = {
average_delay: 50,
error_code: 500,
error_rate: 0.1
}
podDisruptor.injectHttpFaults(fault, 60) // second argument is the duration of the disruption in seconds
}
export const options = {
// This scenario executes the tests
scenarios: {
load: {
executor: 'constant-arrival-rate',
rate: 100,
preAllocatedVUs: 10,
maxVUs: 100,
exec: "default",
startTime: '0s',
duration: "60s",
},
disrupt: chaosScenario({exec: "disrupt"}) // all other params are default but can be overriden
}
}
The downside of implementing a chaos
executor is that it's use-case specific. All other executors are generic and not tied to any single intended use case. Perhaps this is the right approach if we are building a generic reliability platform.
I imagine that if we add a chaos
executor into the core, users will start using it for unintended purposes making the code confusing. For example, when they want to run the code only once, they use the "chaos" executor. I would be more in favor of naming it only-once
rather than chaos
.
Another approach is implementing the chaos
executor as part of the xk6-disruptor codebase. I believe we have said in the past that creating new executors from extensions is possible, but it needs to be confirmed.
The downside of implementing a chaos executor is that it's use-case specific...I would be more in favor of naming it only-once rather than chaos.
Totally agree. I though of using that name, but to be honest, the only use case I'm aware now that would require this only-once execution is chaos.
Are we sure the fault injection will always be as simple as in the example where we have 1 iteration? Won't we have some types of disruptions where more faults are injected over time? For example, 1 fault every 30 seconds for 5 minutes
This is possible, indeed. I can think about some failures like killing pods that may follow this pattern. But I think this can be easily implemented using a loop and a sleep instead of using the fixed rate executor, for example.
What worries me more is the possibility of running more than one fault injection concurrently from multiple VUs.
Maybe, the goal should be to find a mechanism that prevents multiple concurrent executions of the disruptor to happen, but I don't see any, considering a distributed execution environment except using some form of Kubernetes resource or using annotations in the targets, but both are complex to implement reliably.
I'm not a fan of helper functions for creating scenarios because they feel a little hacky, but to illustrate it, this is how it could be implemented today without making core changes.
I was also of this impression, but seeing your proposal, it looks interesting. I'm wondering if this function could be exported from the extension, instead of a JS library. Is there any technical restriction on this regard, like not been able to call functions from extensions from the init
code?
I'm wondering if this function could be exported from the extension, instead of a JS library. Is there any technical restriction on this regard, like not been able to call functions from extensions from the
init
code?
Yes, of course, xk6 JS extensions (and built-in modules) primarily export JS functions that you can import
and use in regular JS :sweat_smile: I am unfamiliar with the xk6-disruptor codebase, but here is how the built-in sleep()
function is exported: https://github.com/grafana/k6/blob/8a74171c8b43f48fb53078e0bcfa7851828124bc/js/modules/k6/k6.go#L52-L63
https://github.com/grafana/k6/blob/8a74171c8b43f48fb53078e0bcfa7851828124bc/js/modules/k6/k6.go#L71
You can do something like that to export a chaosScenario()
function from one of the modules of any extension. For completeness' sake, here is how you should be able to make an executor extension: https://github.com/grafana/k6/blob/8a74171c8b43f48fb53078e0bcfa7851828124bc/lib/executor/shared_iterations.go#L19-L30
It's the same API that is used to register built-in executor types :sweat_smile: To be clear, we make no promises that this API won't change. In fact, it's quite likely to change (https://github.com/grafana/k6/issues/1302), but even after we fix the module structure, there would be no obstacles to providing an official API for executor extensions (https://github.com/grafana/k6/issues/2254). All of that said, I still think a custom executor type would be an overkill here, and the problem should be better solved some other way.
For example, if the current way to specify scenarios and executor options seems to error-prone and unwieldy, we should probably design a better way to specify them instead of having a bunch of semi-duplicate executor types? The goal of the different executors is to express different ways of scheduling work (iterations) among a set of workers (VUs). Where there is some precedent for overlapping executors (e.g. constant-vus
is a special case of ramping-vus
without any ramping, same for constant-arrival-rate
and ramping-arrival-rate
...), having a special executor for just 1-VU, 1-iteration scenarios seems a bit too specific.
Besides, this (and other similar things) seem like a completely plausible use cases that someone might want to test for:
Are we sure the fault injection will always be as simple as in the example where we have 1 iteration? Won't we have some types of disruptions where more faults are injected over time? For example, 1 fault every 30 seconds for 5 minutes.
But I think this can be easily implemented using a loop and a sleep instead of using the fixed rate executor, for example.
Implementing it with a for-loop would potentially introduce inaccuracies and delays from the time the operation takes to complete, unless you carefully account for that. And accounting for it would be certainly more complicated than configuring an executor even with the current scenarios
config.
What worries me more is the possibility of running more than one fault injection concurrently from multiple VUs.
Maybe, the goal should be to find a mechanism that prevents multiple concurrent executions of the disruptor to happen, but I don't see any, considering a distributed execution environment except using some form of Kubernetes resource or using annotations in the targets, but both are complex to implement reliably.
I don't understand the issues here, sorry. If you have a scenario with a single VU and a single iteration (or 1 iteration every 30s or whatever), with execution segments k6 will never schedule more than 1 VU, regardless of how many instances you have.
Hi @na-- Thanks your your extensive explanation. Really useful. Just a few comments/clarifications
Yes, of course, xk6 JS extensions (and built-in modules) primarily export JS functions that you can import and use in regular JS sweat_smile I am unfamiliar with the xk6-disruptor codebase, but here is how the built-in sleep() function is exported:
My question was more in the line of any restriction about using the functions exported by an extension in the init
code. From your explanation, it seems to me that there is none, so I will explore this way.
I don't understand the issues here, sorry. If you have a scenario with a single VU and a single iteration (or 1 iteration every 30s or whatever), with execution segments k6 will never schedule more than 1 VU, regardless of how many instances you have.
That is right. If there is only one VU multiple iterations should be ok. My comment was regarding to the risk of opening the use of any scenario configuration because there is this strict restriction of only one VU at a time.
having a special executor for just 1-VU, 1-iteration scenarios seems a bit too specific.
There was also another requirement regarding the duration
of each iteration. The Shared iterations executor does not allow this parameter. But I think this requirement can be also managed by generating the scenario using the helper function.
if the current way to specify scenarios and executor options seems to error-prone and unwieldy, we should probably design a better way to specify them instead of having a bunch of semi-duplicate executor types?
I don't think there's an issue with the executors from the perspective of generating load. As I mentioned before, I think the "problem" I was facing comes from the fact that faults are discrete events that occur at specific times during the execution of a test (generally, only once) and this seemed to me different from generating a workload.
But after this discussion I think the solution is more in the side of providing a different UX in the disruptor than changing the way k6 schedules work.
From your explanation, it seems to me that there is none, so I will explore this way.
Yes, there is no restriction. We have to impose that restriction artificially, when we want some function to be used only in the init context. For example: https://github.com/grafana/k6/blob/8a74171c8b43f48fb53078e0bcfa7851828124bc/js/modules/k6/data/data.go#L66-L71
Or only in the VU context: https://github.com/grafana/k6/blob/8a74171c8b43f48fb53078e0bcfa7851828124bc/js/modules/k6/k6.go#L142-L146
As you can see, the current way to ensure either is to check whether the VU State
is nil or not.
Edit: More context added to the proposal
The xk6-disruptor extension provides functions that allow the injection of faults during the execution of tests. Injecting faults in running tests follows a pattern that is different from the common use case for k6, which consists in generating multiple requests to simulate a workload from one or more users. Injecting faults, on the contrary, require the execution of single events. Each of these events (a fault injection) should have a defined duration.
This is accomplished by using two scenarios: one for running the test and another for injecting the faults as in the example below. The scenario that injects the fault must invoke the fault injection logic only once, therefore a shared iterations executor is used specifying one UV and one iteration.
However, this approach has some issues:
duration
attribute. Therefore the duration is defined in the invocation to the function that injects the faults (injectHttpFaults
in the example below). To make things more confusing, if this duration is longer than10m
, themaxDuration
attribute of the scenario must be specified or it will be cancelled while the load scenario (the one injecting load) continues its execution, but without the fault injection, creating misleading results.To overcome this limitations, instead of reusing existing executor for this particular use case, it would be convenient to have a specialized executor that allows only one iteration in one VU and have
no defaultamaxDuration
,duration
attribute so it must be explicitly definedif required (most of the time, it should not be required).In this way, following the Poka-yoke philosophy introducing errors will be hard by design.Following this ideas, the example above would be implemented as shown in the code below (some code omitted for brevity):
Additionally, a
duration
attribute could be defined in the scenario and made accessible by means of the k6-execution api so that it could used as a default value for the duration of the disruption function. In this way the duration would be more explicit in the definition of the scenario.