Open GrayedFox opened 1 year ago
__ENV.
k6/execution solves this concrete case.
give users the ability to opt in to having __ENV actually refer to a host machine's environment variables -- perhaps adding that doing so voids your warranty and that this mode is ignored for cloud test runs.
Having features that suddenly no longer work in the cloud, or in any kind of distributed fashion, seems very bad and not in line with a tool for load testing, and with k6 specifically.
While we might not have worked very directly on distributed execution, it is very much being worked on. There are just a bunch of different pieces that need to be done in order for that to be viable. And in practice a lot of those have been fixed or are being fixed. Still, it is some time away, but ...
Adding this will make distributed execution less viable and consistent.
host machines' environment variables
If you are talking about changing the env variables in the outside process running k6 - I am not certain we can do that, and I am definitely against it.
So I would presume you mean what k6 thinks are the env variables.
At this point you are asking why __ENV is not ... special, I guess. Currently it is populated with the env variables for each VU, and after that it is just another global variable. As the different VUs run on different JS VMs, they also happen to have different global variables.
This is a bit more complicated still, as __ENV is populated not only with the env variables and what you have provided via --env, but also with what is defined per scenario. So even if we make them magical, the question becomes: does overriding a scenario-defined env variable apply to all VUs or only to the ones in that scenario? Should scenario VUs be separated on __ENV to begin with? We can probably get some answers to those, but I would argue any answer will be ... troublesome and likely a thing someone else will argue should've been done the other way around.
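To make the "separate global variables" point concrete, here is a toy model in plain JavaScript. It is not k6 source code: buildVuEnv, the variable names and the merge order are assumptions made up for illustration. It only shows what happens when every VU is handed its own copy of the env map.

```javascript
// Toy model (NOT k6 internals): each VU receives its own copy of the env map.
// buildVuEnv and the precedence order below are assumptions for this sketch.
function buildVuEnv(hostEnv, cliEnv, scenarioEnv) {
  // Object spread creates a fresh object, so no two VUs share the same map.
  return { ...hostEnv, ...cliEnv, ...scenarioEnv };
}

const hostEnv = { COUNTER: "0" };     // stands in for the OS environment
const cliEnv = { TARGET: "staging" }; // stands in for values passed with --env

// Two VUs that belong to the same hypothetical scenario.
const vu1 = { __ENV: buildVuEnv(hostEnv, cliEnv, { ROLE: "reader" }) };
const vu2 = { __ENV: buildVuEnv(hostEnv, cliEnv, { ROLE: "reader" }) };

// VU 1 writes to "its" __ENV ...
vu1.__ENV.COUNTER = String(Number(vu1.__ENV.COUNTER) + 1);

console.log(vu1.__ENV.COUNTER); // "1"
console.log(vu2.__ENV.COUNTER); // "0": VU 2 never sees the write
console.log(hostEnv.COUNTER);   // "0": neither does the host environment
```

Making the write visible to VU 2 would mean replacing the per-VU copy with some shared, synchronized store, which is where the questions about scenarios come in.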
Background
I am no expert but AFAIK:
node basically does inter-process communication between node instances - which arguably is what distributed execution will have to do to some extent. And that will be a bit more involved than this.
Browsers have some ways to communicate between different pages (which is usually where the different JS VMs map to) but even there it is way more complicated than what you propose.
In V8 there's a myriad of ways to pass data
Are those ways between two concurrently running V8 VMs?
K6 has neither of those things and given the nature of the product (multiple, concurrent processes representing virtual users or sessions) it would benefit greatly, imho, from a way for different VUs to send and receive messages down the line - which should avoid many of the pitfalls and complexity involved in managing state across VUs and having too much mutability across different parts of the API by putting that problem squarely in the hands of the user (fine by me!).
I am not disagreeing with you here, but I am very much against shoehorning this on top of an already established API that does not have these semantics and in practice never had them.
And arguably none of the examples you have given do anything as strange as letting you write to a variable as if it were just a normal variable, and then seeing that change in what is in practice a different process, as if the second process had changed it.
k6/execution can be used in the concrete case.
Real Life Use Case
Looking at the whole example it seems like a somewhat more complicated case for k6/execution.
I am not certain I got the whole code as .. well you skipped the __ENV part entirely and then you also do not show the actual test 🤷♂.
But if each iteration does 1 request you can do: (you can also do it for any constant amount of requests)
// edit: the below counter was wrong in an earlier version.
let globalIndex = exec.scenario.iterationInTest; // import exec from "k6/execution" is required
let product;
for (const vendor of vendors) { // `of` iterates the vendor objects; `in` would iterate the keys
  if (globalIndex >= vendor.products.length * 2) { // the 2 comes from you doing 2 requests with each product
    globalIndex -= vendor.products.length * 2; // the index is past this vendor, skip all of its slots
    continue;
  }
  product = vendor.products[Math.floor(globalIndex / 2)]; // same reason for the 2; floor gives a whole-number index
  break; // found the product, stop looking
}
if (product === undefined) {
  // we ran out of products
  // you can loop again to start from the beginning
}
// rest of your code;
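The index arithmetic above can be checked outside k6 by pulling it into a plain function. This is only a sketch: pickProduct, the vendors shape and the generated product names are hypothetical, chosen to match the "10 vendors, 5 products each, 2 requests per product" setup described in the issue.

```javascript
// Maps a test-wide iteration index to a product, using each product
// `requestsPerProduct` times before moving on to the next one.
// Returns undefined once the index is past the last product.
function pickProduct(vendors, iterationIndex, requestsPerProduct) {
  let remaining = iterationIndex;
  for (const vendor of vendors) {
    const slots = vendor.products.length * requestsPerProduct;
    if (remaining >= slots) {
      remaining -= slots; // the index lies beyond this vendor, skip its slots
      continue;
    }
    return vendor.products[Math.floor(remaining / requestsPerProduct)];
  }
  return undefined; // ran out of products
}

// Hypothetical data: 10 vendors with 5 products each.
const vendors = Array.from({ length: 10 }, (_, v) => ({
  name: `vendor-${v}`,
  products: Array.from({ length: 5 }, (_, p) => `v${v}-p${p}`),
}));

console.log(pickProduct(vendors, 0, 2));   // v0-p0
console.log(pickProduct(vendors, 1, 2));   // v0-p0 (second request, same product)
console.log(pickProduct(vendors, 2, 2));   // v0-p1
console.log(pickProduct(vendors, 10, 2));  // v1-p0 (first product of the next vendor)
console.log(pickProduct(vendors, 99, 2));  // v9-p4 (last slot)
console.log(pickProduct(vendors, 100, 2)); // undefined (past the end)
```

In a k6 script the second argument would be exec.scenario.iterationInTest, so 100 shared iterations cover every product exactly twice no matter which VU picks up which iteration.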
__ENV.somevar++; is NOT an atomic operation, which means that you have the classical race condition unless we also introduce:
++
In practice, we could add such an API ... but I doubt we will. And I am pretty sure it won't be put on top of __ENV.
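For anyone not familiar with the race being described, here is a deterministic replay of a lost update in plain JavaScript. The shared object is a hypothetical stand-in for a store that two VUs could both see; nothing like it exists on top of __ENV today.

```javascript
// `x++` is really three steps: read, add one, write back. If two writers
// interleave between the read and the write, one increment is lost.
const shared = { somevar: 0 }; // hypothetical store visible to two VUs

// Both VUs read the current value first ...
const seenByVu1 = shared.somevar; // 0
const seenByVu2 = shared.somevar; // 0

// ... and then each writes back its own "incremented" value.
shared.somevar = seenByVu1 + 1;
shared.somevar = seenByVu2 + 1;

console.log(shared.somevar); // 1, although two increments were attempted
```

Avoiding this needs either a lock around the read-modify-write or a dedicated atomic increment, which is exactly the extra API surface in question.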
You can always:
1. k6/execution as I show above and transform the issue into going through arrays.
2. k6/redis - which is involved, but likely not as much.
3. iterationInScenario but you can increment it on your own instead of doing it each iteration.
Actually the fact that there is not an extension for this already seems to mean to me it is even less common than what I originally thought - as users have created extensions for all kinds of stuff.
I guess 4. For this particular case I would argue you can just go with using http.asyncRequest and do everything asynchronously. You can then have 1 VU that does multiple requests asynchronously.
The downside is that in order for that to make requests in parallel you will need to make multiple requests in the same iteration - preferably going through the whole Array in one k6 iteration.
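A minimal sketch of that single-VU approach follows. The request function is injected so the snippet also runs outside k6; in a real script it would be http.asyncRequest from "k6/http". requestAllProducts, fakeAsyncRequest and the URLs are hypothetical names used only for illustration.

```javascript
// Starts one request per product and waits for all of them together.
// `asyncRequest` is expected to behave like http.asyncRequest(method, url).
async function requestAllProducts(products, asyncRequest) {
  // map() kicks off every request right away; Promise.all then waits for
  // the whole batch, so the requests overlap instead of running in sequence.
  const pending = products.map((p) => asyncRequest("GET", p.url));
  return Promise.all(pending);
}

// Stand-in for http.asyncRequest: records the call and resolves immediately.
const started = [];
function fakeAsyncRequest(method, url) {
  started.push(`${method} ${url}`);
  return Promise.resolve({ status: 200, url });
}

const products = [
  { url: "https://example.com/products/1" }, // hypothetical URLs
  { url: "https://example.com/products/2" },
  { url: "https://example.com/products/3" },
];

const allDone = requestAllProducts(products, fakeAsyncRequest);
console.log(started.length); // 3: every request was started before any was awaited
allDone.then((responses) => console.log(responses.map((r) => r.status)));
```

Inside k6 this would sit in an async default function, with the whole array handled in one iteration, as described above.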
I am going to leave this open so we can have some discussion, and I expect that there will be other proposed solutions, so we can have them here.
Thank you for the very detailed and considered response, I appreciate the time and effort you've put in here 🙏🏾
Skipping ahead a bit:
k6/execution can be used in the concrete case.
Whelp. I'm taking another look at the docs now - if I understand correctly, the vu.iterationInScenario ("The identifier of the iteration in the current scenario") -- does this number represent the overall iteration count across all VUs? Or the total iterations of a single VU? Some quick testing on my end will of course answer that question, I just want to surface it here for future eyeballs.
If so, I just honed in on the wrong identifier here, and this is all that I need and achieves what I want without any of the global ENV shenanigans 🙈
Follow up q: do vu.iterationInInstance and scenario.iterationInInstance refer to the same variable? They have slightly different descriptions in the docs is all.
Extension(s) is a better solution for anyone not caring about distributed execution.
It certainly looks like extensions are a better bet for users that aren't concerned with distributing tests, will look into that further. We were also thinking about extending the JS API to add some utility methods anyhow and building the binary on our end is something we're happy to do, also thanks for the link to your counter extension ⚡
If you are talking about changing the env variables in the outside process running k6 - I am not certain we can do that, and I am definitely against it... So I would presume you mean what k6 thinks are the env variables.
To be completely honest, I was referring to the host machine's (the outside processes') env variables - but I'm less concerned with the actual implementation and more interested in thinking about a low hanging (ish) scenario that would allow for conditional logic in different iterations across different VUs that doesn't necessarily tie us to the per-vu-executor.
...I am very much against shoe horning this on top of an already established API that does not have this semantics and in practice never had them.
Fair. If doing this via the existing __ENV API is a gross misapplication of what that API is intended for, it shouldn't be done via that API.
You're spot on about the race condition there. I'm only just learning Go and admittedly haven't given the K6 code base a proper look just yet - the interactions between the JS side and the Go side of things (i.e. goja and the go-to-js bridge) are high up on the to-read list. Good to know incrementing counters this way doesn't represent an atomic operation on either side and therefore isn't thread safe.
But if each iteration does 1 request you can do: (you can also do it for any constant amount of requests)
Each iteration does only one we care about. The relevant part of the test script is that it calls getNextProduct() as one of the first actions to get the product data. I wanted to demo doing something on the JS side that attempts to do sequential reads of an immutable SharedArray across VUs - but your way is much more streamlined, especially if scenario.iterationInScenario functions the way I hope it does - then we get the benefits of using the shared executor too.
Maybe the solution here is, in the end, also just about naming things (slash updating the docs):
iterationInInstance references in each table
It might be obvious to some, I just normally think of ENV as a data struct that represents information shared by everything (all procs, VUs, VMs, etc) tied to a single "application", even if that application is distributed across a network or involves lots of sub-processes -- which, I guess, is quite a programmatic feat and a bit magic depending on the setup.
Sorry for the slow reply - I was on PTO until today :)
Whelp. I'm taking another look at the docs now - if I understand correctly, the vu.iterationInScenario ("The identifier of the iteration in the current scenario") -- does this number represent the overall iteration count across all VUs? Or the total iterations of a single VU? Some quick testing on my end will of course answer that question, I just want to surface it here for future eyeballs.
Sorry about that, I got the wrong counter :facepalm: :bow:
What I meant was scenario.iterationInTest, which is unique for the whole test (even in cloud or with k6-operator).
doesn't necessarily tie us to the per-vu-executor.
I don't understand what you mean in that whole paragraph.
I'm only just learning Go and admittedly haven't given the
This really has nothing to do with Go, and barely anything to do with JS and the particular implementation of goja. In theory goja could've implemented ++ as some kind of atomic operator - but there is literally no reason to do it that way. And I am pretty sure it is likely against the specification, but am not going to go figure out the exact "why" now.
The point is that ECMAScript/JavaScript is single-threaded by specification more or less. So any kind of "atomic" stuff is kind of bolted on top. And this is unlikely to change and k6 can't just decide to do a bunch of stuff differently to somehow change that. Or at least we could - but we will break a lot of js code.
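As an illustration of what "bolted on top" looks like, standard ECMAScript does have an Atomics object, but it only works on integer typed arrays (typically backed by a SharedArrayBuffer) and not on ordinary variables or object properties. The snippet below is plain ECMAScript; whether k6's runtime exposes these objects is not something stated in this thread, and even if it does, that alone would not share memory between VUs.

```javascript
// A single shared 32-bit integer. SharedArrayBuffer is the kind of memory
// that can be handed to other threads in runtimes that have them.
const counter = new Int32Array(new SharedArrayBuffer(4));

// Atomics.add performs the read, the addition and the write as one
// indivisible step, and returns the value stored BEFORE the addition.
const before = Atomics.add(counter, 0, 1);

console.log(before);                   // 0
console.log(Atomics.load(counter, 0)); // 1
```

Note how different this is from `somevar++`: the atomic increment is an explicit function call on a special kind of memory, not a change in how the ++ operator behaves.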
immutable SharedArray across VUs - but your way is much more streamlined, especially if the
With big enough (1k+ items) array type data you are still advised to use SharedArray as it will save you memory - sometimes drastically. It has nothing to do with how you are going to iterate it. If it isn't an array - it can't be put in SharedArray. If it is an array - you are going to access it as an array whether it is a SharedArray or not.
Maybe the solution here is, in the end, also just about naming things (slash updating the docs):
I have opened https://github.com/grafana/k6-docs/pull/1167 as I also somehow grabbed the wrong counter above.
I just normally think of ENV as a data struct that represents information shared by everything
This might be true - but it would be very optimistic (or super not performant) if it weren't read-only ;)
Brief summary
Environment variable changes aren't picked up across different VUs.
Before you hang me!
I've searched around the forums and I understand that, right now, one of the larger issues plaguing the community is one of semantics: naming things is hard and __ENV can be more correctly thought of as a shorthand way of turning the host machine's environment variables into script parameters - something that the name ENV does not convey, thus making it a huge misnomer.
But being able to read and write to a host machine's environment vars has many valid use cases and is something I hope the K6 team will at least consider - hence this report and some ranting and raving as to why 🙏🏾
I could also phrase this as a feature request (that might have been better) like so: give users the ability to opt in to having __ENV actually refer to a host machine's environment variables -- perhaps adding that doing so voids your warranty and that this mode is ignored for cloud test runs.
k6 version
k6 v0.44.0 (2023-04-24T10:36:01+0000/v0.44.0-0-g14d80f6f, go1.20.3, linux/amd64)
OS
Ubuntu 22.04.2 LTS
Docker version and image (if applicable)
No response
Steps to reproduce the problem
Can be reproduced by using either of the following executors:
I thought this issue was to do with both VUs being instantiated at the same time but I've added a sleep to represent varied response times during a test. Here is some example output:
As we can see, the first VU runs 13 times and the second VU 7 times (expected given the sleeps) for the shared executor while the per vu executor ensures each VU runs 10 times each (also as expected). What surprised me was that changes to the __ENV variable do not affect the next iteration of other VUs.
Expected behaviour
TLDR: I would expect that changes to an __ENV var are picked up by different VUs on the same host machine for the next iteration.
I would expect output like this (for the shared executor):
Ideally the __ENV change is picked up by each VU iteration, allowing for information to be shared across VUs. Sharing via a SharedArray isn't possible due to it being immutable.
Background
#1722 and #2370 seem to indicate a desire to move away from mutable environment variables and instead treat these values as script parameters (which is sort of how they work at the moment anyway).
This is a significant departure from how environment variables work in different ecosystems - I know K6 isn't Node or V8, nor should it try to be - but there's a lot to be gained from allowing test writers to change environment variables and furthermore ensuring those changes are picked up by different VUs on the same host machine.
In Node, developers can specify whether or not they want to execute something as a child_process, which will inherit any environment variables of the parent process. This allows for fine-tuned control over environment variables, including child processes being sandboxed so that env changes in them don't pollute the global scope - but they are also able to send messages to the parent process, which can change an environment variable and/or propagate those changes to other child processes.
In V8 there's a myriad of ways to pass data around and the browser itself doesn't have any direct access to environment variables (but bundlers like webpack can read them during transpile time). Environment variables inside a browser aren't really a thing and there's closures and classes now to take care of encapsulation and a myriad of ways to send messages and move data around and communicate state.
K6 has neither of those things and given the nature of the product (multiple, concurrent processes representing virtual users or sessions) it would benefit greatly, imho, from a way for different VUs to send and receive messages down the line - which should avoid many of the pitfalls and complexity involved in managing state across VUs and having too much mutability across different parts of the API by putting that problem squarely in the hands of the user (fine by me!).
But I digress and that's all a bit of a pipe dream for now. I understand that a core tenet of the K6 design philosophy is repeatability and this includes thinking about tests running in the k6 cloud, which may very well end up being run on separate physical devices.
Actual behaviour
TLDR: __ENV changes are only visible inside the VU that makes the change.
Real Life Use Case
There are many users, like myself, that use K6:
What this boils down to is this: I would wager a very large swath of your users run their K6 test suite from a single physical device and would therefore benefit from having a way to allow VUs to communicate even if it is by some rudimentary means like by being able to read and write to the host machine's environment variables.
Take the following example JSON representing actual test data. It's read into a K6 readonly SharedArray:
Now this is a small snippet of a JSON file containing 1k lines or so. To make the math easy, let's say we have 50 individual products split uniformly across 10 vendors, i.e. as in the above example, so that each vendor has 5 products.
Thing is we use each product for each vendor twice and want to measure trends of individual vendors - this isn't a stress or performance test. Ideally we would want a shared executor, using 2 or more VUs, making sure that the script file has a few sleeps sprinkled in. The iterations would be capped at 100.
By allowing the VUs to see changes to the host machine's environment variables I could handle all of that logic myself by doing something like this (example uses TypeScript):
This would achieve many things:
SharedArray readonly struct remains readonly but becomes vastly more useful, solving not only memory consumption issues, but allowing test writers to conditionally feed data into tests without tightly coupling their code to an underlying VU instance ID (indeed, without requiring the k6/execution module at all)
Note the TestRunData references are to getters and setters which read or write to __ENV - implementation is hidden but it's very straightforward. It doesn't work, but imho it should.