mstoykov opened 3 years ago
Being able to make a `SharedArray` in VU code would also be very helpful for this use case: https://github.com/k6io/k6/issues/1638#issuecomment-768107236

And I agree returning a `SharedArray` from `setup()` will be quite hard. I'm not sure if that's what we should do, but if we do it, we should probably also whitelist a few other JS types that currently get mangled by the JSON encoding and decoding, like `ArrayBuffer`, typed arrays, built-in k6 types, etc. :man_shrugging:
This forum topic is another example where the first part of this issue would have been useful. With the new JS module APIs, implementing that basically boils down to deleting these 3 lines, right? https://github.com/grafana/k6/blob/c4b88f59189705dfa4d97fec0057e202104d5db1/js/modules/k6/data/data.go#L89-L91

If we remove this (now artificial) restriction, everything else should "just work", I think... :thinking: It should allow users to create a `SharedArray` in one scenario and then use it in another scenario with a later `startTime`. Sort of a delayed init / `setup()` alternative that can make HTTP requests and save the results in a `SharedArray`, so they can be used in other scenarios on the same k6 instance.
There are two potential pitfalls that I can think of:

1. A `SharedArray` in `setup()` will probably currently fail, as mentioned above. The proper solution for this would be fully implementing the second part of the issue above. I personally prefer the proposed approach of a `SharedObject`-like implementation. However, if that is too difficult, as a first iteration I think we can simply implement the `json.Marshaler` interface on the current `SharedArray` object, which would transform `SharedArray` objects returned from `setup()` into normal JS arrays. In the `MarshalJSON` method we can also emit a warning (until we properly implement this) that doing so is probably a bad idea, since it completely negates any benefits of `SharedArray`.
2. A `SharedArray` created during an iteration won't be properly shared among multiple k6 instances (e.g. in a `k6 cloud` or a k6-operator test run). I have some ideas for how to properly solve this in the long term; in the short term we can just emit a warning that explains the problem when a `SharedArray` is created in VU code and the execution segment of the instance is not 100%.

In any case, I think it's worth it to remove the artificial restriction around `SharedArray` usage in VU code. There are certain pitfalls, true, but it will unlock a lot of use cases that k6 currently cannot cover...
> of a SharedObject-like

Does it mean a k6-wrapped object that supports safe concurrency?

> as the first iteration I think we can simply implement the json.Marshaler interface by the current SharedArray object

The `SharedArray` under the hood is a `DynamicArray`; do we have a way of implementing the interface on it, or would it require a wrapper? https://github.com/grafana/k6/blob/c4b88f59189705dfa4d97fec0057e202104d5db1/js/modules/k6/data/share.go#L43-L54

Will the `JSON.stringify` function trigger the warning? Would we have a workaround for it? :thinking:
```javascript
const data = new SharedArray('some name', function () {
    return []; // the callback must return an array
});

export function setup() {
    console.log(JSON.stringify(data)); // I guess it shouldn't emit the warning
}
```
Making the `SharedArray` creatable in the `default` function can also help with that, as only a single VU will call the initialization code... once. It seems like a more distributed-friendly solution; if it doesn't have critical problems, why not encourage users to use it until we have a complete working solution for the `setup()` function?
> Does it mean a k6-wrapped object that supports safe concurrency? The SharedArray under the hood is a DynamicArray, do we have a way for implementing the interface on it or would it require a wrapper?

Yes, something like a `Proxy` that allows for the dynamic sharing of arbitrary nested data, the bulk of which has only one copy in memory; when parts of it are requested, they are copied on demand to the VU that requested them. Basically, a more powerful version of `SharedArray`.
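For illustration only, the copy-on-access idea can be sketched in Go (all names here are hypothetical, not a k6 API): one canonical copy of the data lives in the store, and each read hands the caller its own private copy, so no consumer can mutate the shared state.

```go
package main

import "fmt"

// sharedData is a rough sketch of the "SharedObject" idea: one canonical
// copy of nested data lives in the store, and each access hands the caller
// a fresh copy, created on demand. Hypothetical names, not k6 code.
type sharedData struct {
	canonical map[string][]int // the single in-memory copy
}

// Get returns a fresh copy of the value for the caller to use freely.
func (s *sharedData) Get(key string) []int {
	orig, ok := s.canonical[key]
	if !ok {
		return nil
	}
	cp := make([]int, len(orig))
	copy(cp, orig)
	return cp
}

func main() {
	s := &sharedData{canonical: map[string][]int{"ids": {1, 2, 3}}}
	a := s.Get("ids")
	a[0] = 99 // mutating the copy...
	fmt.Println(s.Get("ids")[0]) // ...does not affect the canonical data: prints 1
}
```

The real `Proxy`-based version would do this lazily per accessed property rather than per whole value, but the memory trade-off is the same: one shared copy plus short-lived per-VU copies.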
> Will the JSON.stringify function trigger the warning? Would we have a workaround for it?

Yes, why not? `JSON.stringify()`-ing the current `SharedArray`s is also probably a mistake, so there's no reason not to print a warning :man_shrugging:
Allowing the initialization of `SharedArray` within the setup context would solve an important use case for my organization. Alternatively, we could also solve it if we were allowed to make HTTP requests within the `SharedArray` function that is run in the init context.

Our use case is that we have a large file (~100MB) that we would like to share across many distributed k8s pods running our test. We were hoping to put it on S3 and download it in the test. However, we find that we are not allowed to fetch the S3 object within the `SharedArray` initialization because HTTP requests are not allowed in the init context. Furthermore, if we download the S3 object in the setup context, we are unable to initialize a `SharedArray`, because this can only be done in the init context. If we put the file contents in the data returned by the `setup` function, our understanding is that memory usage will grow quickly as we increase the number of VUs.
Hi @Perzach! We had the same problem a few months ago and we found a workaround that may work for you as well. Let me explain it a bit:

We have a single repository where each team in my company can write their test scripts. Those scripts and the large files with real data to drive load tests are generated in a GitHub Action and copied to an EFS volume by running a pod. It's a reusable job, so you could use it directly from our repo if you have self-hosted runners. Here is the manifest for the pod that copies test scripts and large files to the volume:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: k6-pvc-copy
spec:
  volumes:
    - name: k6-pvc-copy-storage
      persistentVolumeClaim:
        claimName: k6-runner-tests
  containers:
    - name: k6-pvc-copy-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/test"
          name: k6-pvc-copy-storage
      resources:
        requests:
          cpu: "100m"
          memory: "100Mi"
        limits:
          cpu: "100m"
          memory: "100Mi"
```
Then the pods running the test can mount the EFS volume and launch the tests using the large files already there in the volume. Here is an example of what we have (in this case it's a template for Argo Workflows, but the K6 manifest is the same; you just need to change the `inputs`):
```yaml
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-{{`{{inputs.parameters.team}}`}}
spec:
  parallelism: {{`{{inputs.parameters.parallelism}}`}}
  script:
    volumeClaim:
      name: k6-runner-tests
      file: {{`{{inputs.parameters.team}}`}}/test/{{`{{inputs.parameters.scriptName}}`}}
  arguments: --out prometheus=namespace=k6
  ports:
    - containerPort: 5656
      name: metrics
  runner:
    image: ACCOUNT.dkr.ecr.REGION.amazonaws.com/REPO/IMAGE
    resources:
      requests:
        cpu: {{`{{inputs.parameters.cpuRequests}}`}}
        memory: {{`{{inputs.parameters.memoryRequests}}`}}
      limits:
        cpu: {{`{{inputs.parameters.cpuLimits}}`}}
        memory: {{`{{inputs.parameters.memoryLimits}}`}}
```
Don't hesitate to ask for more details. Hope it helps!
Thanks @guillermotti for the feedback. Are you saying that if we use the `script.volumeClaim` syntax to mount the test script, we inherently get access to all the other files that coexist in the relevant volume within the pod running the test?

If so, this was greatly helpful and we'll give it a shot.
For reference, we will be reading the file in our k6 script somewhat like this:
```javascript
const assets = new SharedArray('assets', () => {
    const assetFile = open('<path-to-file>');
    return papaparse.parse(assetFile, { header: true }).data;
});
```
Yes, if you are using the k6-operator, the `volumeClaim` allows you to mount the script file, and any other file in the volume is also available to the script. Your code should work.
Here is a piece of code from our scripts that is working:

```javascript
const QUERIES = new SharedArray("queries", function () {
    console.info(
        "Getting queries from path: " + ROOT_PATH + LOCAL_PATH + '/queries.json');
    return JSON.parse(open(ROOT_PATH + LOCAL_PATH + '/queries.json'));
})[0];
```
We are opening several files of more than 2MB each this way, so I think it will work for bigger files.
Thank you, this was very helpful!
Issues like https://github.com/grafana/k6/issues/2911 and https://github.com/grafana/k6/issues/2962 have recently been making me wonder whether we can't make a `SharedArray` improvement that might partially solve this issue for 80+% of users :thinking: What if we allow networked code in the `SharedArray` constructor callback? That is, have something like this:
```javascript
import http from 'k6/http';
import { SharedArray } from 'k6/data';

// no network code allowed here

const data = new SharedArray('some name', function () {
    // we allow networked code here
    let resp = http.get("https://some.url/that-returns-a-ton-of-data");
    return resp.json();
});

// no network code allowed here
```
Implementing this will probably be quite tricky and might require some refactoring of how we initialize the JS runtimes, but it seems doable :thinking: For example, it's probably fine to drop any metrics generated by these network calls; it might even be desirable to do so... On the other hand, the script `options` would not have been finalized yet, so some APIs might behave strangely... :disappointed: It's very far from ideal from a UX perspective, but if it can be done, it should significantly lessen the need to support `SharedArray` in `setup()` or the VU context :thinking:
There is also the problem of distributed execution and .tar archives. If we allow networking code in the `SharedArray` callbacks, are we going to execute it on every instance that runs the .tar archive, or are we going to store the result in the .tar file? :thinking: We get around this issue right now by saving the `open()`-ed files from the init context in the .tar archive, so all instances are able to reconstruct the final `SharedArray` value identically. But this might not always be the desired behavior :thinking:
On a somewhat related note, the proof-of-concept architecture I did in https://github.com/grafana/k6/pull/2816 for distributed execution (https://github.com/grafana/k6/issues/140) and test suites (https://github.com/grafana/k6/issues/1342) would also allow us to relatively easily have a `SharedArray` constructed during `setup()` or VU code, the original suggestion proposed in the OP of this issue. I didn't make a PoC for that, but it should be possible to make one quickly... :thinking: One nice thing about this approach is that the `SharedArray` constructor would be run only on a single machine, regardless of whether it was in the init context, VU context or even inside `setup()`...
It will do that by basically making `setup()` not that special, i.e. by adding a mechanism for shared blobs of binary data that any one instance can create and consume: https://github.com/grafana/k6/blob/49a2e27972a95f0ab18a4fb10fc406a36a75fa07/execution/controller.go#L5-L7

That is used for the `setup()` function in that branch to work in a distributed test run, but it can be used equally easily for `SharedArray` with a bit of simple optimization: https://github.com/grafana/k6/blob/49a2e27972a95f0ab18a4fb10fc406a36a75fa07/execution/scheduler.go#L443-L457
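As a rough, in-process illustration of that mechanism (hypothetical names; the real interface lives in `execution/controller.go` in that branch), a run-once shared-blob store might look like this: the first caller for a given id runs the callback and stores the result, and every later caller gets the cached bytes.

```go
package main

import (
	"fmt"
	"sync"
)

// dataStore is a toy, in-process sketch of the "shared blobs of binary data"
// mechanism described above. It is only an illustration, not k6's actual code.
type dataStore struct {
	mu    sync.Mutex
	blobs map[string][]byte
}

func newDataStore() *dataStore {
	return &dataStore{blobs: make(map[string][]byte)}
}

// GetOrCreateData runs create exactly once per id; concurrent and later
// callers get the cached result.
func (d *dataStore) GetOrCreateData(id string, create func() ([]byte, error)) ([]byte, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if b, ok := d.blobs[id]; ok {
		return b, nil // someone already created it
	}
	b, err := create()
	if err != nil {
		return nil, err
	}
	d.blobs[id] = b
	return b, nil
}

func main() {
	store := newDataStore()
	calls := 0
	create := func() ([]byte, error) {
		calls++ // the expensive work (e.g. a SharedArray callback) runs once
		return []byte(`[1,2,3]`), nil
	}
	store.GetOrCreateData("shared-array:data", create)
	b, _ := store.GetOrCreateData("shared-array:data", create)
	fmt.Println(string(b), calls) // [1,2,3] 1
}
```

In a distributed setting the map would of course live on a coordinator and be reached over the network, but the contract is the same: one creator, many consumers.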
Still, while I think that approach is more internally consistent and easier to reason about, the two approaches are not necessarily mutually exclusive, with a bit of work and thought :thinking: If allowing network requests in the init-context `SharedArray` callbacks is easy, maybe we can do that first. We can either save the results in the .tar archives, or maybe we can have a `PerInstanceSharedArray` that explicitly gets executed once on every instance :thinking: Or we can "canonize" that the `SharedArray` callback is called on every instance (since that's what currently happens) and add a new `GlobalSharedArray` that doesn't do that, if we want to, in the future... :man_shrugging: There are a lot of subtle issues and potential solutions that we have to consider before we make anything non-experimental... :scream:

So, yeah, to conclude this somewhat rambling stream of thoughts: more proofs-of-concept are needed here :sweat_smile: I am probably missing quite a lot of critical details, e.g. it might be completely impossible to allow init-time network calls only in `SharedArray` callbacks without rewriting half of k6... :sweat_smile:
**Initialize `SharedArray` in a non-init context**

There is nothing preventing us from just letting `SharedArray` be created outside of the init context. I think it was previously not technically possible because it used a different mechanism to share the data, but this is no longer a problem.

The only caveat is that it should probably disallow creating a `SharedArray` from `setup`, `teardown` and `handleSummary`, as... well, the last two aren't all that useful, and the first one will not work in a distributed setup (unless we decide that `SharedArray` should be shared between distributed instances, but I think that will be technically... very hard :tm:).

**`SharedArray` should be returnable from `setup`**

This will need some special way of first detecting that whatever was returned is a `SharedArray`, and then a way to save this in the JSON and reproduce the array on the other side.

This gets even more complicated because you might want to return an array of `SharedArray`s... which means this might need to be supported on many levels :sob:; we might decide not to support that at first.

Because of the caveat mentioned above, it probably will need some... specific code ?!?

Alternative: maybe, as suggested in this comment, we should just make another breaking change and make the returned setup data always a "SharedObject" or something like that. I would expect that, depending on how deeply it is "shared", it will either not save memory or be slower, at least to some degree.

Use cases:

- When a lot of data will be generated in `setup` (for example, because HTTP requests are needed to get the data), it would be nice to not have this data copied for each VU, but instead keep a single copy.
- Making the `SharedArray` creatable in the `default` function can also help with that, as only a single VU will call the initialization code... once. The problem is that `setup` might still need to do some setting up, and there is no way to return this data to `teardown` if it's needed there.

edit: This is likely going to wait for #140 in order to be doable.