A user found out that the current websocket API lacks support for binary messages: https://community.k6.io/t/converting-audio-file-into-bytes-and-send-over-web-socket/272/
But in order to fix the websocket code, it would be best if we first standardized how we support binary data in general, i.e. this issue... So, because of that and all of the other dependent issues, I'm upping the priority of this issue.
I took a brief look at this, and the current `ArrayBuffer` implementation comes from core.js, as do `DataView` and all typed arrays. The Goja implementation is actually disabled, and doesn't seem to have ever been used. (You can now confirm this easily with `--compatibility-mode base`. ;)
As suggested in #420 and dop251/goja#51, the way forward seems to be having native support in Goja, which would mean resurrecting the currently unused `ArrayBuffer` implementation, and ensuring it works transparently with core.js polyfills.
It doesn't seem like a gargantuan amount of work, but I'm a bit out of my depth here, as Goja internals are quite complex. I can give this a shot if you agree with the approach, let me know.
I found this issue after needing to post some binary data. I ended up making a change to k6 that would allow me to use `Uint8Array` in the js code, but it's a little hacky. Just wanted to share here in case anyone else needs it or it sparks any other ideas. Basically, I just look at the body parameter and, if it quacks like a `Uint8Array`, I call the goja `Get` func to get all the bytes and put them in a `[]byte`.
```diff
diff --git a/js/modules/k6/http/request.go b/js/modules/k6/http/request.go
index d448c3a0..07fec21f 100644
--- a/js/modules/k6/http/request.go
+++ b/js/modules/k6/http/request.go
@@ -29,6 +29,7 @@ import (
 	"net/textproto"
 	"net/url"
 	"reflect"
+	"strconv"
 	"strings"
 	"sync"
 	"time"
@@ -97,7 +98,24 @@ func (h *HTTP) Request(ctx context.Context, method string, url goja.Value, args
 	var params goja.Value
 	if len(args) > 0 {
-		body = args[0].Export()
+		// check to see if body param looks like a Uint8Array and if so, copy bytes out of it into a []byte.
+		if obj, ok := args[0].(*goja.Object); ok {
+			bytesPerElementVal := obj.Get("BYTES_PER_ELEMENT")
+			byteLengthVal := obj.Get("byteLength")
+			if bytesPerElementVal != nil &&
+				bytesPerElementVal.ToInteger() == 1 &&
+				byteLengthVal != nil {
+				byteLength := byteLengthVal.ToInteger()
+				byteBuf := make([]byte, byteLength)
+				for i := int64(0); i < byteLength; i++ {
+					byteBuf[i] = (byte)(obj.Get(strconv.FormatInt(i, 10)).ToInteger() & 0xff)
+				}
+				body = byteBuf
+			}
+		}
+		if body == nil {
+			body = args[0].Export()
+		}
 	}
 	if len(args) > 1 {
 		params = args[1]
```
With this change, you can use it like:

```javascript
const binaryBuf = new Uint8Array(4);
binaryBuf[0] = 0;
binaryBuf[1] = 1;
binaryBuf[2] = 2;
binaryBuf[3] = 3;
http.post(
  someURL,
  binaryBuf,
);
```
@imiric Did you have any luck with the goja ArrayBuffer implementation? Is this still the preferred strategy vs something like I did above? Need any help?
@caseylucas No, I didn't make much progress with exposing the native `ArrayBuffer`. Our top priority right now is getting #1007 merged, so while this issue is high on the priority list, it's unlikely to be worked on for a few weeks at least.

Your approach looks interesting, though I'm curious about the performance, and it should probably be done outside of `HTTP.Request`, as k6 would benefit from a more generic and lower-level implementation. Pull requests are always welcome, so if you think this approach could work for the other issues mentioned here (`open()`, `crypto`, etc.), feel free to contribute. :)
Is there a workaround that can be used currently, even if it is inefficient? I am using `protobufjs`, which encodes protobuf messages as a `Uint8Array`. When I try to pass this as a request body I get this error:

```
ERRO[0000] GoError: unknown request body type []interface {}
	at native
```

It seems that the golang `http` code has support for `[]byte`, but I can't figure out how to pass that from the javascript side.
Prompted by this forum topic, it might not be a bad idea if we also support "generator functions" when we have binary data. So, instead of `http.post(url, someBigChunkOfData)`, users should be able to use `http.post(url, someJSFunctionThatGeneratesDataOnDemand)`...
Things that I found out trying to use TypedArrays to fix #1571 / #1382:

TypedArrays aren't recognized by `http.request` and are instead exported as an empty object (`map[string]interface{}`). Given the current goja exported types this will probably require some hacks, such as the one in https://github.com/loadimpact/k6/issues/1020#issuecomment-558268847, to find out we have a TypedArray and get its `buffer`, which is an `ArrayBuffer`. That still hits `GoError: unknown request body type []interface {}`, so even that hack won't work :(. IMO we should concentrate on fixing 3 + 2 for v0.30.0 (this will probably take more time testing than actually writing) and possibly 1 for v0.31.0.

4 IMO should be rewritten to have both a readonly and a non-readonly response (the readonly one should not copy the data for each VU) and should probably return `ArrayBuffer`.

This might also need changes in goja, or at least more APIs, so I would argue that any kind of issues (detecting TypedArrays) should probably be reported upstream, as the maintainer has always been willing to help, if not with actual code changes, then at least with better documentation of aspects that they know.
Why do we need to fix 2? :confused: I don't think we need to have a way to recognize and work with a `TypedArray`, i.e. `Int8Array`, `Uint8Array`, `Uint8ClampedArray`, `Int16Array`, `Uint16Array`, `Int32Array`, `Uint32Array`, `Float32Array`, `Float64Array`, `BigInt64Array`, `BigUint64Array`. These are all different means to work with the underlying binary data, and I think k6 should only care about that binary data, i.e. the underlying `ArrayBuffer` that users can easily reference with `TypedArray.prototype.buffer`.
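For example, getting at that underlying binary data is a one-liner in plain standard JS (no k6 APIs involved):

```javascript
// A typed array is just a view; .buffer exposes the underlying ArrayBuffer.
const view = new Uint8Array([1, 2, 3, 4]);
const buf = view.buffer;

console.log(buf instanceof ArrayBuffer); // true
console.log(buf.byteLength);             // 4
```

One caveat: a view created over part of a buffer still returns the *whole* buffer from `.buffer`, so `byteOffset` and `byteLength` can matter when slicing.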
So, I think everything in k6 that works with binary data should just recognize 3 things: `string`, Go `[]byte` (for backwards compatibility), and `ArrayBuffer`, though the HTTP request body handling will probably have to also keep some of the current magic for URL-encoding data as well... Everything else should probably throw an exception - we shouldn't try to guess that something is a typed array or some other obscure data structure.
And regarding the extension of such arrays - I may be missing something, but I don't see a reason why we should do anything special in k6? Why doesn't this suffice: https://stackoverflow.com/questions/18600895/resize-arraybuffer
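The approach in that StackOverflow answer boils down to copy-on-grow; a minimal sketch in plain JS (the `resizeArrayBuffer` helper name is made up for illustration, it's not a k6 or standard API):

```javascript
// Grow (or shrink) an ArrayBuffer by allocating a new one and copying the bytes over.
function resizeArrayBuffer(buf, newByteLength) {
  const resized = new ArrayBuffer(newByteLength);
  const bytesToCopy = Math.min(buf.byteLength, newByteLength);
  new Uint8Array(resized).set(new Uint8Array(buf, 0, bytesToCopy));
  return resized;
}

const a = new ArrayBuffer(2);
new Uint8Array(a).set([7, 9]);
const b = resizeArrayBuffer(a, 4);
console.log(Array.from(new Uint8Array(b))); // [ 7, 9, 0, 0 ]
```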
The problem with 2 is that

```javascript
var s = new Uint8Array(20);
// do something
http.post(url, s);
```

will just... not work. More accurately, it will (currently) send an empty body. In order for it to work you need to use `s.buffer`, without any feedback - possibly you'll do that just because you read in the documentation that you need to, but k6 won't actually tell you it isn't doing what you intended. In order for there to be any kind of feedback from k6 we will likely need the same detection anyway, so I am for just making it work, without the additional need for documentation and a specific exception saying "you should call .buffer on this typedArray instead of giving us the typedArray"...

> And regarding the extension of such arrays - I may be missing something, but I don't see a reason why we should do anything special in k6? Why doesn't this suffice: https://stackoverflow.com/questions/18600895/resize-arraybuffer
That works ... but it means that if you want to make a request with form-data out of 10 files (for example), you will copy a HELL of a lot of bytes around each time ... for no good reason. Also, again, I propose that whatever solution we have for this comes after we fix 3 and 2, as without those it totally doesn't matter.
Also, as I have mentioned (at least internally), k6 currently does an unreasonable amount of copying when it uploads files, which is something that should probably be looked into around implementing any kind of Buffer-like structure that won't just copy bytes around while it's being constructed.
> use `s.buffer` without any feedback, possibly just because you read in the documentation that you need to, but it won't actually tell you it isn't doing what you intended.
I don't see the problem here - the error message can be that only `string` and `ArrayBuffer` types are accepted as the request body, and the error message can lead to a docs page that has examples of how various typed arrays can be used.
> That works ... but it means that if you want to make a request with form-data out of 10 files (for example), you will copy a HELL of a lot of bytes around each time ... for no good reason. Also, again, I propose that whatever solution we have for this comes after we fix 3 and 2, as without those it totally doesn't matter.
I may be missing something, but even if you extend a Go slice, you still have to copy the actual old slice data, so I'm not sure how that's different :confused: And, again, at this point I don't see a need to implement 2 (recognition and direct support of `TypedArray`).
> Also, as I have mentioned (at least internally), k6 currently does an unreasonable amount of copying when it uploads files, which is something that should probably be looked into around implementing any kind of Buffer-like structure that won't just copy bytes around while it's being constructed.
This I agree has to be diagnosed and fixed.
Here's a request for supporting ArrayBuffers in the `encoding` built-in k6 module: https://community.k6.io/t/base64-encode-arraybuffer/1152
I don't think I will have time for anything, but the most MVP of PRs for v0.30.0 ... and to be honest there is probably more work on documenting and testing it than anything else.
As of 41275ab5, k6 now accepts `ArrayBuffer` as an argument to all functions that previously accepted `[]byte`, in the `k6/crypto`, `k6/encoding`, and `k6/http` modules.
The leftover parts of this issue are the potentially backwards incompatible changes we should discuss:

- returning `ArrayBuffer` from `open()`. This was done without a breaking change in #1800 by adding a new `'ab'` mode to `open()`, but it was reverted during review.
- returning `ArrayBuffer` in HTTP response bodies. This was also done without a breaking change in #1800 by adding a new `responseType: 'arrayBuffer'` option, and was also reverted.
- returning `ArrayBuffer` from `crypto.randomBytes()`. This was a breaking change in #1800, and was reverted.
- returning `ArrayBuffer` from `encoding.b64decode()`. This wasn't implemented in #1800, but we mentioned the possibility of making it a breaking change.

We should make a decision on whether we want to make breaking changes to support these, or if it would be better to add `ArrayBuffer` support with a new argument where possible.
The `open()` and binary HTTP response bodies are the two most common usages of `[]byte`, and I reckon that the vast majority of users won't notice a difference if they're just passing the value to e.g. `http.post()` or `crypto.sha256()`. They'll run into issues if they were accessing or modifying the value by index, in which case they'll have to do so via Typed Arrays instead.

For `crypto.randomBytes()` I don't think there's a backwards compatible path we can take, save for forcing a new optional argument that we'd deprecate soon anyway. But it's likely much less widely used, and unless the value was being accessed by index, it's the same situation as above.

We could make the `encoding` change backwards compatible with a new argument, since it's likely that the current string-only usage was sufficient for most users, even though it should be possible to return both an encoded and a decoded `ArrayBuffer` value.
So maybe the API should be:

```javascript
encoding.b64encode(input, [encoding], [format])
encoding.b64decode(input, [encoding], [format])
```

Where `format` could be `'b'` to return `ArrayBuffer`, or empty/unspecified to return a string as it does now, to keep it consistent with the `open()` argument.
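To make the proposal concrete, here's a plain-JS sketch of how such a hypothetical `format` parameter might behave for `b64decode`. This is not real k6 code: it uses the standard `atob()` and ignores the `encoding` argument purely for illustration.

```javascript
// Hypothetical sketch of the proposed signature - not an actual k6 API.
// format === 'b' returns an ArrayBuffer; anything else keeps the current
// string-returning behaviour.
function b64decode(input, encoding = 'std', format = '') {
  const decoded = atob(input); // this sketch only handles standard base64
  if (format !== 'b') {
    return decoded;
  }
  const bytes = new Uint8Array(decoded.length);
  for (let i = 0; i < decoded.length; i++) {
    bytes[i] = decoded.charCodeAt(i);
  }
  return bytes.buffer;
}

console.log(b64decode('aGk='));                             // "hi"
console.log(new Uint8Array(b64decode('aGk=', 'std', 'b'))); // Uint8Array [ 104, 105 ]
```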
So tl;dr, my vote is for making a breaking change for `open()`, binary HTTP response bodies and `crypto.randomBytes()`, and a non-breaking change for `encoding`.

WYT?
> So tl;dr, my vote is for making a breaking change for `open()`, binary HTTP response bodies and `crypto.randomBytes()`, and a non-breaking change for `encoding`.
:+1: I think I share your preferences. I also implore anyone in the k6 community who is watching the issue to comment if they agree or disagree, since it's looking likely that we'll make a bunch of minor breaking changes in k6 v0.31.0.
Actually, do we need a new `format` parameter for `encoding.b64encode()`? Isn't the idea of base64 to transform potentially binary data into a safe string? So we can just type assert the `input` for `ArrayBuffer` or `string`, but we probably should always return a `string`.
But yeah, `encoding.b64decode()` probably should have a `format` parameter, so users can specify whether they want `string` or `ArrayBuffer` as the result. Though with `string` being the default, I imagine some users will probably get mangled binary data, not realizing that the conversion of binary data to JS's UTF-16 may be lossy... Always returning `ArrayBuffer` will be somewhat bad UX though, despite the manual conversion to `string` being a one-liner: `String.fromCharCode.apply(null, new Uint16Array(buf))` :man_shrugging:
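For reference, that one-liner in action (plain JS, assuming the buffer holds UTF-16 code units):

```javascript
// Two UTF-16 code units: 104 = "h", 105 = "i".
const utf16Buf = new Uint16Array([104, 105]).buffer;
const str = String.fromCharCode.apply(null, new Uint16Array(utf16Buf));
console.log(str); // "hi"
```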
> Where `format` could be `'b'` to return `ArrayBuffer`, or empty/unspecified to return a string as it does now, to keep it consistent with the `open()` argument.
I don't think it makes sense for `b64decode` to return a `string` by default. In general, base64 is used in order to transport binary data as a string, so I am for it returning an `ArrayBuffer` by default. I am fine with it having an option to return a string, or with having the one-liner documented well.

I would expect a lot more users will have the problem of mangled binary decoded as a string than the need to do anything with what is decoded other than send it or compare it.
> Actually, do we need a new `format` parameter for `encoding.b64encode()`? Isn't the idea of base64 to transform potentially binary data into a safe string?
That's the main use case, yes, but the main `Encode()` function in `encoding/base64` works with `[]byte`, and `EncodeToString()` is just a wrapper around it. So presumably someone would want to work with the binary encoded data and stringify it when done, though that's probably a very rare use case. So I'm fine with not introducing `format` to `b64encode()`, though the discrepancy of adding it only to `b64decode()` bothers me a bit. :smile:
I agree that the default return value from `b64decode()` should be binary, but breaking that might impact more users than `open(..., 'b')`, since they could be relying on `string` to concatenate the result elsewhere, or to do other string manipulation, which wouldn't be common if the script was previously expecting `[]byte` and now receives `ArrayBuffer`.

But I can go ahead with the change and we can decide before merging the PR.
We currently don't have a good way to deal with binary data on the JS side yet :disappointed:. We rely on `[]byte` from the Go code, which translates to int arrays in JS, which is far from optimal. These are the places where we currently have binary data that I can think of:

- `open("somefile", "b")`
- HTTP responses with `{ responseType: "binary" }` as params
- `crypto.randomBytes()` (https://github.com/loadimpact/k6/pull/922)
- `crypto` hashing and digest functions (https://github.com/loadimpact/k6/pull/952)

We need a native analogue of the node.js `Buffer`, or better yet - to properly support ES `ArrayBuffer` objects. We actually have `ArrayBuffer` - the code below compiles and runs as you'd expect - but I'm not sure if this `ArrayBuffer` comes from the bundled core.js or from goja. It seems like the `ArrayBuffer` is from goja (though I wouldn't rely on it very much), but the `DataView` and `Int32Array` seem to be from core.js :confused: So we need to investigate exactly what the situation is and what we can use to handle binary data sensibly...
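A minimal example of the kind of code in question, exercising `ArrayBuffer`, `DataView`, and `Int32Array` together (illustrative only, not the original snippet):

```javascript
// Write through a DataView, then read the same memory back through a typed array.
const ab = new ArrayBuffer(8);
const dv = new DataView(ab);
dv.setInt32(0, 42, /* littleEndian */ true); // match typed-array byte order on common platforms
const ints = new Int32Array(ab);
console.log(ints[0]); // 42 on little-endian platforms
```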
Related issues: https://github.com/loadimpact/k6/issues/856, https://github.com/loadimpact/k6/issues/874, https://github.com/loadimpact/k6/issues/873