Closed sherrman closed 6 years ago
Thanks for reporting this and creating the test case! I'm currently investigating what's happening and my current hypothesis is that it's something UTF related, since when we're not using binary the uploaded file is exactly twice the size of the original. Potentially something goja (the JS runtime) does, since IIRC JS uses UTF-16 internally.
Here's some sample code that's not connected with k6 that I think demonstrates the issue:
package main
import (
"fmt"
"github.com/davecgh/go-spew/spew"
"github.com/dop251/goja"
)
func dump(info string, thing interface{}) {
fmt.Printf("%s: %s\n", info, spew.Sdump(thing))
}
func main() {
b := []byte{'a', 'b', 'c', '\xe6', '\x97', '\xa5', '\x00', '\xee', '\xff'}
dump("Original byte slice", b)
s := string(b)
dump("Byte slice converted to string", s)
dump("String converted back to byte slice", []byte(b))
rt := goja.New()
rt.Set("testBytes", b)
rt.Set("testStr", s)
rtBytes := rt.Get("testBytes")
rtString := rt.Get("testStr")
dump("Bytes object from runtime", rtBytes)
dump("Bytes export from runtime", rtBytes.Export())
dump("Bytes export from runtime converted to string from go", string(rtBytes.Export().([]byte)))
dump("Bytes to string from runtime", rtBytes.String())
dump("String from runtime", rtString)
dump("String export from runtime", rtString.Export())
dump("String to string from runtime", rtString.String())
}
It would output something like this:
Original byte slice: ([]uint8) (len=9 cap=9) {
00000000 61 62 63 e6 97 a5 00 ee ff |abc......|
}
Byte slice converted to string: (string) (len=9) "abc日\x00\xee\xff"
String converted back to byte slice: ([]uint8) (len=9 cap=9) {
00000000 61 62 63 e6 97 a5 00 ee ff |abc......|
}
Bytes object from runtime: (*goja.Object)(0xc420140480)(97,98,99,230,151,165,0,238,255)
Bytes export from runtime: ([]uint8) (len=9 cap=9) {
00000000 61 62 63 e6 97 a5 00 ee ff |abc......|
}
Bytes export from runtime converted to string from go: (string) (len=9) "abc日\x00\xee\xff"
Bytes to string from runtime: (string) (len=30) "97,98,99,230,151,165,0,238,255"
String from runtime: (goja.unicodeString) (len=7 cap=7) abc日��
String export from runtime: (string) (len=13) "abc日\x00��"
String to string from runtime: (string) (len=13) "abc日\x00��"
Something strange is going on with goja's utf16 strings. I don't understand why the String()
method (last line of the output) returns a mangled (len=13
) copy of the data, given that it supposedly decodes it to a go string...
Ah, the issue probably originates when we convert binary (i.e. invalid utf8) "stings" to utf16, not sure we can actually get the original binary value back...
I spent some more time investigating this and there are some bad news and some good news...
The bad news is that there's no current workaround for the bug, basically once a binary response is treated as a utf-16 string, the information is lost. As an example, if I have create a "string" with all possible byte values:
b := make([]byte, 256)
for i := 0; i <= 255; i++ {
b[i] = byte(i)
}
runtime := goja.New()
runtime.Set("testStr", string(b))
and I pass it to a goja runtime, where I dump the data in it by running a script like this:
for (var i = 0; i < testStr.length; i++) {
buffer.push(testStr.charCodeAt(i));
}
I get the values from 0 to 127 fine, but the rest are encoded as 65533, which is basically a replacement character that substitutes for corrupt UTF sequences... So no recovering the original data from that...
But the good news is that I think we can solve the issue with a relatively simple fix. We can add a new HTTP request responseType
option that specifies how to receive the response body, similar to the browser XMLHttpRequest.responseType
option. By default it would be text
, but if users supply binary
, we'll return the original Go []byte
value. Of course we'd have to change the HTTPResponse.Body
type from string
to interface{}
, but there shouldn't be any issues with that.
There may even be some benefits, for example we can return null
when users have enabled the new discardResponseBody
option (we use an empty string now, which could be somewhat confusing). We might even rename the new discardResponseBody
option to instead be responseType=none
, so we don't have 2 separate options.
That's awesome. I appreciate you looking into this quickly.
I like that solution. 👍
Using a binary response body from an http.get in a new request body doesn't work. The new request body is corrupted.
In the above example the
compare
endpoint won't receive the same binary data as was returned by thebinary
endpoint. I've created a sample Go http server that serves files and has acompare
endpoint. The readme shows the validation of the Go server along with the Python SimpleHTTPServer.The k6 file also reads a binary file from disk and demonstrates that works against the
compare
endpoint.https://github.com/sherrman/k6_binarybug