Open KOConchobhair opened 1 month ago
Hi @KOConchobhair,
Thanks for raising this topic 🙇🏻 First, let me ask you, as a library user, do you have any preference?
From my side, I don't have a clear path in mind, but some of the thoughts from the top of my mind are:
S3.getObject
specifically, I think that in general is better to assume that the contents are binary (use ArrayBuffer
?), because no matter what kind of content is more common, I think it's generally easier to convert the body into a string later, if needed, than the opposite.
ArrayBuffer
(which doesn't sound really bad, except for what I mentioned above), I guess we'll need to add a new optional argument to the S3.getObject
function.
That said, let me invoke @oleiade, because as the main contributor of the project, he might have some thoughts to share.
hmm, it seems that the problem is worse than simply passing the responseType
into the k6 http request. Because this doesn't even work:
const res = await http.asyncRequest(method, signedRequest.url, null, {
headers: signedRequest.headers,
responseType: 'binary',
})
console.log(res)
it always returns {}
for res.body
console.log(res)
--> https://pastebin.com/raw/S3mZR9CH
The Typescript types for the RefinedResponseBody
seem to want to treat the data as bytes
in this case
https://github.com/DefinitelyTyped/DefinitelyTyped/blob/c58e3459ea3d434ca1e7bf3a0e1ffcb4a95d363b/types/k6/http.d.ts#L631
But it seems the k6 http module is returning an ArrayBuffer
https://github.com/grafana/k6/blob/73aaa790da2c66d9f3bbc3fd059246fc6ad5b63d/js/modules/k6/http/request.go#L134
I don't know if this is the root cause but it appears only responseType: 'text'
is functional atm. I also confirmed that there's no issue with localstack
and downloading an image file via the S3 API (using aws
cli) so the problem appears to be in k6. Let me know if there is another repo to escalate this issue to.
okay, i think I see what's going on here. Even with the types fixed (I got a PR for that going here: https://github.com/DefinitelyTyped/DefinitelyTyped/pull/69781) the k6 script itself is javascript so its a little confusing how to access the value for res.body
. For example, res.body.byteLength
works but console.log(res.body)
always converts to empty object {}
. Maybe thats expected for javascript/ArrayBuffer, I'm not sure. But the good news is the k6 http module is working fine for binary files! So I will proceed with a PR to address the originally reported issue with the following recommended approach (which should also help with #45 as well)
add a new optional argument to the S3.getObject function.
Hey @KOConchobhair,
Thanks for spending time on this, and for bringing all the details! 🙇🏻 Really appreciated!
For example, res.body.byteLength works but console.log(res.body) always converts to empty object {}.
I guess this might be related with 👉🏻 https://github.com/grafana/k6/issues/3677#issuecomment-2042149423
In any case, regarding your contributions:
release-v0.52.0
), as that's what we use as the base for the next release (and what will be merged against upstream), please?Thanks! 🙏🏻
- Really looking forward for that PR to address the reported issue.
Okay, a PR for this issue is ready for review: https://github.com/grafana/k6-jslib-aws/pull/107 (I also think it mostly resolves #45 as well)
I've got a second PR for general cleanup/unit test fix ready for review as well: https://github.com/grafana/k6-jslib-aws/pull/106
- Could you please open [k6] Fix
ResponseBody
type to useArrayBuffer
instead ofbytes
DefinitelyTyped/DefinitelyTyped#69781 against https://github.com/grafana/k6-DefinitelyTyped (branchrelease-v0.52.0
), as that's what we use as the base for the next release (and what will be merged against upstream), please?
Regarding the types, yeah I suppose I could move the changes there. Although while I don't claim to understand the release process, this fix seems like a bug to me in the currently released v0.51.0 version of k6. Why would we have to wait until v0.52.0 to fix an existing issue with the types? It would seem DefinitelyTyped can handle an update prior to to the next version.
For context, I believe this is connected to #45; for which I unfortunately didn't find a satisfying solution at the time.
Regarding the type definitions, we try to keep changes aligned there with the k6 releases indeed. However, in this specific case, I believe we could probably go ahead and amend the k6 v0.51 definitions directly.
Hey folks 👋🏻
First and foremost, thanks for raising this and proactively contributing to the resolution @KOConchobhair 🙇🏻
I've been considering our options there, and although k6 initially adopted this approach in its HTTP
module, and it made sense there, I'd rather avoid adding a responseType
parameter to getObject.
Although we don't support most of them, the S3 getObject
action actually supports many pre-existing request parameters, and I'd rather leverage those instead.
Thus, my preference would be to add an additionalHeaders
(or any better name) argument to s3.getObject
, which would be added to the underlying request. Using this, we could integrate the logic in the method so that if, say, the additional headers contain an Accept
header set to application/octet-stream
, we then treat the response as binary and return an ArrayBuffer or bytes
.
That way, we can also start supporting more of the underlying API call request parameters as we need them in the future.
What do you folks think?
For reference, I've tried implementing it on the side to test the idea, and I quite like the look and feel...
async getObject(bucketName: string, objectKey: string, additionalHeaders = {}): Promise<S3Object> {
// Prepare request
const method = 'GET'
const signedRequest = this.signature.sign(
{
method: method,
endpoint: this.endpoint,
path: encodeURI(`/${bucketName}/${objectKey}`),
headers: {
...additionalHeaders
},
},
{}
)
let responseType: ResponseType = 'text'
if ('Accept' in additionalHeaders &&
additionalHeaders['Accept'] !== undefined &&
additionalHeaders['Accept'] === "application/octet-stream") {
responseType = "binary"
}
const res = await http.asyncRequest(method, signedRequest.url, null, {
headers: signedRequest.headers,
responseType: responseType as ResponseType,
})
this._handle_error('GetObject', res)
return new S3Object(
objectKey,
Date.parse(res.headers['Last-Modified']),
res.headers['ETag'],
parseInt(res.headers['Content-Length']),
// The X-Amz-Storage-Class header is only set if the storage class is
// not the default 'STANDARD' one.
(res.headers['X-Amz-Storage-Class'] ?? 'STANDARD') as StorageClass,
res.body
)
}
// Let's redownload it, verify it's correct, and delete it
const downloadedFileContent = await s3.getObject(
testBucketName,
testFileKey,
{ 'Accept': 'application/octet-stream' })
console.log(`downloadedFileContent.byteLength: ${downloadedFileContent.data.byteLength}`)
console.log(downloadedFileContent.data instanceof ArrayBuffer)
if (downloadedFileContent.byteLength !== testFileContent.byteLength) {
exec.test.abort("Downloaded file size does not match the original file size")
}
On top of #45, the
S3Client
cannot be used to download a binary file from S3!as I'm sure you are aware, if you have
discardResponseBodies=false
there’s aParams.responseType
that you can pass to a call to k6'shttp
likeand that controls whether
res.body
is astring
or anArrayBuffer
.however since the
S3Client
code does not pass in aresponseType
in this code https://github.com/grafana/k6-jslib-aws/blob/f6b5dce3efe25d981836ae0ad8038f3866323d23/src/internal/s3.ts#L185-L187 then that means it defaults toresponseType: 'text'
and that meansWe need a way to either pass thru or hard-code a
responseType: 'binary'
for the k6http
client call ingetObject
. Do you have a preference on how that might be implemented?