tus / tus-js-client

A pure JavaScript client for the tus resumable upload protocol
https://tus.io/
MIT License

Base64 Encoding of Metadata Assumes Metadata Value is a UTF8 String (Should It?) #651

Closed jimydavis closed 11 months ago

jimydavis commented 1 year ago

https://github.com/tus/tus-js-client/blob/eed7b9aa6e21be3b7ee44ab8f0e48ca36bf565ec/lib/upload.js#L920

We are trying to hack the metadata to pass in a SHA256 hash which is a byte array. But we can't do that if the base library is assuming it is a utf-8 string. We can't convert a SHA256 hash byte array into a utf-8 string as not all bytes will correspond to a utf-8 character.

Is there a way to pass in a Uint8Array instead? We are using Uppy which I assume uses the tus-js-client.

Acconut commented 1 year ago

The protocol itself does not specify any encoding for the underlying metadata values (https://tus.io/protocols/resumable-upload#upload-metadata), but you are right that tus-js-client implicitly converts the value to a string first. The base64 library it uses seems to support Uint8Array (https://www.npmjs.com/package/js-base64#synopsis), although through a different function than the one for strings. So we could add support for Uint8Array to tus-js-client. Would you be willing to open a PR for this?
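For illustration, here is a rough sketch of how an Upload-Metadata header value could be built so that both strings and Uint8Array values are accepted. This is not the tus-js-client implementation; `encodeMetadata` and `toBase64` are hypothetical names, and the sketch relies only on the `btoa` and `TextEncoder` globals rather than on js-base64. It follows the header shape from the protocol: comma-separated pairs, each a key, a space, and the base64-encoded value.

```javascript
// Hypothetical helper: base64-encode raw bytes using the btoa global.
function toBase64(bytes) {
  let binary = ''
  for (const b of bytes) binary += String.fromCharCode(b)
  return btoa(binary)
}

// Hypothetical helper: build an Upload-Metadata header value.
// Strings are encoded as UTF-8 first; Uint8Array values are used as-is.
function encodeMetadata(metadata) {
  return Object.entries(metadata)
    .map(([key, value]) => {
      const bytes =
        value instanceof Uint8Array ? value : new TextEncoder().encode(String(value))
      return `${key} ${toBase64(bytes)}`
    })
    .join(',')
}

// Example input: a string value and a (shortened, made-up) binary value.
const exampleHeader = encodeMetadata({
  filename: 'a.txt',
  sha256: new Uint8Array([0xff, 0x00]),
})
```

With something like this, a binary value would reach the server byte for byte after a single base64 decode, instead of first being coerced to a string.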

Alternatively, you can always encode the binary SHA256 into hexadecimal before passing it to tus-js-client.
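A minimal sketch of that hexadecimal workaround, assuming the hash is already available as a Uint8Array. The helper names `bytesToHex` and `hexToBytes`, the sample bytes, and the metadata key `sha256` are made up for illustration and are not part of tus-js-client.

```javascript
// Hypothetical helper: turn raw bytes into a lowercase hex string.
function bytesToHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('')
}

// Hypothetical helper: the reverse step a backend (or a test) could use.
function hexToBytes(hex) {
  if (hex.length % 2 !== 0) throw new Error('hex string must have even length')
  const out = new Uint8Array(hex.length / 2)
  for (let i = 0; i < out.length; i++) {
    out[i] = Number.parseInt(hex.slice(i * 2, i * 2 + 2), 16)
  }
  return out
}

// Shortened stand-in for a real 32-byte SHA256 digest.
const sampleDigest = new Uint8Array([0x00, 0x0f, 0xa5, 0xff])

// The hex string is plain ASCII, so it can go into the metadata object as a normal string.
const sampleMetadata = { filename: 'example.bin', sha256: bytesToHex(sampleDigest) }
```

Hex doubles the size of the value (64 characters for a SHA256 digest), which is usually negligible for a header.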

jimydavis commented 1 year ago

Thanks @Acconut - our backend would then have to convert it back from hex first inside the tusd hook. We will try to submit a PR for this if we can!

Acconut commented 1 year ago

Great, let me know if you need help!

ADTC commented 1 year ago

@Acconut Thanks for the response.

Changing this isn't really necessary. Instead, we can encode the binary SHA256 to a binary string, convert that to a base64 string, and pass it in as metadata. Uppy encodes this string to a base64 meta-string during upload, and the tus backend decodes the base64 meta-string back to the original base64 string, which we can then consume in the backend simply by decoding it back to the binary SHA256 (or however we need to use it).

Diagrammatically:

Client:
 -> (We) Generate ByteArray, convert to BinaryString
 -> (We) Base64 of BinaryString as String1, set in Metadata
 -> (Uppy) Base64 of String1 as String2, set in Metadata (Header)
 -> (Uppy) Upload

Server:
 -> (Tus) Receive String2 in Metadata (Header)
 -> (Tus) Decode String2 to String1
 -> (We) Use String1 or Decode String1 to BinaryString, then to ByteArray
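The steps in the diagram could be sketched roughly as follows. The function names and sample bytes are hypothetical, and the client's and server's own base64 handling is only simulated here with the `btoa` and `atob` globals; the real encoding is done by Uppy/tus-js-client and the real decoding by the tus server.

```javascript
// Hypothetical helper: ByteArray -> BinaryString -> base64 (String1).
function bytesToBase64(bytes) {
  let binary = ''
  for (const b of bytes) binary += String.fromCharCode(b)
  return btoa(binary)
}

// Hypothetical helper: base64 -> BinaryString -> ByteArray.
function base64ToBytes(b64) {
  const binary = atob(b64)
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0))
}

// Client (We): shortened stand-in for a SHA256 digest, encoded to String1.
const hashBytes = new Uint8Array([0x00, 0xff, 0x80, 0x7f])
const string1 = bytesToBase64(hashBytes)

// Client (Uppy), simulated: String1 is ASCII, so encoding it again gives String2.
const string2 = btoa(string1)

// Server (Tus), simulated: decode String2 back to String1.
const receivedString1 = atob(string2)

// Server (We): decode String1 back to the original bytes.
const receivedBytes = base64ToBytes(receivedString1)
```

Because String1 contains only ASCII characters, it survives the library's string handling unchanged, which is why the round trip works.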

Or we can convert the byte array to hexadecimal, send that up as base64, then convert it back to byte array. 🙂

Acconut commented 1 year ago

Yes, converting it to a string before passing it to tus-js-client is also always an option if that works with your backend. 👍

Acconut commented 11 months ago

I think this can be closed now as your question was answered. Please let me know if this is not the case.