Closed scottf closed 2 years ago
@derekcollison The requirement to generate a digest is problematic. In some languages, the entire data must be available for the digest to be calculated (Go can do it without buffering the entire contents, but for example none of the web crypto APIs work that way). This would mean the digest shouldn't be required.
Digest inclusion should be optional and calculated by the writer if desired. Also, the digest algorithm should be a field in the ObjectInfo rather than a decoration prefixed to the hash, with the hash value being its base64 URL encoding. Alternatively, each chunk could carry a digest as a header entry; presumably a client that can read a message chunk has that chunk's data in memory while handing it off to the application.
I think that functionality is pretty important.
Is this not a solution in the TS/JS world?
While node may have a solution, things like browsers won't. So if it is a requirement, the client will have to implement a streamed version. Streaming the data is fine, but if the object is large, non-streamed crypto operations will spike memory or OOM the process.
https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest
Closing. See ADR PR https://github.com/nats-io/nats-architecture-and-design/pull/66
Discussion moved to https://docs.google.com/document/d/13RF06NCzRPBOW_es1pKUqQyThMBlGGUz170Y1xuls8Q/edit#
Overview
Working up to an ADR for object store.
Feature Requests
Stream Conventions
Stream Name
???
Chunk Subject Name
$O.%s

where `%s` is a subject meaningful to the file / blob, maybe the file/blob id (see meta data)

Meta Data
File / Blob Meta Data
File / Blob meta data in json form or as message headers
Options being discussed for where to store this:
Nothing prevents this data from being stored elsewhere.
Chunk Meta Data
Chunk meta should be included as headers on each chunk.
Other Considerations
Pre-chunked data
Consider that streamed video is already broken up into individual chunks which can be retrieved in a random-access fashion. A similar storage mechanism can be used, but there needs to be a way to know what each specific record (message) is. There might be an index piece of data that stores the timestamp of the chunk along with its sequence number. Alternatively you could extend a subject by
$O.<subject>.<chunkIdentifier>
giving the ability to subscribe specifically to that chunk. It isn't clear whether having that many subjects is efficient, or whether it is better to deal with the sequence directly. Either approach has tradeoffs when using / creating subscriptions / consumers to retrieve a specific part.
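The two addressing schemes above can be sketched as simple subject builders. Function names and the example ids are illustrative only:

```javascript
// Scheme A: one subject per object; chunks are located by stream sequence.
function objectSubject(id) {
  return `$O.${id}`;
}

// Scheme B: the chunk identifier is extended onto the subject, so a
// consumer can filter on exactly one chunk.
function chunkSubject(id, chunkId) {
  return `$O.${id}.${chunkId}`;
}

console.log(objectSubject("movie-123"));      // $O.movie-123
console.log(chunkSubject("movie-123", "42")); // $O.movie-123.42
```

Scheme B trades subject cardinality (one subject per chunk) for the ability to create a filtered consumer per chunk; Scheme A keeps one subject but requires sequence-based retrieval.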