Summary / Operation Compression Discussion

milanro commented 2 years ago

@justus-camp @DLehenbauer @dstanesc This is the place where we can discuss details of compression implementation of the operations and summaries in the fluid runtime (rather than on the DDS level). The compression executor selection and implementation might theoretically be common to both compression of summaries and operations so we can discuss here the requirements and design.

The common points are especially

which compression algorithm(s) will we use
is the compression optional? (how do we enable / disable compression)
compression executor API

Beside common points, we can use this discussion for compression design of both summaries and operations.

milanro commented 2 years ago

The summary data is organized as the ISummaryTree which is the tree of ISummaryObjects where ISummaryObject can be ISummaryTree, ISummaryBlob ( or ISummaryHandle or ISummaryAttachment objects). ISummaryBlob contains payload data and is the main candidate for the compression. This could be implemented on the container-runtime level already. This way we could not only reduce the transfer size but also the storage size.

However we might also want to compress the complete summary to get the most from the compression algorithm. I suppose that it would require changes at the driver level, we also need to check, in which format the historian service or alternative service accept the summary (can we simply zip everything where the blobs distribution is lost).

@DLehenbauer @vladsud @dstanesc @justus-camp , what is your opinion?

dstanesc commented 2 years ago

However we might also want to compress the complete summary to get the most from the compression algorithm.

Please account that in the context of the larger goal incrementality is fundamental. To maintain content-defined chunking stability (ie efficiency), the partitioning needs to happen (to be demonstrated but fairly intuitive) prior to compression

milanro commented 2 years ago

However we might also want to compress the complete summary to get the most from the compression algorithm.

Please account that in the context of the larger goal incrementality is fundamental. To maintain content-defined chunking stability (ie efficiency), the partitioning needs to happen (to be demonstrated but fairly intuitive) prior to compression

Incrementality can still be kept if compression is used only for the transfer.

dstanesc commented 2 years ago

Incrementality can still be kept if compression is used only for the transfer.

Two level compression and logically decoupling transfer from storage compression is indeed a very interesting idea.

milanro commented 2 years ago

@DLehenbauer @dstanesc @justus-camp I am analyzing here the summary execution code.

Summary Generation Stack Analyze Method summarizeCore at the SharedObject is called through 2 stack paths. One of them generates the GC data and the other one generates the Summary. The partial stack is as follows :

GC Data

summarizeCore (propertyTree.js:381)
processGCDataCore (sharedObject.js:447)
getGCData (sharedObject.js:428)
...
collectGarbage (containerRuntime.js:1639)
summarize (containerRuntime.js:1520)
submitSummary (containerRuntime.js:1740)

Summary

summarizeCore (propertyTree.js:381)
summarize (sharedObject.js:412)
summarizeChannelAsync (channelContext.js:24)
summarizeInternal (remoteChannelContext.js:75)
...
summarize (summarizerNode.js:84)
summarize (summarizerNodeWithGc.js:96)
summarize (containerRuntime.js:1523)
await in summarize (async)
submitSummary (containerRuntime.js:1740)

The summary stack enriches the an re-structs the summary objects so that at summarize (containerRuntime.js:1523) it is already completely refactored objects with the rich metadata together with the original payload.

Question : Are these metadata needed in the further processing (further at the client or at the server), or can they also be compressed.

Question: Should the GC data be generated after compression or could we avoid compression for this generation

The idea about encrypting is to encrypt all blobs. Each blob would be encrypted separately. We have various possibilities of the encryption:

At summarize (sharedObject.js:412) and processGCDataCore (sharedObject.js:447), we would encrypt it twice, for both GC Data and Summarize process. This looks to be the safest possibility but also with highest performance impact.
At summarize (sharedObject.js:412) which would encrypt it only for summarizing, not for GC data. We need deeply understand how the GC Data are generated and whether this is acceptable.

The picture bellow shows the structure of the summary tree at SharedObject

SummaryTreeAtSharedObject

At summarize (containerRuntime.js:1523), all blobs (also metadata) are compressed

The picture bellow shows the structure of the summary tree at ContainerRuntime

SummaryTreeAtContainerRuntime

We can also discuss about further places for compressing summarization.

vladsud commented 2 years ago

My logic is based on how ODSP stack works. I think AFR/FRS stack works mostly the same, but even if it's not, it should be not that far away.

Compression whole summary payload (only at transport layer, i.e., same as http compression, with storage operating with decompressed payloads) would be preferred for these reasons:

better compression ratio
Most storages are going to reassemble snapshot from incremental summary, and store result as compressed payload in storage. I.e. compressing individual blobs, and then compressing resulting snapshot is a waste of resources.
- This assumes that snapshot fetching uses http protocol compression as well.

The reason it's hard to implement - almost no service likes to deal with compressed payloads (i.e. decompress) due to the chance of compression bombs.

So, while compressing individual blobs is not ideal, this operation can be done completely on runtime side, with zero knowledge on storage side. I'd still expose compression algorithm in the protocol, such that storage has a chance to decompress them before processing (if it chooses so).

Incrementality of summaries is orthogonal to all of that.

vladsud commented 2 years ago

As to your other questions, I believe (but we should validate that) general interface (IChannel) separates summary from getting GC routes. And SharedObjectCore does the same. Only SharedObject uses two-summary approach to get all GC data- this is done for convenience of most DDSs that are small (in size) do not want to care about writing more code to deal with GC data extraction, so they use this lazy (and not very efficient path). Anything complex (in terms of amount of data stored) should not use this path.

I do not fully understand your question about compression and encryption for this layer. If DDS compresses its blobs - it's a problem of DDS, nobody cares. As long as serializer is properly used when serializing data and all the handles are extracted from original data, GC sub-system does not care if blobs are compressed and/or encrypted.

That said, I'd rather see a system where

Pri1: Storage adapter (implementation of IDocumentStorageService) inserted in between ContainerRuntime and ContainerRuntime._stroage and it does compression for all the blobs going in summary, and decompression on reads.
Pri3: DDS have capability (if they want to) compress blobs individually with different algorithm, or mark blobs to be excluded from compression at above level.

This would require proper mark up of compression algorithm (similar to encoding) on interfaces up & down (I.e. ISummaryBlob, as well as some way to indicate compression on blobs going through IDocumentStorageService.readBlob, please note that not all blobs are summary blobs, there are "attachment" blobs that are written by ContainerRuntime.uploadBlob().

Either way, we should consider how to enable feature like that in existing applications. It either needs to be optional (i.e. when we get the blob, we do not automatically assume it's compressed, we figure out if it's compressed or not, as it might be written by old runtime version). Or we compress all blobs in a document and store this property (all blobs compressed or not) as part of file metadata (allowing old files to continue to be non-compressed, and new files starting from some point in time to be fully compressed).

vladsud commented 2 years ago

Heads up - I'm on vacation for 2 weeks. If you need help on runtime side, Daniel is your best contact. Justus is working on op compression and can provide update on where we are (it would be great to have your thoughts on how quickly you would need it - we are moving a bit slowly, as a bunch of refactoring is required).

milanro commented 2 years ago

I do not fully understand your question about compression and encryption for this layer. If DDS compresses its blobs - it's a problem of DDS, nobody cares.

I have no insight how GC data work. I wanted to understand, whether I can generate GC data using unencrypted blobs in the summary (at DDS) and then submit the summary with encrypted blobs (encrypted somewhere deeper at ContainerRuntime -> storage).

milanro commented 2 years ago

This would require proper mark up of compression algorithm (similar to encoding) on interfaces up & down (I.e. ISummaryBlob, as well as some way to indicate compression on blobs going through IDocumentStorageService.readBlob, please note that not all blobs are summary blobs, there are "attachment" blobs that are written by ContainerRuntime.uploadBlob().

@vladsud My idea was to compress only summary blobs coming from DDS and not touch any additional blobs. There are for example many metadata blobs, such as .electedSummarizer, .metadata and .attributes. It is not clear to me whether I can compress them as I am not sure where they are used. It also maybe has no sense to compress them as they are small. Of course, compression would be executed outside the DDS code, in the location common to all DDSes.

vladsud commented 2 years ago

You can definitely compress them, or skip - it does not matter that match. What matters is that when we read them, we know if we need to decompress them or not. There is one API to read any type of blob, so we either need to know (though some extra info) if blob is compressed in some shared code (that deals with all blobs), or push responsibility to decompress (and know if it needs to be decompressed) to final user of that blob.

milanro commented 2 years ago

@vladsud Thanks for pointing me to the proper location where the compression / encryption should be implemented. I assume that implementation of IDocumentStorageService should contain the code. So far, ContainerRuntime._storage is implementation of IDocumentStorageService (in case of my example it is ContainerStorageAdapter).

@vladsud @DLehenbauer @dstanesc If we want to put the code between ContainerRuntime and ContainerRuntime._storage, would it be a good approach to use the delegation pattern and create the wrapper implementation of the IDocumentStorageService where the methods would apply the compression / encryption and then redirect calls to the methods of the original storage instance?

The methods of IDocumentStorageService are as follows

getSnapshotTree(version?: IVersion, scenarioName?: string): Promise<ISnapshotTree | null>;

getVersions(
    versionId: string | null,
    count: number,
    scenarioName?: string,
    fetchSource?: FetchSource,
): Promise<IVersion[]>;

createBlob(file: ArrayBufferLike): Promise<ICreateBlobResponse>;

readBlob(id: string): Promise<ArrayBufferLike>;

uploadSummaryWithContext(summary: ISummaryTree, context: ISummaryContext): Promise<string>;

downloadSummary(handle: ISummaryHandle): Promise<ISummaryTree>;

I need to deep-dive to understand, how they are used. It looks like createBlob and readBlob methods are consistent, when I implement the compression in createBlob and decompression in readBlob, it will probably fit together . I am not sure about the methods uploadSummaryWithContext, downloadSummary and getSnapshotTree. At first sight the last two are not reverse methods to uploadSummaryWithContext which is used at ContainerRuntime.submitSummary. Please, let me know, if you have an insight here.

As @vladsud is pointing to, we need to have some sort of markup, which compression algorithm (or whether at all) is used. We have legacy blobs which have no markup at all. Unfortunately, when reading blob, we are getting ArrayBufferLike binary structure only so I do not see any clean and puristic solution if we do not change the complete protocol. We could prefix the binary string with some non-probable marker and if it exists at the very beginning, we would read the compression metadata from the prefix. Please, give me your opinion.

milanro commented 2 years ago

Unfortunately the method ContainerStorageAdapter.uploadSummaryWithContext does not use the method ContainerStorageAdapter.createBlob as shown in the stack below. This requires us to cover all cases of creating and reading blob. If this is to be done at IDocumentStorageService level, we need to cover all methods. I suppose that the creating and reading the blob is not called other way than via IDocumentStorageService instance at the ContainerRuntime but this must also be verified. I suggest implementing the POC and doing various tests at first.

createBlob (gitManager.js:102)
(anonymous) (retriableGitManager.js:36)
runWithRetry (runWithRetry.js:20)
runWithRetry (retriableGitManager.js:69)
createBlob (retriableGitManager.js:36)
writeSummaryBlob (summaryTreeUploadManager.js:65)
await in writeSummaryBlob (async)
writeSummaryTreeObject (summaryTreeUploadManager.js:39)
(anonymous) (summaryTreeUploadManager.js:24)
writeSummaryTreeCore (summaryTreeUploadManager.js:22)
writeSummaryTreeObject (summaryTreeUploadManager.js:48)
(anonymous) (summaryTreeUploadManager.js:24)
writeSummaryTreeCore (summaryTreeUploadManager.js:22)
writeSummaryTreeObject (summaryTreeUploadManager.js:48)
(anonymous) (summaryTreeUploadManager.js:24)
writeSummaryTreeCore (summaryTreeUploadManager.js:22)
writeSummaryTreeObject (summaryTreeUploadManager.js:48)
(anonymous) (summaryTreeUploadManager.js:24)
writeSummaryTreeCore (summaryTreeUploadManager.js:22)
writeSummaryTreeObject (summaryTreeUploadManager.js:48)
(anonymous) (summaryTreeUploadManager.js:24)
writeSummaryTreeCore (summaryTreeUploadManager.js:22)
writeSummaryTree (summaryTreeUploadManager.js:19)
await in writeSummaryTree (async)
(anonymous) (shreddedSummaryDocumentStorageService.js:112)
await in (anonymous) (async)
timedExecAsync (logger.js:331)
uploadSummaryWithContext (shreddedSummaryDocumentStorageService.js:107)
uploadSummaryWithContext (documentStorageServiceProxy.js:26)
uploadSummaryWithContext (documentStorageServiceProxy.js:26)
uploadSummaryWithContext (retriableDocumentStorageService.js:45)
uploadSummaryWithContext (containerStorageAdapter.js:51)

DLehenbauer commented 2 years ago

@milanro - You mentioned encryption above. Did you mean compression, or are you interested in encrypting data as well?

As an FYI, @noencke is beginning to work on incremental summaries (but not compression.)

@milanro - You asked how GC works. In Fluid, most DDSes can store references to blobs and other DDSes via IFluidHandles. In this way, a user constructs a graph of Fluid objects that is rooted at the container.

Lifetime in Fluid is determined by reachability from the container. If the user removes the last reference to a DDS or blob, it is no longer reachable and will eventually be reclaimed.

In order to discover what is no longer reachable, the runtime starts at the root DDS and calls '.getGCData()', which returns a list of contained IFluidHandles (or more accurately, the ids that handles point to.) The runtime then recursively calls '.getGCData()' on the target of each 'id', and so on, until it has built a list of all IFluidHandles that are transitively reachable from the container.

As you've discovered above, most DDSes do not implement getGCData() directly, and instead fall back on a default implementation that invokes summarizeCore() with a special IFluidSerializer that does nothing but record the IFluidHandles encountered.

@milanro - IDocumentStorageService is the contract between the runtime and the driver for storage. AFAIK, all calls to storage should funnel through that interface.

milanro commented 2 years ago

@milanro - You mentioned encryption above. Did you mean compression, or are you interested in encrypting data as well?

@DLehenbauer - At present, I am interested only in compression.

milanro commented 2 years ago

@dstanesc - see below, is this what was on your todo list? @DLehenbauer - just to be sure, is this M3 and will it be implemented by MSFT instead of us?

As an FYI, @noencke is beginning to work on incremental summaries (but not compression.)

dstanesc commented 2 years ago

As an FYI, @noencke is beginning to work on incremental summaries (but not compression.)

@DLehenbauer @noencke Because of its relevance to large data agenda we would like to stay in the loop w/ the incremental summaries work. Will @noencke work focus on the content-defined chunking, logical/explicit blob handle reuse in the DDS itself or both?

DLehenbauer commented 2 years ago

We had a two-pronged approach. You were going to explore adding compression and then automatic chunking/deduplication using a rolling hash to achieve incremental summaries in a generic way that works across DDSes.

@noencke is exploring the other option: exposing to DDSes the ability to explicitly reuse blobs during summarization so that a sophisticated DDS like the new SharedTree can managed its own chunking/deduplication.

We'll definitely keep you in the loop. Noah is just starting to experiment to see what the APIs exposed to the DDSes today will allow.

DLehenbauer commented 2 years ago

@dstanesc - I was intrigued by your remark that LZ4 had less impact on content-defined-chunking. I looked into the LZ4 algorithm a bit and found that the format alternates between two operations: appending a new literal string of bytes to the decompression buffer and copying an existing string of bytes from the decompression buffer.

Two reasons LZ4 works well with CDC:

There's no bit-level entropy coding, so data stays byte aligned.
Copies are from a 16-bit offset from the tail of the decompression buffer, so the effect of earlier/later mutations "damp out".

CDC compatibility is a really interesting advantage of LZ4 that you've discovered. It makes me curious if we could exploit it further (e.g., chunking concurrently with compression.)

@vladsud, @justus-camp - FYI: Another interesting advantage of LZ4 discovered by @dstanesc.

dstanesc commented 2 years ago

Thanks @DLehenbauer for looking into it and confirming the theoretical aspects. We can definitely build upon this special ability of LZ4 to compress relatively well, be fast and stay stable.

I am attaching below the early results of the stability benchmark looking into the block reuse when modifying variable size collections of materials in 3 ways:

appending 3 materials at the end,
inserting 3 materials in the middle
replacing 3 materials in the middle

Comparing in the charts json text, msgpack, combined msgpack & lz4 compressed and combined msgpack & pako encodings.

LZ4 displays remarkable benefits (compared to pako for instance, also benchmarked). The CDC configuration is just a convenient choice, block reuse rates can be improved by reducing the sliding window. Ideally a function of the target data size. Once finalized I will provide the repo link so that everyone can run the tests locally.

BTW: the performance benchmark is quasi finalized and shows extremely good results for fastcdc (eg crunching 12 MiB json data & compressed LZ4 to 2.3 MiB in 5 ms!) .

Firefox_Screenshot_2022-09-01T23-55-52 212Z

microsoft-github-policy-service[bot] commented 1 year ago

This PR has been automatically marked as stale because it has had no activity for 60 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!

milanro commented 1 year ago

We should probably keep this opened.

microsoft-github-policy-service[bot] commented 1 year ago

This issue has been automatically marked as stale because it has had no activity for 180 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!

microsoft / FluidFramework

Summary / Operation Compression Discussion #11366