Closed andre4i closed 2 years ago
@vladsud A good question was raised today during our sync-up about this snapshot size behavior difference between summarizers and non-summarizers and I think you'd be able to shed some light on this.
Assuming we are able to send a large snapshot and this bug is fixed. Only the client who summarizes is able to download it, right? Is this a resilience mechanism to hope for the garbage collector to purge the snapshot which would afterwards (hopefully) allow all other clients to download it? In other words, why are we allowing one client to download the snapshot when nobody else can?
Teams client wanted it to better control bandwidth consumption. Their take - if files are large, they want to render link in chat and let users decide if they want to follow link or not (and thus pay the cost or not). And yes, other clients may open it just fine.
Summarizer has to be excluded, such that if file grew 1 byte over the limit in this window (but was opened in Teams client before limit was increased), we are not leaving file unsummarized.
@chensixx , any chance you can take a look next week? Sorry to dump on you the last week of February, but that issue is rather important, and pretty well understood. It was raised by Teams team through ICM, and somehow priority of this item (or rather the fact that it's well understood) escaped me.
Please take a look. If you have any questions, I, Andrei or Jatin can help with it.
The core of the issue is that we lose property tracking that it's a summarizer (as it's passed from loader headers to resolved URI to driver), and as result driver does not reset mds property that blocks large summaries.
@chensixx, @andre4i, @jatgarg , can we prioritize this? This is rather bit reliability issue for Teams, I believe.
question: when @andre4i said we should let relative loader propagate client type, doesn't request already have the clientType in the headers? am I reading this correctly?
fix is in, still need to write unit test
currently manually testing loader is passing the correct type to odsp driver, and for summarizer client, the mds limit and timeout are undefined.
manual test completed, hostpolicy is set correctly
https://portal.microsofticm.com/imp/v3/incidents/details/286391295/home
There is a behavior captured in the above incident which seems to indicate that what is specified here does not hold and the size limit is actually taken into consideration further down in the code when
this.hostPolicy.summarizerClient
is enabled.We need to confirm that the limit does not apply to the summarizer (preferably adding some tests for the specific scenario). And if it does not apply, we need to follow how the
summarizerClient
flag is managed.Update
Snapshot limits should be disabled when requests are coming from a summarizer. See: https://github.com/microsoft/FluidFramework/blob/51d2d7b5541e2ad344d00509ff127f58fd792ea9/packages/drivers/odsp-driver/src/odspDocumentStorageManager.ts#L585
The property is set by the
OdspDocumentService
which it turn reads it from theodspResolvedUrl.summarizer
property: https://github.com/microsoft/FluidFramework/blob/51d2d7b5541e2ad344d00509ff127f58fd792ea9/packages/drivers/odsp-driver/src/odspDocumentService.ts#L149However, the resolved url is not set properly.
The summarizer does try to set it https://github.com/microsoft/FluidFramework/blob/51d2d7b5541e2ad344d00509ff127f58fd792ea9/packages/runtime/container-runtime/src/summarizer.ts#L126 but then the relative loader will just reuse the container's url: https://github.com/microsoft/FluidFramework/blob/51d2d7b5541e2ad344d00509ff127f58fd792ea9/packages/loader/container-loader/src/loader.ts#L77
This needs to be consistent and fixed.
As an implementation hint, it would probably make more sense to rely on the client type (summarizer vs interactive) and have the relative loader propagate the client type (for all requests). Then, the
OdspDocumentService
can adjust policy accordingly to the client type.