hedasilv closed this issue 1 year ago.
This PR has been automatically marked as stale because it has had no activity for 60 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework!
Data loss scenario: operations deleted from storage and not persisted in summaries are lost forever
In #4732, we introduced a TTL for ops stored in MongoDB. However, we did not update the summarization logic accordingly. As a result, when summaries do not happen, or do not complete successfully, for longer than the TTL, we may be purging ops from MongoDB that were never incorporated into a summary. For example, we observed the following behavior:
The `.protocol/attributes` section indicates that the sequence number of the summary is still stuck at the sequence number of the last successful client summary (more than 20 hours ago).
To Reproduce
Steps to reproduce the behavior: It is somewhat complicated to reproduce the situation locally, but we would need:
Still, this type of unexpected behavior has already been observed by multiple partners using our service.
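To make the failure mode concrete, here is a minimal TypeScript sketch of the invariant that a TTL-only policy bypasses: an op should only be eligible for deletion once it is both older than the TTL and covered by a persisted summary. The `StoredOp` shape, the function names, and the 24-hour TTL are hypothetical illustrations, not the routerlicious implementation or its configured TTL.

```typescript
// Hypothetical op record; the real op documents in MongoDB have a different shape.
interface StoredOp {
  sequenceNumber: number;
  insertedAtMs: number; // time the op was written to the op store
}

// An op is past its TTL once its age reaches ttlMs.
function isExpired(op: StoredOp, nowMs: number, ttlMs: number): boolean {
  return nowMs - op.insertedAtMs >= ttlMs;
}

// Safe to delete only if expired AND already included in the last persisted summary.
function isSafeToExpire(op: StoredOp, nowMs: number, ttlMs: number, lastSummarySeq: number): boolean {
  return isExpired(op, nowMs, ttlMs) && op.sequenceNumber <= lastSummarySeq;
}

// Ops that a TTL-only policy would delete even though no summary contains them.
function opsAtRiskOfLoss(ops: StoredOp[], nowMs: number, ttlMs: number, lastSummarySeq: number): StoredOp[] {
  return ops.filter((op) => isExpired(op, nowMs, ttlMs) && op.sequenceNumber > lastSummarySeq);
}

// Example with made-up values: the last successful summary covers up to seq 100.
const HOUR_MS = 60 * 60 * 1000;
const ttlMs = 24 * HOUR_MS; // hypothetical TTL
const nowMs = 30 * HOUR_MS;
const ops: StoredOp[] = [
  { sequenceNumber: 90, insertedAtMs: 0 },             // expired and summarized: safe to delete
  { sequenceNumber: 120, insertedAtMs: 2 * HOUR_MS },  // expired but not summarized: would be lost
  { sequenceNumber: 150, insertedAtMs: 25 * HOUR_MS }, // not yet expired
];
const atRisk = opsAtRiskOfLoss(ops, nowMs, ttlMs, 100).map((op) => op.sequenceNumber);
console.log(atRisk); // expected value: [120]
```

Gating deletion on the summary sequence number, rather than on age alone, means a stalled summarizer delays cleanup instead of losing data; the trade-off is that storage keeps growing while summaries are failing.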
Expected behavior
When we use `summaryWriter` to write the client summary, we receive an `ISummaryWriteResponse` object, but we don't log its `message`, which could give clues about why the summary failed.
Logs
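As a sketch of the missing logging, the failure path could record the response's message before returning. `SummaryWriteResponseLike` is a hypothetical stand-in that assumes a boolean `status` and an opaque `message` payload; the real `ISummaryWriteResponse` fields may differ. `Logger`, `handleSummaryWriteResponse`, and `documentId` are likewise placeholders, not the service's actual telemetry API.

```typescript
// Hypothetical stand-in for ISummaryWriteResponse; assumes a boolean status
// and an opaque message payload. Check the real interface before relying on this.
interface SummaryWriteResponseLike {
  status: boolean;
  message: unknown;
}

// Hypothetical placeholder for the service's telemetry logger.
interface Logger {
  error(message: string, properties?: Record<string, unknown>): void;
}

// Logs the response message on failure so the reason is not discarded,
// then returns whether the write succeeded.
function handleSummaryWriteResponse(
  response: SummaryWriteResponseLike,
  logger: Logger,
  documentId: string,
): boolean {
  if (!response.status) {
    logger.error("Client summary write failed", {
      documentId,
      // Serialize the payload so nested fields survive in plain-text logs.
      responseMessage: JSON.stringify(response.message),
    });
  }
  return response.status;
}

// Example: a console-backed logger.
const consoleLogger: Logger = {
  error: (message, properties) => console.error(message, properties),
};
handleSummaryWriteResponse({ status: false, message: { reason: "example" } }, consoleLogger, "doc-1");
```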