Closed by branlwyd 5 months ago
Thinking about what this would look like in terms of textual changes to the spec: I think we would just change the SHOULD in "The Leader SHOULD select a batch which has not yet began collection." to a MUST.
We could optionally provide implementation advice that the Collector write the collection job ID to durable storage before sending a collection request, to avoid data loss.
SGTM, please send a PR :)
In the fixed-size query type, current-batch collection requests allow the Leader to choose an outstanding batch to associate with the request.
The current semantics are that the Leader can associate the same batch with multiple current-batch collection requests (DAP-07 4.1.2).
These semantics were chosen to avoid data loss in the case that a Collector issued a collection request, then crashed before recording the collection job ID. Janus, for example, can associate the same batch with an arbitrary number of current-batch collection requests, until at least one of those collection requests is polled for the first time.
However, since the above semantics were chosen, DAP has changed such that the Collector now determines the collection job ID itself (as part of the changes for the resource-oriented API). This means we could similarly avoid data loss by expecting the Collector to durably store the collection job ID before making the collection request for that ID, and by leaning on either idempotency or appropriate error codes to allow recovery in the face of process failure. This would allow aggregator implementations to associate each batch with exactly one current-batch request.
The upside of making this change is that Collectors would no longer need to deduplicate current-batch requests that happen to be mapped to the same batch. Along with the simplified aggregator semantics, I think this is pretty likely to be an overall complexity win.
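To make the proposed flow concrete, here is a minimal sketch in Python. It is not taken from Janus or from the DAP spec: `ToyLeader`, `load_or_create_job_id`, `collect`, and the JSON state file are all hypothetical names invented for illustration, the Leader is an in-memory stand-in rather than an HTTP service, and the UUID is only a placeholder for a real collection job ID. It assumes the Leader treats a repeated request for a known job ID as idempotent.

```python
import json
import os
import tempfile
import uuid


class ToyLeader:
    """Hypothetical in-memory Leader: each batch goes to exactly one job ID."""

    def __init__(self, outstanding_batches):
        self.outstanding = list(outstanding_batches)
        self.jobs = {}  # collection job ID -> batch ID

    def put_collection_job(self, job_id):
        # Idempotent: a repeated request for a known job ID returns the
        # batch already associated with it instead of consuming a new one.
        if job_id in self.jobs:
            return self.jobs[job_id]
        if not self.outstanding:
            raise LookupError("no outstanding batch available")
        batch = self.outstanding.pop(0)
        self.jobs[job_id] = batch
        return batch


def load_or_create_job_id(path):
    """Return the job ID stored at `path`, durably creating it if absent."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["job_id"]
    job_id = uuid.uuid4().hex  # placeholder for a real collection job ID
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"job_id": job_id}, f)
        f.flush()
        os.fsync(f.fileno())  # force the ID to disk before it is used
    os.replace(tmp, path)  # atomic rename: the file is complete or absent
    return job_id


def collect(leader, path):
    # The job ID is on durable storage before the request is sent, so a
    # crash at any later point can be recovered by calling collect() again.
    job_id = load_or_create_job_id(path)
    return job_id, leader.put_collection_job(job_id)


if __name__ == "__main__":
    leader = ToyLeader(["batch-A", "batch-B"])
    state = os.path.join(tempfile.mkdtemp(), "job.json")
    first = collect(leader, state)
    retry = collect(leader, state)  # simulates a restart after a crash
    assert first == retry
```

Because the retry reuses the stored job ID, the toy Leader never hands the same batch to two different jobs, and the Collector has nothing to deduplicate. A real Collector would also need to decide when to discard the stored ID and start a new job, which this sketch leaves out.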