ietf-wg-ppm / draft-ietf-ppm-dap

This document describes the Distributed Aggregation Protocol (DAP) being developed by the PPM working group at IETF.

Make streaming aggregation normative #581

Open cjpatton opened 1 week ago

cjpatton commented 1 week ago

An issue that's been raised several times (most recently in #571) is that the term "aggregation job" is a misnomer, since no aggregation actually happens during one. All that happens is that, once an aggregation job is complete, each Aggregator commits to the sequence of output shares it recovered. It's not until collection time that we call Vdaf.aggregate() on the set of output shares to get our aggregate share for the batch.
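
For concreteness, this is roughly the flow described above, written against the aggregation interface from draft-irtf-cfrg-vdaf (Vdaf.aggregate(agg_param, out_shares)); the storage layout and callback names here are placeholders for this sketch, not anything defined by DAP:

```python
# Sketch of "delayed aggregation" as the text currently reads: output shares
# from completed aggregation jobs are stored as-is, and Vdaf.aggregate() is
# only invoked once the batch is collected.

stored_out_shares = []  # illustrative storage; in practice keyed by task/batch

def on_aggregation_job_complete(out_shares):
    # Each Aggregator commits to the output shares recovered by this job.
    stored_out_shares.extend(out_shares)

def on_collect(vdaf, agg_param):
    # Only now are the output shares aggregated into an aggregate share.
    return vdaf.aggregate(agg_param, stored_out_shares)
```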

Given this fact, it's been suggested we rename "aggregation job" to something like "preparation job", since the main thing that happens during this protocol is VDAF preparation.

I don't recall precisely how this term entered our vocabulary, but part of the reason it stuck is that, in practice, we expect implementations to do some, if not most, of the aggregation during this phase. From Section 4.7:

> After an aggregation job is completed, each Aggregator stores the output shares until the aggregate share is collected as described in Section 4.8. Note that it is usually not necessary to store output shares individually: depending on the batch mode and VDAF, the output shares can be merged into existing aggregate shares that are updated as aggregation jobs complete. This streaming aggregation is compatible with Prio3 and all batch modes specified in this document.

In other words, we expect "streaming aggregation" to be the norm for DAP implementations and "delayed aggregation" at collection time (credit @bemasc for this terminology) to be the rare exception. In theory, you could have a VDAF for which aggregation is sensitive to the order of output shares, or a batch mode that requires storing output shares individually until collection. Neither is true today, and to me both seem pretty unlikely.
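
To see why streaming aggregation is compatible with Prio3: its output shares and aggregate shares are both vectors over a finite field, and aggregation is element-wise addition, so output shares can be folded into a running aggregate share in any order. A minimal sketch, with shares modeled as plain lists of integers and an example modulus standing in for the VDAF's field:

```python
# Streaming aggregation for an additively aggregatable VDAF such as Prio3:
# fold each output share into the running aggregate share as soon as the
# aggregation job that produced it completes.

MODULUS = 2**64 - 2**32 + 1  # example prime modulus; Prio3 uses its own field

def merge_out_share(agg_share, out_share):
    # Element-wise addition in the field; an empty aggregate share is the
    # all-zeros vector, so starting from None is equivalent.
    if agg_share is None:
        return list(out_share)
    return [(a + o) % MODULUS for a, o in zip(agg_share, out_share)]
```

Because field addition is commutative and associative, the result doesn't depend on the order in which aggregation jobs complete, which is what makes this safe for Prio3 and the batch modes in the document.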

Rather than rename the aggregation sub-protocol, we could make this streaming aggregation feature normative. This would require two things:

  1. The concept of a "batch bucket" (#385). The definition of "batch bucket" depends on the batch mode: for time-interval, it's a window of time (determined by the time_precision task parameter); for leader-selected, it's simply a batch ID. Each bucket has an aggregate share. At the end of an aggregation job, we would add each output share into the aggregate share of the bucket it corresponds to.

  2. We need to be able to express, syntactically, how to update an existing aggregate share with a single output share. The current VDAF syntax doesn't allow this: it only describes aggregating a complete set of output shares into an aggregate share in one shot. We would therefore need to modify the syntax; a rough sketch of how both pieces might fit together follows this list.
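
Putting items 1 and 2 together, here is a rough sketch of what this could look like. None of these names come from DAP or draft-irtf-cfrg-vdaf; agg_init/agg_update and the task fields are placeholders to make the idea concrete:

```python
def batch_bucket(task, report_time, batch_id):
    # Item 1: which bucket an output share falls into depends on the batch
    # mode. For time-interval it's the time_precision-aligned window
    # containing the report's timestamp; for leader-selected it's the batch ID.
    if task.batch_mode == "time-interval":
        return report_time - (report_time % task.time_precision)
    return batch_id

class IncrementalVdaf:
    # Item 2: a hypothetical extension of the VDAF aggregation interface that
    # lets an aggregate share be updated one output share at a time, instead
    # of only being computed from the full set of output shares at once.
    def agg_init(self, agg_param):
        """Return an empty aggregate share for this aggregation parameter."""
        raise NotImplementedError()

    def agg_update(self, agg_param, agg_share, out_share):
        """Merge a single output share into an existing aggregate share."""
        raise NotImplementedError()

def on_aggregation_job_complete(vdaf, task, agg_param, buckets, recovered):
    # At the end of an aggregation job, add each recovered output share into
    # the aggregate share of the bucket it corresponds to.
    for (report_time, batch_id, out_share) in recovered:
        key = batch_bucket(task, report_time, batch_id)
        if key not in buckets:
            buckets[key] = vdaf.agg_init(agg_param)
        buckets[key] = vdaf.agg_update(agg_param, buckets[key], out_share)
```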

We would then document delayed aggregation as a fallback in case the VDAF or batch mode requires it.

cjpatton commented 1 week ago

I would be in favor of this change, but if we don't take this change, then I agree with @branlwyd and @bemasc that we should rename "aggregation job".

branlwyd commented 6 days ago

I'm in favor of this change.

> In theory, you could have a VDAF for which aggregation is sensitive to the order of output shares, or a batch mode that requires storing output shares individually until collection. Neither is true today, and to me both seem pretty unlikely.

I agree that both of these are pretty unlikely. In particular, the storage cost of retaining all output shares individually would be rather high.

tgeoghegan commented 5 days ago

I also support this change, as it matches what the deployed DAP implementations actually do. However, besides introducing the notion of "batch buckets" to DAP, I think we need some more interfaces on VDAF. I've discussed that over in https://github.com/cfrg/draft-irtf-cfrg-vdaf/issues/432.

cjpatton commented 4 days ago

Feedback from 2024/9/25 call: @branlwyd: "streaming" may not be the best word, because we have to wait to aggregate until the aggregation parameter is known.