ietf-wg-ppm / draft-ietf-ppm-dap

This document describes the Distributed Aggregation Protocol (DAP) being developed by the PPM working group at IETF.

Possible states of PrepareSteps sent in an aggregation continue request are ambiguous. #438

Closed divergentdave closed 12 months ago

divergentdave commented 1 year ago

The spec says in the helper continuation section that helpers should be prepared for leaders to send PrepareSteps in the failed state:

"If the status is failed, then mark the report as failed and reply with a failed PrepareStep to the Leader."

The leader continuation section, however, says of the same continue request message:

"The prepare_steps field MUST be a sequence of PrepareSteps in the continued state containing the corresponding inbound prepare message."

We should be clear about whether PrepareStepStates other than continued are allowed in these requests. It might be useful to allow failed in AggregationJobContinueReq steps, because a failed step could then carry a ReportShareError, giving the helper the same visibility into error codes that the leader has. (The helper can already notice that a report share was filtered out from one request to the next, and infer that there was some error.)
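To illustrate the implicit inference mentioned in the parenthetical, here is a minimal sketch of how a helper might detect reports that dropped out between two messages of the same aggregation job. `ReportId` and `dropped_reports` are hypothetical names for this example and are not taken from the spec or from any implementation. Note that the helper only learns that a report disappeared, not why.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a DAP report ID (assumed to be 16 bytes here).
pub type ReportId = [u8; 16];

/// Returns the report IDs that were present in the previous message of an
/// aggregation job but are missing from the current continue request.
/// The order of `previous` is preserved in the result.
pub fn dropped_reports(previous: &[ReportId], current: &[ReportId]) -> Vec<ReportId> {
    // Collect the IDs the leader is still asking the helper to process.
    let still_present: BTreeSet<ReportId> = current.iter().copied().collect();

    // Anything that was there before and is gone now was filtered out by the
    // leader; the reason (the error code) is not visible to the helper.
    previous
        .iter()
        .filter(|id| !still_present.contains(*id))
        .copied()
        .collect()
}
```

An explicit failed step carrying a ReportShareError would replace this guesswork with an error code for each rejected report.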

cjpatton commented 1 year ago

Yup, we definitely need to clarify this, and I agree that an explicit signal of rejection is most useful. For what it's worth, Daphne Helper will abort if it encounters a failure in the AggregateContReq: https://github.com/cloudflare/daphne/blob/main/daphne/src/vdaf/mod.rs#L755

branlwyd commented 1 year ago

I agree this merits clarification, and also vote for an explicit signal of rejection (i.e. the Leader sends Failed PrepareSteps to the Helper on error).
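The two readings discussed so far (abort on any step that is not continued, versus accepting failed steps as an explicit rejection signal) can be contrasted with a rough sketch. Everything below is hypothetical and written for this thread: the type and function names are illustrative, `ReportShareError` lists only two of the draft's error codes, and treating a finished step from the leader as invalid is an assumption of the sketch. It is not code from Daphne or any other implementation.

```rust
/// Illustrative subset of error codes; the draft defines more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReportShareError {
    VdafPrepError,
    ReportDropped,
}

/// Hypothetical model of a PrepareStep state as sent by the leader.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PrepareStepState {
    Continued(Vec<u8>),
    Finished,
    Failed(ReportShareError),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PrepareStep {
    pub report_id: [u8; 16],
    pub state: PrepareStepState,
}

/// Which reading of the spec the helper applies.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContinuePolicy {
    /// Only continued steps are valid; anything else aborts the request.
    ContinuedOnly,
    /// Failed steps are accepted as an explicit rejection signal.
    AllowFailed,
}

/// What the helper should do for one report.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HelperAction {
    /// Run the next preparation round with the inbound message.
    Prepare { report_id: [u8; 16], message: Vec<u8> },
    /// Mark the report as failed and record the leader's error code.
    MarkFailed { report_id: [u8; 16], error: ReportShareError },
}

/// Returned when the whole request is rejected because of one step.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AbortRequest {
    pub report_id: [u8; 16],
}

/// Decides how to handle each step of a continue request under `policy`.
pub fn plan_continue(
    steps: Vec<PrepareStep>,
    policy: ContinuePolicy,
) -> Result<Vec<HelperAction>, AbortRequest> {
    let mut actions = Vec::with_capacity(steps.len());
    for step in steps {
        let report_id = step.report_id;
        match (step.state, policy) {
            (PrepareStepState::Continued(message), _) => {
                actions.push(HelperAction::Prepare { report_id, message });
            }
            (PrepareStepState::Failed(error), ContinuePolicy::AllowFailed) => {
                actions.push(HelperAction::MarkFailed { report_id, error });
            }
            // Failed under the strict policy, or Finished under either
            // policy (an assumption of this sketch): abort the request.
            _ => return Err(AbortRequest { report_id }),
        }
    }
    Ok(actions)
}
```

Under `ContinuedOnly`, a single failed step rejects the whole request. Under `AllowFailed`, the helper keeps processing the remaining reports and learns the error code for the rejected one.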

tgeoghegan commented 1 year ago

I believe this issue is obsolete, because the changes in #393 mean that it's no longer possible for AggregationJobContinueReq to contain a failure or share rejection message. However, what remains is the idea that the leader should explicitly signal preparation failure to the helper. I don't think we have a strong enough case for this yet, especially since deployments can share error information between aggregators by some means outside of DAP. So we should keep this open to eventually discuss an in-band mechanism for leader-to-helper error reporting, but I don't think there's anything left to do for draft-ietf-ppm-dap-05 (except to update a TODO which referenced the wrong issue number).

cjpatton commented 12 months ago

Closing with no action. For the moment we don't have a compelling reason for a fancier error handling mechanism. If we want to discuss this further, let's open a fresh issue with a more refined problem statement.