ietf-wg-ppm / draft-ietf-ppm-dap

This document describes the Distributed Aggregation Protocol (DAP) being developed by the PPM working group at IETF.
Other
45 stars 22 forks source link

Why do duplicate report ids have error unrecognizedMessage #453

Closed ekr closed 1 year ago

ekr commented 1 year ago

AggregationJobInitReq.report_shares are all distinct. If two ReportShare values have the same report ID, then the helper MUST abort with error unrecognizedMessage.

Shouldn't this be a more specific error or just result in suppressing one of the shares?

cjpatton commented 1 year ago

I have no objection. IIRC we decided to abort because it is a stronger signal that something is going wrong, but I think just rejecting the report instead would work just fine.

ekr commented 1 year ago

I could go either way, but I think if it's going to be rejecting the whole message, it should be its own error.

branlwyd commented 1 year ago

I think one reason we decided to fail the entire message was that, if a Leader sends two different reports with the same ID, this indicates that the Leader itself is confused about the content of that report (as reports are primarily identified by report ID).

However, the aggregation process uses the order of reports to assist with matching -- the semantics are that the same report aggregations will be referenced in the same order with each request, except that the Leader does not transmit any that have failed; matching is done by report ID -- so it's plausible that things might work out here. But failing just one of the reports will leave the other in place, which might plausibly create confusion for the Aggregators when they next attempt to match an incoming message's report IDs to the report IDs associated with their own state.

I think if we don't fail the entire message, we should fail both reports to avoid potential confusion matching request prepare steps to stored state in the aggregation process. (or, we could change the aggregation process to avoid this somehow -- we've chatted about not dropping errors from the list of preparation shares that are passed back and forth)

(I don't have a strong opinion on whether we should have a new error or not -- unrecognizedMessage definitely isn't the error I would expect for this case, though it is sorta used as a grab-bag for low-level message parsing failures.)

ekr commented 1 year ago

I don't object to a generic message but then it should be like invalidMessage

cjpatton commented 1 year ago

I agree with EKR that a new error type for this case is warranted. @tgeoghegan has pushed back on adding new error types in the past, so we should probably get his take as well.

tgeoghegan commented 1 year ago

I think we should only add new error types/codes if we expect the recipient of that error message to act differently when it sees it. I think that if we had both invalidMessage and duplicateReport, then client implementations would do exactly the same thing in either case (give up on uploading that particular report). Adding a new error code just means client implementations have to write more code and write more tests in exchange for no additional functionality.

However I'm willing to defer to ekr's superior experience in protocol design; if you think that having more specific error types is sufficiently valuable for, say, debugging, then we can do that (though I'd also point out that RFC 7807 problem documents do allow implementations to include arbitrary extra information).

I also agree that invalidMessage is a better name than unrecognizedMessage, if we decide to not add more error types.

divergentdave commented 1 year ago

This is in the context of the aggregation flow, so the leader would have sent an aggregation job with a repeated report ID in it to the helper, and the question is what the helper's error type should be. I think the analysis is the same, as the leader should stop processing the job and not retry, since one of the aggregators has a programming error, or they are trying to speak incompatible protocol versions.

ekr commented 1 year ago

I actually don't feel strongly about how many errors there are. I'm mostly just advocating for a different name.