ietf-wg-ppm / draft-ietf-ppm-dap

This document describes the Distributed Aggregation Protocol (DAP) being developed by the PPM working group at IETF.

Resource oriented HTTP API (was: Conveying task and job IDs) #278

Closed chris-wood closed 1 year ago

chris-wood commented 2 years ago

Issue #261 touched on how task and job parameters are conveyed between aggregators across request and response messages. This issue tracks the more general question of how we convey these parameters. They could be conveyed in application messages as they are now, in URLs as query parameters, or even in headers.

@mnot, if you have cycles, I'd be very interested to hear what you think.

Originally posted by @BranLwyd in https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/issues/261#issuecomment-1132242994

mnot commented 2 years ago

What's the deployment model? In other words, if I wanted to deploy this protocol, would I:

a) spin up a new, dedicated server and treat it like a black box,
b) create new resources on a potentially existing server that are specific to this protocol, or
c) apply the protocol to potentially existing resources on a server?

chris-wood commented 2 years ago

I think (a) is going to be the most common deployment model here.

cjpatton commented 2 years ago

Based on my experience implementing the current draft, I think it would be great to move the task ID to either a header or the query string of the URL. The latter might be preferable since it's more consistent with /hpke_config. I'm agnostic about the aggregation-job ID.

The problem solved by moving the task ID to either the URL or a header is the following: Before parsing a message it's useful to first make sure that the message is authentic: https://github.com/cloudflare/daphne/blob/main/daphne/src/roles.rs#L421

However, to authenticate a DAP message it's necessary to at least parse the task ID in order to look up the bearer token: https://github.com/cloudflare/daphne/blob/main/daphne/src/auth.rs#L84-L87
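To make the ordering problem concrete, here is a minimal sketch of authenticating before any body parsing, assuming the task ID travels in the query string. The `task_id` parameter name, the `Authorization: Bearer` header, and the token table are all assumptions for illustration; none of them come from the draft or from daphne.

```python
import hmac
from urllib.parse import parse_qs, urlsplit

# Hypothetical per-task bearer tokens. A real aggregator would load these
# from its task configuration.
BEARER_TOKENS = {"task-abc": "s3cret-token"}


def authenticate(url, headers):
    """Return the task ID if the request is authentic, else raise PermissionError.

    The request body is never touched: the task ID comes from the URL.
    """
    query = parse_qs(urlsplit(url).query)
    task_ids = query.get("task_id", [])  # assumed parameter name
    if len(task_ids) != 1:
        raise PermissionError("exactly one task_id query parameter is required")
    task_id = task_ids[0]

    expected = BEARER_TOKENS.get(task_id)
    presented = headers.get("Authorization", "")  # assumed header
    if expected is None:
        raise PermissionError("unknown task")
    # Constant-time comparison of the presented and expected credentials.
    if not hmac.compare_digest(presented.encode(), f"Bearer {expected}".encode()):
        raise PermissionError("bad bearer token")
    return task_id
```

With this shape the message decoder only ever runs on requests that have already passed `authenticate`.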

mnot commented 2 years ago

Will it ever be necessary to send these requests in an unmodified browser (either by directly entering the URLs into the location bar, or using forms or JavaScript)?

If not, it's largely a matter of stylistic preference.

URL parameters are visible in the location bar (and in logs); their semantics are determined by the resource you're interacting with. So, if you use them you should be approaching the protocol as an exercise in resource modelling -- i.e., "this resource takes those parameters, and does that".

The body is similar to URL parameters in terms of scope of semantics, but needs to be identified with a media type. Also it's not visible, and not available on some request methods, and its processing is tied to the request method to some degree (most loosely with POST).

Headers have a (purportedly) universal scope -- but it's OK if they have no applicability to most resources. They go on all messages, and keep the URL clean. If you use a header, take a look at Structured Fields.

Does that help?
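For comparison, the three options above can be sketched side by side. This is illustrative only: the endpoint, the `task_id` parameter, the `DAP-Task-ID` header name, and the media type are hypothetical, and the task ID is a placeholder byte string. The header variant encodes the ID as a Structured Fields byte sequence (base64 between colons).

```python
import base64
from dataclasses import dataclass, field
from urllib.parse import urlencode

TASK_ID = bytes(range(32))                 # placeholder task ID
BASE = "https://helper.example/aggregate"  # hypothetical endpoint
MESSAGE = b"opaque-message-bytes"          # stand-in for an encoded message


@dataclass
class Request:
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def b64url(data):
    """URL-safe base64 without padding, suitable for a query parameter."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def in_query():
    # Semantics are scoped to the resource; the ID shows up in logs.
    url = f"{BASE}?{urlencode({'task_id': b64url(TASK_ID)})}"
    return Request("POST", url, body=MESSAGE)


def in_header():
    # Structured Fields byte sequence: standard base64 wrapped in colons.
    value = ":" + base64.b64encode(TASK_ID).decode() + ":"
    return Request("POST", BASE, headers={"DAP-Task-ID": value}, body=MESSAGE)


def in_body():
    # The ID is part of the message, so the body needs a media type and the
    # receiver must start decoding before it knows which task is involved.
    headers = {"Content-Type": "application/x-example-dap-message"}
    return Request("POST", BASE, headers=headers, body=TASK_ID + MESSAGE)
```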

chris-wood commented 2 years ago

> Will it ever be necessary to send these requests in an unmodified browser (either by directly entering the URLs into the location bar, or using forms or JavaScript)?
>
> If not, it's largely a matter of stylistic preference.

OK, great! These will almost never be sent in an unmodified browser, so I'm glad this boils down to preference.

> URL parameters are visible in the location bar (and in logs); their semantics are determined by the resource you're interacting with. So, if you use them you should be approaching the protocol as an exercise in resource modelling -- i.e., "this resource takes those parameters, and does that".
>
> The body is similar to URL parameters in terms of scope of semantics, but needs to be identified with a media type. Also it's not visible, and not available on some request methods, and its processing is tied to the request method to some degree (most loosely with POST).
>
> Headers have a (purportedly) universal scope -- but it's OK if they have no applicability to most resources. They go on all messages, and keep the URL clean. If you use a header, take a look at Structured Fields.
>
> Does that help?

Indeed it does, and I have one final question. Some of these APIs will be called with the expectation that the client is authenticated. Right now, we're using bearer tokens for experimentation purposes, since they're simple to configure and easy to check. In the future, we might require the underlying channel to be mutually authenticated via mTLS or whatever's relevant. However, there's another variant wherein the request is authenticated using message signatures. In this case, does the choice of URL parameter vs header vs request body matter much? My understanding of the request signature design is that all contents of the request, including the URL, any headers, and body are all authenticated, so my intuition suggests that this again boils down to stylistic preference. Is that right?

@tgeoghegan, assuming the above is accurate, how do you feel about using URLs for things like task IDs, job IDs, etc, and keeping the rest of the version-specific DAP stuff in the body? If and when DAP is versioned, I would expect the content types of the messages to change, but I can't imagine the concepts of task ID or job ID changing much. In other words, the latter seem like invariants (for lack of a better term), so sticking them in URL parameters allows them to be handled separately from things that might change by version or VDAF algorithm. (This relates to how we choose to indicate the VDAF algorithm in use, which is currently done by just tying it to the task ID.)

tgeoghegan commented 2 years ago

AFAIK the HTTP message signature stuff has evolved from the AWS request signature scheme, which does cover headers, query params and bodies, so you're right that we are flexible in that respect. The one gotcha is that not all HTTP methods allow bodies (crucially, GET requests can't have one), so that may nudge us towards using headers or query params.

All that being said -- I think we should plan to refactor the HTTP API endpoints so that they are oriented around the protocol's entities/nouns instead of actions/verbs. I started sketching that idea here. That change will be pretty disruptive, so I think I want to target it for the draft we'd submit to IETF 115 (114 being right around the corner).
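As a rough illustration of what "nouns instead of verbs" could look like, the sketch below resolves noun-style paths to a resource kind plus its identifiers. The path templates are hypothetical and are not taken from the draft or from the sketch linked above; the point is only that the task ID and job ID become part of the resource's name rather than fields inside a message sent to an action endpoint.

```python
import re

# Hypothetical resource-oriented URL templates. Each pattern names a thing
# (a report collection, an aggregation job), not an action.
ROUTES = [
    ("reports", re.compile(r"^/tasks/(?P<task_id>[^/]+)/reports$")),
    (
        "aggregation_job",
        re.compile(r"^/tasks/(?P<task_id>[^/]+)/aggregation_jobs/(?P<job_id>[^/]+)$"),
    ),
]


def resolve(path):
    """Map a request path to (resource kind, identifiers), or None if unknown."""
    for kind, pattern in ROUTES:
        match = pattern.match(path)
        if match:
            return kind, match.groupdict()
    return None
```

Under these assumed templates, a verb-style path such as `/aggregate` no longer resolves to anything; the same operation becomes a request against one specific aggregation job resource.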

chris-wood commented 2 years ago

Agreed 👍 Let's use this issue to hash that initial idea out a bit further.

tgeoghegan commented 2 years ago

At IETF 114's PPM session, we discussed rewriting the HTTP API to be more resource oriented and align with the relevant BCPs (slides 8-9). As I noted above, I'd like to get that done for the next draft, so I've retitled this issue to focus it on the work I plan to do.

tgeoghegan commented 2 years ago

There are numerous places in DAP where we mandate specific HTTP status codes in responses, but RFC 9205 recommends against this. We should revisit prescriptions about responses and make sure that any protocol-critical error information is conveyed in a problem document type, and otherwise allow implementations to use whatever 4xx or 5xx HTTP status they want.

RFC 9205 also suggests a format for example HTTP messages, which we should use where possible.
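A minimal sketch of the problem-document approach, assuming an RFC 7807 style JSON body served as `application/problem+json`. The error URN below is a made-up example under `urn:example:`, not a registered DAP error type. The client branches on the `type` member, so the server is free to pick whichever 4xx status it prefers.

```python
import json

# Hypothetical error type, for illustration only.
UNRECOGNIZED_TASK = "urn:example:dap:error:unrecognizedTask"


def problem_response(error_type, title, status):
    """Build (status, headers, body) for a problem document response."""
    document = {"type": error_type, "title": title, "status": status}
    headers = {"Content-Type": "application/problem+json"}
    return status, headers, json.dumps(document).encode()


def is_unrecognized_task(headers, body):
    """Client-side check keyed on the problem type, not the status code."""
    if headers.get("Content-Type") != "application/problem+json":
        return False
    return json.loads(body).get("type") == UNRECOGNIZED_TASK
```

Two servers that answer with 400 and 404 respectively are handled identically by `is_unrecognized_task`, which is the flexibility the comment above is asking for.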

tgeoghegan commented 2 years ago

I think this is still doable for IETF 115, but it looks like we'll release draft-02 before this is ready, so punting to draft-03.

simon-friedberger commented 2 years ago

RFC 7807 has specific suggestions on how to communicate error details which might be interesting.

mnot commented 2 years ago

Note that it's currently being revised - see https://github.com/ietf-wg-httpapi/rfc7807bis

simon-friedberger commented 2 years ago

Also related: We have "outdatedConfig" but we seem to be missing "unrecognizedConfig" like we do for task IDs. Might be worth adding when fixing this.

wangshan commented 2 years ago

A potentially related question, @chris-wood is POST the right method in a resource oriented API, compared to PUT?

tgeoghegan commented 2 years ago

> A potentially related question, @chris-wood is POST the right method in a resource oriented API, compared to PUT?

POST and PUT are both perfectly valid methods to use. The key distinction between them is that PUT is idempotent and POST is not. From Mozilla's MDN web docs: "calling [PUT] once or several times successively has the same effect (that is no side effect), whereas successive identical POST requests may have additional effects, akin to placing an order several times."

This is significant when you consider recovery from error cases: suppose a client sends a message to a server, but the response is lost due to a network error. The client now does not know whether their request was handled and thus whether they can move to the next step in some protocol. If the request was idempotent (like a PUT), then it's OK for the client to re-send the request until they get a successful response, since they know that the server receiving the same request multiple times won't have caused duplicate work or other side effects. If the request was not idempotent, then the client needs some other means of figuring out what state the server is in and recovering (perhaps a GET on some other resource).
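The retry scenario can be sketched with a toy in-memory server. This is an illustration under assumptions, not DAP's design: `JobStore` and its methods are hypothetical, `put` models PUT on a client-chosen job ID, and `post` models POST to a collection where the server picks the ID.

```python
import uuid


class JobStore:
    """Toy server state: a mapping from job ID to the request that created it."""

    def __init__(self):
        self.jobs = {}

    def put(self, job_id, request):
        # PUT /jobs/{job_id}: the client names the resource, so replaying the
        # same request leaves the server in the same state.
        self.jobs[job_id] = request

    def post(self, request):
        # POST /jobs: the server names the resource, so every call creates a
        # new job, including a retry of a request whose response was lost.
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = request
        return job_id


def send_with_retry(send, attempts=2):
    """Simulate a client that never sees the first response and re-sends."""
    for _ in range(attempts):
        send()
```

Retrying `put("job-1", ...)` through `send_with_retry` leaves exactly one job in the store, while retrying `post(...)` leaves two, which is the duplicate-work case described above.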

cjpatton commented 2 years ago

Removing the draft tag, as we're punting beyond 03.