httpwg / http-extensions

HTTP Extensions in progress
https://httpwg.org/http-extensions/

QUERY should describe use of Content-Location field #1745

Open MikeBishop opened 2 years ago

MikeBishop commented 2 years ago

@reschke mentioned in #1552 that there should be a way for the server to indicate a GET-able equivalent to the query if one exists; this has the benefit of being more easily cacheable by intermediaries which don't understand QUERY.

Based on SEMANTICS, I think the right mechanism for this would be Content-Location. A server can optionally indicate the GET-able location of the query to which it is responding.

mnot commented 2 years ago

Another approach would be to return 201 Created along with a Location header. To my eye, that more clearly communicates that the URL is pointing at a new resource for the query, rather than for the content of this particular result representation.

martinthomson commented 2 years ago

The idea that you might "create" a resource like this is a little strange. Presumably, the same QUERY, when repeated, will "create" the same URL.

Still, the semantic does fit fairly neatly and people are well used to the general pattern. And "Created" is just the name; the semantic is a pretty good fit.

mnot commented 2 years ago

I went to start a PR for this, but saw that the spec already includes an example that uses 303 (See Other) for this pattern.

That avoids the counterintuitive aspect of an idempotent method creating something. I wonder just a little bit about whether the relationship between the query presented and the ongoing state of that resource is clearly stated enough, but I suspect it's no worse than any other solution discussed.

So I think we can close this with no action, if giving an example is adequate?

reschke commented 2 years ago

Not sure. It implies an additional roundtrip, no?

mnot commented 2 years ago

Can't the 303 contain the payload of that other resource along with a C-L?

reschke commented 2 years ago

"Except for responses to a HEAD request, the representation of a 303 response ought to contain a short hypertext note with a hyperlink to the same URI reference provided in the Location header field."

mnot commented 2 years ago

just an 'ought to' :)

reschke commented 2 years ago

Why not simply "200" with C-L?

mnot commented 2 years ago

So does C-L point at the same query results at this point in time, or does it point at a resource that gives updated results for this query?

reschke commented 2 years ago

Ironically, I had a draft addressing this question (https://greenbytes.de/tech/webdav/draft-reschke-http-get-location-latest.html) and was told it wasn't needed because there's C-L.

asbjornu commented 2 years ago

I don't think the GET-ability of the URI in C-L is the most important aspect here; what matters is that it provides a key under which to cache the result. For repeated requests (refreshes), the client should of course GET the URI of C-L in order to benefit from the built-in cache mechanisms of GET.

The status code of the response from a QUERY request may be up to the server to decide and is perhaps not something this spec should say anything about?

MikeBishop commented 2 years ago

The definition of a "safe" method is that the client does not request or expect changes to server state, though the server might of course perform state-changing actions on its own initiative. In that light, 201 seems a very strange response. The client did not ask the server to create anything.

The use-case that I envision is using this for a refresh with fewer bytes on subsequent requests. Is there a real-world argument for needing to fetch a historical snapshot of a query that wouldn't be served by simply including a timestamp on a fresh query (and then potentially getting a URL to redo that query in the future)?

asbjornu commented 2 years ago

I agree a 201 response from a safe method request is weird, @MikeBishop. But it's no more or less weird for QUERY than for GET. If anything should be said about this, I think it belongs in section 4.2.1 of RFC 7231 (section 9.2.1 of RFC 9110), which may cover this well enough already:

This definition of safe methods does not prevent an implementation from including behavior that is potentially harmful, that is not entirely read-only, or that causes side effects while invoking a safe method. What is important, however, is that the client did not request that additional behavior and cannot be held accountable for it.

So 201 is permitted, but is admittedly not something the client can request and therefore possibly not even expect. I think 303 is a better fit, personally.

erincandescent commented 2 years ago

I don't think either of the Location or Content-Location headers can be used to appropriately convey "this is an alternative URL at which you can perform a GET request to repeat this query" - except perhaps a permanent redirect.

Consider for example the QUERY equivalent of Google's I'm Feeling Lucky button

IamfromSpace commented 2 years ago

I feel like one interesting aspect of this is that returning a Location or Content-Location header does have some differing meanings. A Location might indicate that there is a resource that was found, that can now be located, and might support all sorts of representations. And a Content-Location header would indicate a permanent place to view this particular representation of the resource discovered by the QUERY. Arguably, both could make sense.

I don't think that either could/should imply that the location is an "immutable snapshot." They're simply discovery of resources.

From that standpoint, I think 201 Created only makes sense if you are in fact directing the user to an immutable snapshot of the result. Other resources discovered should already exist; the only thing we could create is a resource that represents this query.

I don't personally think that creating a snapshot is an oxymoron for a safe method. The resource you interacted with is not invalidated, something else is just there now too.

However, I think it would be confusing to use Location to mean two different things (200: "Here's the resource you found," 201: "Here's a permanent location for the response to this query.").

Just wanted to throw a couple thoughts into the ring :)

gstrauss commented 1 year ago

Not sure. It implies an additional roundtrip, no?

@reschke to avoid the extra round trip while at the same time allowing a fully-cacheable GET resource, an HTTP/2 server could send PUSH_PROMISE with the result, and then send 303 See Other with that Location.

Importantly, 303 See Other with Location still works as desired when PUSH_PROMISE is not supported or is disabled, though with an additional round trip for the subsequent GET request to the server.

reschke commented 1 year ago

@gstrauss - are you aware of https://developer.chrome.com/blog/removing-push/ ?

gstrauss commented 1 year ago

@gstrauss - are you aware of https://developer.chrome.com/blog/removing-push/ ?

I was not. Thanks for the pointer.

FWIW, I prefer QUERY response 200 OK with optional Content-Location. Simpler proxies can cache subsequent GET requests to the location provided. (Your decade-old draft for GET-Location would be even more explicit.)

303 See Other with Location is also valid. While it could optionally provide a body, I think that clients would generally look for Location and make a new request to that Location with 3xx instead of displaying the response body provided with the 3xx response.

For QUERY, wouldn't caching proxies need to understand creating a cache key combining both QUERY headers and QUERY request body? For privacy/security, clients might want to include request headers to ask that proxies not cache such queries.
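
To make the cache-key question concrete, here is a minimal sketch of a key that covers the request body as well as the target URI. It is an illustration only: the function name `query_cache_key`, the choice of SHA-256, and the default set of headers folded into the key are all assumptions made for this example, not something the thread or the draft specifies.

```python
import hashlib


def query_cache_key(method, target, headers, body,
                    key_headers=("accept", "content-type")):
    """Derive a cache key from method, target URI, selected headers, and body.

    `key_headers` is a hypothetical list of request header names that the
    cache treats as part of the key; a real cache would decide this itself.
    """
    # Header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}

    digest = hashlib.sha256()
    for part in (method.upper(), target):
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent fields cannot run together

    for name in sorted(key_headers):
        digest.update(name.encode("ascii"))
        digest.update(b":")
        digest.update(normalized.get(name, "").encode("utf-8"))
        digest.update(b"\0")

    # Hash the body separately so its length cannot blur into the header part.
    digest.update(hashlib.sha256(body).digest())
    return digest.hexdigest()
```

Two QUERY requests to the same URI with different bodies get different keys here, which is exactly what a cache keyed on the URI alone would miss.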

royfielding commented 2 months ago

Oh dear. Let's not mix things up.

The request method defines what is requested, not what the server does. 201 is always a valid response when a resource is created, even if the user agent didn't request it, but providing a Location without some corresponding defined semantics for what it identifies would not be very useful. QUERY could define such semantics for Location in any successful response, not just 201, as defined in

https://datatracker.ietf.org/doc/html/rfc9110#name-location

so that Location could be used in a 200 response to indicate a pre-existing resource, or in a 201 to indicate a new one. Such a relationship defined by Location does not necessarily have anything to do with the content in the response, unless that's what the method specifies.

Content-Location, on the other hand, is a relationship between the content in the response and a resource that provides the same content in response to GET. Note the emphasis: It is the same content, not the same service. It is not an equivalent URI to repeating the same QUERY parameters again in the future.

For example, if I were to QUERY a resource that provides the weather for Tustin today, like

QUERY /weather?city=Tustin HTTP/1.1
Host: example.com

and I received back

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Location: /weather/us/ca/tustin/20240524
Location: /weather/us/ca/tustin/

Sunny 66°F

then the Content-Location identifies a resource that should always respond with "Sunny 66°F" no matter what the weather might be in the future. IOW, it provides a representation of that past result, not a repeat of the same query.

In contrast, the Location might identify a resource identifier that could be used in the future to perform the same query using a GET without unsafe parameters, if that's how Location on a 2xx response is defined by Query.

Yes, there is some overlap here with 303. The difference is that 2xx supplies a representation of the query result, whereas 303 does not -- it instructs the client to GET a representation over there.
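
As an illustration of the distinction described above, the following sketch shows how a client might sort out the two header fields. It assumes the reading given in this comment (Content-Location names this particular result, Location on a 2xx names a resource for repeating the query, and 303 carries no result), which the thread has not settled. The names `QueryLinks` and `interpret_query_response` are hypothetical.

```python
from typing import NamedTuple, Optional


class QueryLinks(NamedTuple):
    has_result: bool                # True if the response content is the query result
    snapshot: Optional[str]         # URI for this particular result (Content-Location)
    repeat_with_get: Optional[str]  # URI to GET in order to run the query again


def interpret_query_response(status: int, headers: dict) -> QueryLinks:
    """Classify a QUERY response under the assumed Location semantics."""
    # Header names are case-insensitive.
    h = {name.lower(): value for name, value in headers.items()}
    location = h.get("location")

    if status == 303:
        # No result here; the client is told to GET the Location instead.
        return QueryLinks(False, None, location)
    if 200 <= status < 300:
        # Assumption: a 2xx response carries the result as its content.
        return QueryLinks(True, h.get("content-location"), location)
    return QueryLinks(False, None, None)
```

Applied to the weather example, the 200 response yields `/weather/us/ca/tustin/20240524` as the snapshot and `/weather/us/ca/tustin/` as the URI to GET later.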

Regarding cache semantics: I strongly recommend against any attempt to use the result of a QUERY as if it were a cached copy of some other resource. If you want those semantics, respond with 303. The 303 forces the client to perform a GET through consistent mechanisms for the service, including CDNs, filters, pipes, security, context, etc.

Caching QUERY directly and reusing it for a later GET is guaranteed to result in security holes somewhere along the service chain, either because something doesn't implement QUERY, or there is layered processing that is bypassed by QUERY, or there are access controls that don't know about QUERY, or it's Tuesday, or it's a beach day, or ... I could literally spend the next two weeks explaining how I could trojan horse any CDN that assumes a given path had processed the result of a GET. Don't do that. 303 works fine and is already deployed.

MikeBishop commented 1 month ago

@royfielding, I don't necessarily have a problem with the mapping that Location enables repeating the query and Content-Location points to the historical information. Indeed, I can see the case that the historical query data might be the "most specific resource" in the sense that "at a specified instant" is more specific than an ever-changing now.

Based on 9110, I'm reading Location as being a resource that's related to the response with semantics defined by other elements of the request/response, so the Location can be basically whatever we define it to be for QUERY. Content-Location is a resource which currently has the representation carried in the message body.

The "Content-Location" header field references a URI that can be used as an identifier for a specific resource corresponding to the representation in this message's content. In other words, if one were to perform a GET request on this URI at the time of this message's generation, then a 200 (OK) response would contain the same representation that is enclosed as content in this message.

That's what pointed me in the direction of Content-Location repeating the query -- the response body contains the representation of the query's current results. 9110 makes no statement about the content of that resource in the future that I can find.

Regardless, I don't think either spelling is inherently wrong; we just need to agree on one.

mnot commented 1 month ago

Regarding cache semantics: I strongly recommend against any attempt to use the result of a QUERY as if it were a cached copy of some other resource. If you want those semantics, respond with 303. The 303 forces the client to perform a GET through consistent mechanisms for the service, including CDNs, filters, pipes, security, context, etc.

Agreed. If we want to enable caching based upon Content-Location, we should do it in a systematic manner, not just for QUERY.

reschke commented 2 weeks ago

Ok, I re-read the thread over here and in the PR.

@royfielding - is your main concern the use of Content-Location over Location? I'm not convinced that your description of Content-Location is actually backed by the HTTP spec:

In other words, if one were to perform a GET request on this URI at the time of this message's generation, then a 200 (OK) response would contain the same representation that is enclosed as content in this message.

https://greenbytes.de/tech/webdav/rfc9110.html#field.content-location

(Emphasis mine, or actually @MikeBishop 's)

But if switching to Location allows us to move forward, so be it.

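What we should agree on is:

* Is this specific to QUERY? That is, if I wanted the same feature for PROPFIND/SEARCH/REPORT (all being safe methods with payload), would I need to update their specs?
* What exactly this means for cacheability (before an actual GET request is made), and (again) whether that's specific to QUERY or universal (in the latter case, would we want to consider this an update to RFC 9110???)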

royfielding commented 2 weeks ago

Ok, I re-read the thread over here and in the PR.

@royfielding - is your main concern the use of Content-Location over Location?

Yes, though trying to pre-cache this response as if it were a GET on the Location is a separate concern.

I'm not convinced that your description of Content-Location is actually backed by the HTTP spec:

In other words, if one were to perform a GET request on this URI at the time of this message's generation, then a 200 (OK) response would contain the same representation that is enclosed as content in this message.

https://greenbytes.de/tech/webdav/rfc9110.html#field.content-location

That is literally what it says. The Content-Location is a location for this content, not the location of an equivalent query. An equivalent to query would have to be defined elsewhere for any content that varies over time/context, and the elsewhere pointer is traditionally provided by Location when defined as such by the method and/or status code.

But if switching to Location allows us to move forward, so be it.

Yep.

What we should agree on is:

* Is this specific to QUERY? That is, if I wanted the same feature for PROPFIND/SEARCH/REPORT (all being safe methods with payload), would I need to update their specs?

The method should explain what the Location means on a 2xx response. So, yes, an update would be appropriate, or you could just implement it that way and standardize it later.

Another way to think of it is "Would this make sense on all methods, or just some?" I am pretty sure it doesn't make sense for DELETE, PUT, PATCH, etc (unlike Content-Location, which means the same for all methods and status codes). Hence, the query Location isn't universal, but it can be consistently defined across all similar methods, such as the methods that retrieve information.

* What exactly this means for cacheability (before an actual GET request is made), and (again) whether that's specific to QUERY or universal (in the latter case, would we want to consider this an update to RFC 9110???)

Implement first, standardize later. As I said, the problem with caching is that different requests (method, request-URI, etc.) take different paths, often defined external to the origin server, so it is difficult to safely cache the result of one request as a result of some different request without validating that result through its normal path (both upstream and downstream of the handling implementation). The efficiency isn't worth the failed assumptions, at least not without some form of crypto hash or implementation-specific integrity check.

MikeBishop commented 1 week ago

I'm not convinced that your description of Content-Location is actually backed by the HTTP spec:

In other words, if one were to perform a GET request on this URI at the time of this message's generation, then a 200 (OK) response would contain the same representation that is enclosed as content in this message.

https://greenbytes.de/tech/webdav/rfc9110.html#field.content-location

That is literally what it says. The Content-Location is a location for this content, not the location of an equivalent query. An equivalent to query would have to be defined elsewhere for any content that varies over time/context, and the elsewhere pointer is traditionally provided by Location when defined as such by the method and/or status code.

What it says is that at this time, the content from this message is what you would receive in a GET at that location. That's a true statement for both static and variable content, since the statement specifically indicates it's at the time of the response. The spec makes no claim that the indicated URL will continue to contain the same content in perpetuity.

Again, I don't have a religious objection to using Location for this, but I think you're projecting semantics onto C-L that aren't stated in what we've written down.

reschke commented 3 days ago

Should we flip a coin? I guess not.

If we're not entirely happy with what the base spec says, maybe we should attempt a clarification, file as erratum, with the understanding it would be a "held for document update"?

Or maybe try to clarify in this spec? Would that mean that we need an "updates RFC 9110"?

royfielding commented 3 days ago

We really don't need to flip any coins. Please do not get hung up on one sentence read out of context. The 9110 spec is correct as written. Read it in the context of the section being described.

Content-Location was defined for email. It has the same meaning regardless of the status code. If you were to get a 500 response from Twitter with a fancy fail whale image, the Content-Location points to the fail whale resource. It might have nothing to do with the original request resource aside from being a response from the same gateway server.

Content-Location might apply to any message, both requests and responses.

It does not need to be clarified further, unless someone wants to write a book about HTTP.