httpwg / http-extensions

HTTP Extensions in progress
https://httpwg.org/http-extensions/

Digest: problems reproducing some §10 examples #1349

Closed (bc-pi closed this issue 3 years ago)

bc-pi commented 3 years ago

I admit to struggling to fully grasp the meaning of HTTP representation-data and/or selected representation as it relates to which actual bytes to use as input to the digest function (I was trying to write down how to use the Digest value as part of a simple digital signature scheme for HTTP messages FWIW).

In an effort to better understand, I went to work through the examples in Section 10 but ran into some things that didn't seem right.

Specifically for the response in sec 10.8, with some luck and experimentation I was able to reproduce the digest value of 0o/WKwSfnmIoSlop2LV/ISaBDth05IeW27zzNMUh5l8= from an input of {"status": "created", "id": "123", "ts": 1569327729, "instance": "/books/123"} as the body or representation-data. However, the JSON in the body of that example is semantically equivalent to that but formatted differently and consequently has a different digest value:

HTTP/1.1 201 Created
Content-Type: application/json
Digest: id-sha-256=0o/WKwSfnmIoSlop2LV/ISaBDth05IeW27zzNMUh5l8=
Location: /books/123

{
  "status": "created",
  "id": "123",
  "ts": 1569327729,
  "instance": "/books/123"
}

Similarly for the response in sec 10.10 the digest value in the example was reproducible from {"title": "Not Found", "detail": "Cannot PATCH a non-existent resource", "status": 404} but the formatting of the JSON in the example differs:

HTTP/1.1 404 Not Found
Content-Type: application/problem+json
Digest: sha-256=UJSojgEzqUe4UoHzmNl5d2xkmrW3BOdmvsvWu1uFeu0=

{
  "title": "Not Found",
  "detail": "Cannot PATCH a non-existent resource",
  "status": 404
}

I don't think I'm doing anything wrong here. But please correct me, if so.
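The discrepancy described above can be sketched as follows. This is only an illustration: the multi-line body is assumed to use LF line endings and no trailing newline, and `sha256_b64` is a hypothetical helper name. The sketch shows that the two serializations hash differently; it does not claim to reproduce the exact value printed in the draft.

```python
import base64
import hashlib


def sha256_b64(data: bytes) -> str:
    """Return the base64-encoded SHA-256 of the exact bytes given."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Single-line form quoted in the report above.
compact = (
    b'{"status": "created", "id": "123", '
    b'"ts": 1569327729, "instance": "/books/123"}'
)

# Multi-line form as printed in the example. LF line endings and the
# absence of a trailing newline are assumptions.
pretty = (
    b'{\n'
    b'  "status": "created",\n'
    b'  "id": "123",\n'
    b'  "ts": 1569327729,\n'
    b'  "instance": "/books/123"\n'
    b'}'
)

compact_digest = sha256_b64(compact)
pretty_digest = sha256_b64(pretty)
print(compact_digest)
print(pretty_digest)
print(compact_digest == pretty_digest)  # False: different bytes, different digest
```

The digest function only ever sees bytes, so two bodies that are equivalent as JSON are unrelated inputs as far as it is concerned.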

I realize there are line length limitations to the textual RFC style output that need to be considered but I do think having examples that readers can easily and reliably follow and reproduce is worthwhile.

LPardue commented 3 years ago

Thanks for reporting this, @bc-pi! It's quite possible that these got broken along the way due to rewrapping.

Out of curiosity, did you have any luck using https://httpwg.org/http-extensions/draft-ietf-httpbis-digest-headers.html#name-code-samples while working through things?

I agree we want to make this doc as user-friendly as possible.

bc-pi commented 3 years ago

My Python skills are effectively nil, so I didn't really even look at that section, to be honest. I wrote some very simple Java code in my attempt at working through things.

Looking at it now (even with nil skills) raises some questions: The comments suggest that the sha256 digest value is the same for Identity and Brotli encoding (also a comma , where there should be a pipe | after Brotli?). That can't be right, can it? I see json.dumps in there, which according to https://docs.python.org/3/library/json.html#json.dumps does some kind of JSON serialization. That also doesn't seem right as no JSON serialization or normalization or processing should be applied, I don't think? Maybe that explains the discrepancy I ran into? i.e. the serialization is reformatting the JSON.
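The suspected effect of `json.dumps` can be sketched in isolation. This is not the draft's code sample; it only assumes that a multi-line JSON text (LF line endings assumed) is parsed and then re-serialized with Python's default settings.

```python
import json

# Multi-line body as printed in the sec 10.10 example (LF endings assumed).
pretty = (
    '{\n'
    '  "title": "Not Found",\n'
    '  "detail": "Cannot PATCH a non-existent resource",\n'
    '  "status": 404\n'
    '}'
)

# Parsing and re-serializing discards the original whitespace. With default
# settings, json.dumps emits one line with ", " and ": " separators.
reserialized = json.dumps(json.loads(pretty))
print(reserialized)
print(reserialized == pretty)  # False: the bytes fed to the digest would change
```

With default settings the re-serialized text is the single-line form quoted earlier in this issue, which would be consistent with the digest values having been computed over that form rather than over the body as printed.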

LPardue commented 3 years ago

Touché 😀

Roberto's the expert on that bit. We should double-check it and correct if there is an error.

json.dumps is used to serialise Python objects to JSON. With the formatting of the code there I can see how the Python object looks like JSON, and it is confusing as it might be read as an attempt to canonicalize. That is unfortunate given the problems the spec seems to have with digests of JSON with or without newlines.

Sorry you spent some effort on this. I think we should take it as a signal to provide more guidance on how to read the examples, especially given that the code sample appendix is currently marked as to be removed by the rfc-editor.

ioggstream commented 3 years ago

@bc-pi thanks for noticing: please ping us whenever you run into trouble with the implementation. I will fix the examples, ensuring digest values reflect CRLF.

To be clear:

Thanks for your feedback! It was a nice Christmas Gift for us... I'm sorry you had to figure out alone that the examples had this \n issue...
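A minimal sketch of the line-ending issue, assuming a shortened, hypothetical body rather than one of the draft's actual examples: the same text hashed with LF and with CRLF line endings yields two different digest values.

```python
import base64
import hashlib


def sha256_b64(data: bytes) -> str:
    """Return the base64-encoded SHA-256 of the exact bytes given."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Hypothetical two-line-break body, first with LF endings.
lf_body = b'{\n  "title": "Not Found"\n}'
# The same text with CRLF endings.
crlf_body = lf_body.replace(b"\n", b"\r\n")

lf_digest = sha256_b64(lf_body)
crlf_digest = sha256_b64(crlf_body)
print(lf_digest)
print(crlf_digest)
print(lf_digest == crlf_digest)  # False
```

Anyone reproducing an example therefore needs to know which line endings the published value was computed over.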

bc-pi commented 3 years ago

Thanks @ioggstream and sorry about the timing with the holidays. I'd been meaning to raise it for a while but just hadn't gotten to it. Work was slow on the eve of Christmas Eve so I found some time to get through some of my todo list.

Yes, I get that digest is media-type agnostic and that was my understanding and expectation when reading the draft. The content in Code Samples started to make me second guess myself a bit though.

I still bear the scars of the many interoperability and security problems that came with XML canonicalization and signatures. So have zero interest in any media-type canonicalization.

I did look at the Usage with signatures section but still struggle to fully understand or convey the proper usage - particularly in cases where the full representation data isn't conveyed in the message body.

ioggstream commented 3 years ago

When using digest in conjunction with signatures, if you are not acquainted with partial representations, you can just convey a complete representation. I stumbled upon this when reviewing some PSD2 docs where there was some confusion about HTTP semantics.

Partial representations are useful when you need to resume a download and want to be sure that the final checksum of the file matches: in this case Digest conveys the checksum of the whole file together with the byte range requested. This allows you to know whether the whole operation was successful.
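The resume-download case can be sketched as below. Everything here is simulated: `full` stands in for the file on the server, `fetch_range` is a hypothetical substitute for a real range request, and `expected` plays the role of the value the Digest header would carry.

```python
import hashlib

# Hypothetical 10240-byte file held by the server.
full = bytes(range(256)) * 40

# Digest of the complete representation; per the comment above, this is
# what Digest conveys even on a range response.
expected = hashlib.sha256(full).digest()


def fetch_range(start: int, end: int) -> bytes:
    """Simulate 'Range: bytes=start-end' (inclusive) against the file."""
    return full[start:end + 1]


hasher = hashlib.sha256()
chunk_size = 4096
offset = 0
while offset < len(full):
    end = min(offset + chunk_size, len(full)) - 1
    hasher.update(fetch_range(offset, end))  # hash incrementally, in order
    offset = end + 1

download_ok = hasher.digest() == expected
first_chunk_matches = hashlib.sha256(fetch_range(0, 4095)).digest() == expected
print(download_ok)          # True once every range has been received
print(first_chunk_matches)  # False: one range alone cannot be checked
```

The second check is the point bc-pi raises later in the thread: with a partial representation, the bytes in a single message are not enough to validate the digest.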

If this is not clearly stated in the I-D please let me know.

bc-pi commented 3 years ago

That makes sense, I think. For using digest in conjunction with signatures, you'd want to always convey a complete representation in the message with the digest value being calculated from that complete representation. In a document that aims to define a general signature mechanism, however, what should be said or done about situations that don't convey a complete representation in the message and digest? Integrity of the body isn't possible or signatures aren't applicable? Some other means of covering the body is needed? This is where I start to get confused.

I did look at draft-ietf-httpbis-message-signatures FWIW but it's using the older RFC and seems to make some assumptions about what the digest will be of. https://www.ietf.org/archive/id/draft-ietf-httpbis-message-signatures-01.html#name-define-more-content-identif has "Request body (currently supported via Digest header [RFC3230] )" and I don't see mention of response.

Really, for signatures, I think what is needed is a digest of the body (maybe without content coding) that's always just a digest of the body. And draft-ietf-httpbis-digest-headers isn't always that. Which is maybe the root of my confusion.
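The role of content coding can be sketched as follows. Two assumptions apply: gzip from the standard library stands in for Brotli, and the reading that `sha-256` covers the content-coded bytes while `id-sha-256` covers the identity bytes is taken from the draft as understood in this thread, not verified here.

```python
import base64
import gzip
import hashlib


def sha256_b64(data: bytes) -> str:
    """Return the base64-encoded SHA-256 of the exact bytes given."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


# Hypothetical representation data with no content coding applied.
identity = b'{"hello": "world"}'
# The same data with a content coding applied; mtime=0 keeps the gzip
# header deterministic between runs.
encoded = gzip.compress(identity, mtime=0)

identity_digest = sha256_b64(identity)  # candidate for id-sha-256
encoded_digest = sha256_b64(encoded)    # candidate for sha-256 when coded
print(identity_digest)
print(encoded_digest)
print(identity_digest == encoded_digest)  # False
```

Only when no content coding is applied do the two inputs coincide, which is why a comment suggesting identical values for identity and Brotli encodings looks suspicious.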

LPardue commented 3 years ago

Why do signatures need a complete representation?

I always figured there was value in being able to protect message metadata independent of the representation. For instance, I could scatter-gather a file download over different mirrors or hosts and then simply HEAD a more trusted authority for a signed response with Digest to give me a more-trusted hash for validation of the reconstituted object.
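A rough sketch of that scatter-gather check, run entirely offline. `parse_digest` is a hypothetical helper, the parts and the header value are made up, and verifying the signature on the HEAD response is out of scope here.

```python
import base64
import hashlib
import hmac


def parse_digest(value: str) -> dict:
    """Map each algorithm in a Digest field value to its raw digest bytes."""
    parsed = {}
    for item in value.split(","):
        # Split on the first "=" only: base64 padding also uses "=".
        algorithm, _, encoded = item.strip().partition("=")
        parsed[algorithm.lower()] = base64.b64decode(encoded)
    return parsed


# Hypothetical pieces fetched from different mirrors.
parts = [b"first part, ", b"second part, ", b"third part"]
reassembled = b"".join(parts)

# Stand-in for the Digest field of a signed HEAD response from the trusted
# authority. It is computed locally only so that the sketch runs offline.
trusted_header = "sha-256=" + base64.b64encode(
    hashlib.sha256(reassembled).digest()
).decode("ascii")

expected = parse_digest(trusted_header)["sha-256"]
object_ok = hmac.compare_digest(hashlib.sha256(reassembled).digest(), expected)
tampered_ok = hmac.compare_digest(
    hashlib.sha256(reassembled + b"!").digest(), expected
)
print(object_ok, tampered_ok)  # True False
```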

ioggstream commented 3 years ago

@LPardue you are right

Why do signatures need a complete representation?

They do not.

always figured there was value in being able to protect message metadata independent of the representation

Yes, this is a major use case.

I wanted to say that if an implementer has doubts about partial representations or no-content, they can just use complete representations. I'll clarify my comment.

bc-pi commented 3 years ago

So I guess I fall in the camp of having some doubt or lack of understanding about partial representations (I do understand the resume download case though wonder if the checksum of the whole file wouldn't be better conveyed with a HEAD request but I digress). While signatures don't necessarily need a complete representation, I was trying to think about it in the context of the digest I-D and, best I understand it, a complete representation conveyed in the message body is the only time that the digest is of the content of the (maybe encoded) body. It seems to me that many/most folks' general expectation of a signature is to provide integrity and authenticity of the message that can be validated while consuming the message (or sometime later for some notion of nonrepudiation) and that that validation can be self contained with respect to the message (other than perhaps locating and checking key material for verification). With the digest header, a complete representation seems the only way that's possible. At a minimum the verification code/process needs to be different depending on whether the digest can be calculated from the content of the message alone.

You say that one can just use complete representations, however, it's not immediately clear to me how to do that. Perhaps a complete understanding of HTTP semantics would help but I don't think I'm alone in not having that. And my attempts at the 200+ page draft-ietf-httpbis-semantics document (the reference link in the digest I-D 404s BTW and sec 5.2.2 has moved in -13) haven't been fruitful.

I'm not even sure what I'm driving towards with this, sorry, but I do think maybe some further discussion or clarification of some of this stuff in the I-D would be helpful.

I've long been a skeptic of HTTP signatures in general, which makes my involvement in this conversation rather ironic. But I've been pulled somewhat into the space and am trying to navigate it as best I can.

LPardue commented 3 years ago

The issue is closed but it's fine to continue the discussion IMO because it might surface something that can be clarified in the specification. I would ask that you make a new issue or mailing list thread as a good branching off point from the OP of this issue.

To answer in brief: there are different layers of authenticity and integrity at play. My view is that signatures can provide integrity and authenticity of HTTP metadata only. Someone who wants to sign a whole message must work within the constraints of available headers.

In simple terms, if an application of signatures wants to rely on Digest for message payload integrity as you describe, the payload must be complete. That means, for example, a simple GET, not a HEAD or a range request.

I can understand if there is a use case that wants to sign an HTTP message that contains a partial response, such that the message payload (i.e., the byte range indicated in the header) can be integrity checked. But digest does not support that. Changing digest to work that way is beyond the scope of this work. However, there is nothing stopping someone using a different header for that purpose - and it could even complement digest.

ioggstream commented 3 years ago

doubt or lack of understanding about partial representations

Please, ask.

a complete representation conveyed in the message body is the only time that the digest is of the content of the (maybe encoded) body.

Ok, but in HTTP terminology it's not "message body" but "payload body" or "payload data": this is because the payload can be conveyed by multiple chunked HTTP messages.

folks' general expectation of a signature is to provide integrity and authenticity of the message that can be validated while consuming the message and that that validation can be self contained with respect to the message

This use case is covered by Digest; with partial representations you can do more, e.g. validating a sequence of requests until download completion.

a complete representation seems the only way that's possible.

only if you want a self-contained validation.

At a minimum the verification code/process needs to be different depending on whether the digest can be calculated from the content of the message alone.

Yes, but that depends on how you use the HTTP protocol.

it's not immediately clear to me how [to use complete representations]

Generally: 1. just PUT, POST, and GET without Content-Range; 2. do not return No Content.
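The rule of thumb above, combined with the earlier remark about HEAD and range requests, could be sketched as a check like the one below. This is a rough heuristic with a hypothetical function name, not a rule taken from the I-D, and it deliberately ignores cases such as Content-Location that are discussed elsewhere in this thread.

```python
from typing import Mapping, Optional


def digest_is_self_contained(
    method: str, status: Optional[int], headers: Mapping[str, str]
) -> bool:
    """Guess whether Digest can be checked against this message's payload alone.

    `status` is None when the message is a request. Heuristic only.
    """
    names = {name.lower() for name in headers}
    if method.upper() == "HEAD":
        return False  # no payload is carried at all
    if "content-range" in names:
        return False  # payload is only part of the representation
    if status in (204, 206):
        return False  # No Content, or Partial Content from a range request
    return True


print(digest_is_self_contained("GET", 200, {"Content-Type": "application/json"}))
print(digest_is_self_contained("GET", 206, {"Content-Range": "bytes 0-99/500"}))
print(digest_is_self_contained("HEAD", 200, {}))
```

A verifier built this way would take the self-contained path only when the function returns True, and otherwise treat the digest as describing something larger than the bytes at hand.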

200+ page draft-ietf-httpbis-semantics document

If you are interested, I made some comments here, but it seems they are still lingering: https://github.com/aaronpk/oauth-v2-1/pull/22 Generally, when building protocols based on HTTP, semantics should be taken into account because that's how intermediaries, servers and user agents use the protocol.

I do think maybe some further discussion or clarification of some of this stuff in the I-D would be helpful.

Sure, I'm available for discussion, and even a brief call.

I've long been a skeptic of HTTP signatures in general

Could you share pls?

bc-pi commented 3 years ago

Thanks for the responses and continued engagement here. And sorry that I've been slow in responding myself. I will post something to the httpbis list but I think ultimately what I'd be looking for in the I-D is a more formulaic description of how to consistently determine what the input to the digest is for both requests and responses. Something like the "Generally... just PUT, POST, and GET without Content-Range... " you wrote but as exhaustive as possible (from the examples Content-Location is meaningful but not totally clear to me how in some cases - like what if a POST response describes the request status like in 10.8 but had a Content-Location like in 10.7?). Anyway, I think that a more formulaic description would be useful in the document in general but would also help facilitate understanding of when a digest (and any signature that uses the digest) can be checked with respect to the message itself (self-contained) or not.

I guess my skepticism of HTTP signatures in general comes from a personal belief that it's much more complicated/difficult than most folks think it will be and also likely/maybe doesn't provide what's expected. That's cynical and dependent on the context of expectations/requirements, of course, but I don't think it's entirely wrong.

There seem to be a number of disparate efforts underway to devise an HTTP signature scheme. And cynically again, I get the feeling that they've all started with someone thinking, "how hard can this be?" When someone in the OIDF FAPI WG suggested that a small extension of some work I've been doing in the OAuth WG could maybe be a good enough approach, I fell into the same trap of thinking it could be easy. And now there's one more prospective HTTP signature scheme. It's an early draft and may not go anywhere but this http://lists.openid.net/pipermail/openid-specs-fapi/2021-January/002216.html kind of introduces it. It was trying to write a proper reference to, and usage of, the digest I-D in that draft (which seemed like a better idea than trying to invent something) that led me here.

LPardue commented 3 years ago

Please create some new threads or issues to continue the discussion.

If there are some concise or salient improvements we can make to improve the draft, great. I'm just cautious of boiling the ocean trying to explain HTTP semantics.

ioggstream commented 3 years ago

from the examples Content-Location is meaningful but not totally clear to me how in some cases ...

@bc-pi I think it is really impossible to explain all the HTTP semantics in this I-D, and it also wouldn't be correct, as it would require changing this I-D at every semantic change. I am willing, though, to have a call with you to discuss the signature issue. Just drop me a line; you can find my email here: https://lists.w3.org/Archives/Public/ietf-http-wg/2020OctDec/0283.html

HTTP signatures in general [is] much more complicated/difficult than most folks think

agree.

now there's one more prospective HTTP signature scheme

There are many: I've been following this thread since 2018 and I have played with various works, including draft-cavage, message-signatures, some openbanking work and currently an ETSI proposal based on eIDAS requirements.

This topic really requires a separate thread/conversation and I'm available for that, just let me know. In the meantime I agree with @LPardue not to continue on this thread :)

reschke commented 3 years ago

The HTTP semantics are not supposed to change :-).

What this spec needs to be clear about is how HTTP semantics interact with how Digests are computed/checked.

In particular I agree with @bc-pi in that the spec needs to clearly describe what a Digest on a given HTTP message means, and how it can be validated.

ioggstream commented 3 years ago

The HTTP semantics are not supposed to change :-).

The Times They Are a-Changin' :P

how HTTP semantics interact with how Digests are computed/checked

I agree with @bc-pi in that the spec needs to clearly describe what a Digest on a given HTTP message

Yes, I just do not agree this should be defined in terms of payload data: there was Content-MD5 for that, and if there's interest it might be extended to something like Content-Hash or Payload-Hash.

What @LPardue meant was just that this one is not the right issue for this discussion.

See #970 #1005