martinthomson / http-mice

A progressive integrity content encoding for HTTP

Use Digest header field #11

Closed by martinthomson 6 years ago

martinthomson commented 6 years ago

The encoding would be different, but I can't see a reason not to use this:

Digest: MI=base64==

Thanks to @ioggstream.

jyasskin commented 6 years ago

Do you mean:

Digest: MI-SHA256=base64==

to allow forward-compatibility with future hash functions?
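To illustrate why an algorithm-specific token such as `MI-SHA256` helps with forward compatibility, here is a minimal sketch of a recipient that parses a Digest value and picks the first algorithm it supports, ignoring the rest. It assumes the RFC 3230 shape of a comma-separated list of `algorithm=value` items with case-insensitive algorithm names. `parse_digest`, `pick_digest`, the `SUPPORTED` list and the `MI-SHA512` token are hypothetical names used only for illustration.

```python
from __future__ import annotations

# Algorithms this (hypothetical) recipient can verify, in order of preference.
SUPPORTED = ("mi-sha256", "sha-256")


def parse_digest(value: str) -> dict[str, str]:
    """Split a Digest value such as "SHA-256=abc=, MI-SHA256=xyz=="
    into {algorithm: base64 value}, lower-casing the algorithm names."""
    digests = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first "=" only, so base64 padding stays with the value.
        alg, sep, b64 = item.partition("=")
        if not sep:
            raise ValueError(f"malformed Digest item: {item!r}")
        digests[alg.strip().lower()] = b64.strip()
    return digests


def pick_digest(value: str) -> tuple[str, str] | None:
    """Return the first supported (algorithm, value) pair; unknown algorithms are skipped."""
    digests = parse_digest(value)
    for alg in SUPPORTED:
        if alg in digests:
            return alg, digests[alg]
    return None


print(pick_digest("SHA-256=abc=, MI-SHA256=xyz=="))  # ('mi-sha256', 'xyz==')
print(pick_digest("MI-SHA512=zzz="))                  # None
```

With this shape, a future `MI-SHA512` item could be sent alongside `MI-SHA256` without breaking recipients that only know the older algorithm.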

There's some mismatch between the semantics of the top-level mi-sha256 hash and the meaning of the Digest header. Specifically,

The digest is computed on the entire instance associated with the message. The instance is a snapshot of the resource prior to the application of any instance manipulation or transfer-coding (see section 3).

"Instance" is defined as

The entity that would be returned in a status-200 response to a GET request, at the current time, for the selected variant of the specified resource, with the application of zero or more content-codings, but without the application of any instance manipulations or transfer-codings.

Since mi-sha256 is a content-coding, which can appear anywhere in the list of content-codings, the MICE spec would have to be careful to say how this new kind of digest is checked.


Side note: I've been thinking of mi-sha256 as a content encoding that's parameterised by the top-level hash: only the message matching that hash successfully decodes. The question in this issue is then, how do we best communicate the parameter of a content encoding, given that https://tools.ietf.org/html/draft-ietf-httpbis-semantics-02#section-6.1.2 doesn't allow them to take parameters directly.
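The "content encoding parameterised by the top-level hash" view can be sketched as a decoder that takes the expected top-level proof as an argument and only releases each record once it checks out. This is a minimal sketch assuming the record layout and proof construction as I understand them from the MICE draft: an 8-octet big-endian record size, then each record followed by the proof of the next record, with proof = SHA-256(record || next proof || 0x01) for non-final records and SHA-256(record || 0x00) for the last one. `mi_decode` is a hypothetical helper, not an existing library API.

```python
import base64
import hashlib
import hmac
import struct
from typing import Iterator

HASH_LEN = 32  # SHA-256 output size in octets


def mi_decode(body: bytes, top_proof_b64: str) -> Iterator[bytes]:
    """Yield each record of an mi-sha256 body only after checking its proof.

    Assumed layout: 8-octet big-endian record size, then each record
    followed by the proof of the next record (nothing after the last one).
    """
    if len(body) < 8:
        raise ValueError("body too short for record size")
    (rs,) = struct.unpack(">Q", body[:8])
    if rs == 0:
        raise ValueError("record size must be positive")
    # The expected top-level proof is the "parameter" of the decoding.
    expected = base64.b64decode(top_proof_b64)
    pos = 8
    while True:
        remaining = len(body) - pos
        if remaining > rs + HASH_LEN:
            # Non-final record: proof = SHA-256(record || next proof || 0x01)
            record = body[pos:pos + rs]
            next_proof = body[pos + rs:pos + rs + HASH_LEN]
            actual = hashlib.sha256(record + next_proof + b"\x01").digest()
            pos += rs + HASH_LEN
        else:
            # Final record: proof = SHA-256(record || 0x00)
            record = body[pos:]
            if len(record) > rs:
                raise ValueError("final record longer than record size")
            next_proof = None
            actual = hashlib.sha256(record + b"\x00").digest()
        if not hmac.compare_digest(actual, expected):
            raise ValueError("integrity check failed")
        yield record  # safe to release: it matched the expected proof
        if next_proof is None:
            return
        expected = next_proof
```

With the wrong top-level value, decoding fails on the very first record, which is what "only the message matching that hash successfully decodes" amounts to in practice.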

jyasskin commented 6 years ago

@mnot, if you know someone from the CDN world who might have opinions on this re-use of the Digest header, could you point them over here?

ioggstream commented 6 years ago

There's some mismatch between the semantics of the top-level mi-sha256 hash and the meaning of the Digest header...

As mi-sha256 depends on the block size, we should probably add the block size to the header value, e.g.:

MI-SHA256/1024=base64==

or

MI-SHA256=1024,base64==

In that way, mi-sha256 will be content-encoding independent.

For example:

  1. Upload a file with MICE

    PUT /files/image.png
    Digest: MI-SHA256/1024=base64==
  2. Log the mi-sha256/1024 value on the server.

  3. Later, verify the integrity of the file against the logged mi-sha256/1024 value.
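The three steps above can be sketched end to end. This assumes the hypothetical `MI-SHA256/<rs>=<base64>` item syntax proposed in this comment, and computes the top-level proof the way I understand the MICE draft to (records of `rs` octets, hashed from the last record backwards with a trailing 0x00 or 0x01 octet). `mi_sha256`, `log_entry` and `verify_later` are illustrative names, and the 2560-byte `data` merely stands in for image.png.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import re


def mi_sha256(payload: bytes, rs: int) -> bytes:
    """Top-level mi-sha256 proof, as I understand the MICE draft."""
    if rs <= 0:
        raise ValueError("record size must be positive")
    records = [payload[i:i + rs] for i in range(0, len(payload), rs)] or [b""]
    proof = hashlib.sha256(records[-1] + b"\x00").digest()
    for record in reversed(records[:-1]):
        proof = hashlib.sha256(record + proof + b"\x01").digest()
    return proof


# Hypothetical "<alg>/<record size>=<base64>" item from the proposal above.
_ITEM = re.compile(r"(?P<alg>[A-Za-z0-9-]+)/(?P<rs>[0-9]+)=(?P<b64>[A-Za-z0-9+/]+=*)")


def log_entry(payload: bytes, rs: int = 1024) -> str:
    """Steps 1-2: the value a client would send with the PUT and the server would log."""
    b64 = base64.b64encode(mi_sha256(payload, rs)).decode("ascii")
    return f"MI-SHA256/{rs}={b64}"


def verify_later(payload: bytes, entry: str) -> bool:
    """Step 3: re-check stored bytes against a logged entry, using its record size."""
    m = _ITEM.fullmatch(entry.strip())
    if m is None or m["alg"].upper() != "MI-SHA256":
        raise ValueError(f"unrecognised entry: {entry!r}")
    expected = base64.b64decode(m["b64"])
    return hmac.compare_digest(mi_sha256(payload, int(m["rs"])), expected)


data = bytes(range(256)) * 10  # stand-in for the uploaded file
entry = log_entry(data)
print(verify_later(data, entry))                  # True
print(verify_later(data[:-1] + b"\x00", entry))   # False
```

One design consequence: the alternative form `MI-SHA256=1024,base64==` puts a comma inside a single item, but commas already separate items in a Digest list, so a generic parser would split it in two. The `/` form avoids that, although `/` is not an HTTP token character, so the algorithm grammar would need adjusting if that position must remain a token.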

--

how do we best communicate the parameter of a content encoding, given that https://tools.ietf.org/html/draft-ietf-httpbis-semantics-02#section-6.1.2 doesn't allow them to take parameters directly.

@jyasskin (iiuc or forgive me and ignore) in #12 I suggest:

What are the drawbacks of this approach?

Thanks to you all for your time!

martinthomson commented 6 years ago

The identification doesn't need to know the block size: if the block size that the digest assumes is wrong, the hash will simply not match.
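A quick way to see this point: in the sketch below, the record size acts as an extra input to the hash. It uses the proof construction as I understand the MICE draft to define it, and `mi_sha256` is a hypothetical helper. A payload that fits in one record hashes identically for any sufficiently large record size, while a multi-record payload hashed with the wrong size simply yields a different value, so verification fails cleanly.

```python
import hashlib


def mi_sha256(payload: bytes, rs: int) -> bytes:
    """Top-level mi-sha256 proof: hash records from the last one backwards."""
    if rs <= 0:
        raise ValueError("record size must be positive")
    records = [payload[i:i + rs] for i in range(0, len(payload), rs)] or [b""]
    proof = hashlib.sha256(records[-1] + b"\x00").digest()
    for record in reversed(records[:-1]):
        proof = hashlib.sha256(record + proof + b"\x01").digest()
    return proof


small = b"fits in one record"
large = bytes(3000)

# A single-record payload hashes the same for any rs >= its length.
print(mi_sha256(small, 1024) == mi_sha256(small, 4096))  # True
# Once the payload spans several records, a wrong rs just gives a different digest.
print(mi_sha256(large, 1024) == mi_sha256(large, 512))   # False
```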

ioggstream commented 6 years ago
jyasskin commented 6 years ago

@ioggstream Moving the record size to the header is not sufficient to make the top-level MI digest a digest of the "instance" that the Digest header says it holds digests of. MI assumes an intermediate hash for every record-size bytes, and those intermediate hashes have to be transmitted somewhere. If those intermediate hashes are transmitted in the header, then no content encoding is needed, and Digest directly applies, but FAQ 3. If the intermediate hashes are transmitted in the body, especially if a second content encoding is applied after mi-sha256, then the top-level hash is no longer a digest of the post-content-encoding instance.
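The layout described here can be made concrete with an encoder sketch. It assumes the MICE body format as I understand it (an 8-octet big-endian record size, each non-final record followed by the proof of the next record) and that only the top-level proof travels in a header. `mi_encode` is hypothetical, and gzip stands in for "a second content encoding applied after mi-sha256".

```python
from __future__ import annotations

import base64
import gzip
import hashlib
import struct


def mi_encode(payload: bytes, rs: int) -> tuple[bytes, str]:
    """Return (mi-sha256 body, base64 top-level proof for the header).

    Assumed layout: 8-octet big-endian rs, then record 0, proof 1, record 1,
    proof 2, ..., final record. The top-level proof (proof 0) is NOT in the body.
    """
    if rs <= 0:
        raise ValueError("record size must be positive")
    records = [payload[i:i + rs] for i in range(0, len(payload), rs)] or [b""]
    # Build the proofs from the last record backwards.
    proofs = [b""] * len(records)
    proofs[-1] = hashlib.sha256(records[-1] + b"\x00").digest()
    for i in range(len(records) - 2, -1, -1):
        proofs[i] = hashlib.sha256(records[i] + proofs[i + 1] + b"\x01").digest()
    # Interleave: each non-final record is followed by the next record's proof.
    parts = [struct.pack(">Q", rs)]
    for i, record in enumerate(records):
        parts.append(record)
        if i + 1 < len(records):
            parts.append(proofs[i + 1])
    return b"".join(parts), base64.b64encode(proofs[0]).decode("ascii")


body, top = mi_encode(b"hello world", 4)
print(len(body))  # 83 = 8 + (4 + 32) + (4 + 32) + 3
# Neither the mi-sha256 body nor a gzip of it hashes to the header value.
print(base64.b64encode(hashlib.sha256(body).digest()).decode() == top)                 # False
print(base64.b64encode(hashlib.sha256(gzip.compress(body)).digest()).decode() == top)  # False
```

Because the intermediate proofs live in the body, the header value is not a plain digest of the encoded bytes, and a second content-coding on top only changes the transmitted bytes further.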

ioggstream commented 6 years ago

@jyasskin

only mi-sha256 content-encoding

1. Let the function m-hash: (payload, rs) -> (rs, top-level MI digest).
2. m-hash is based on all the intermediate hashes.
3. Are you saying that all the intermediate hashes need to be transmitted because 2 is too weak a condition for integrity?

multiple content-encoding

In the case of multiple encodings, you're right. I just found a clarifying thread on Digest and instances: https://lists.w3.org/Archives/Public/ietf-http-wg/2018AprJun/0197.html

jyasskin commented 6 years ago

@martinthomson and I chatted, and we're going to make this change. I can't promise to get it done before I go on leave in a couple weeks, so someone else is welcome to pick it up, or I'll start on it in November.

ioggstream commented 6 years ago

Hi @jyasskin @martinthomson, is there a summary of the discussion?

jyasskin commented 6 years ago

It was just that my concern (which was only a concern, not an objection) didn't bother Martin, and he has lots more experience dealing with HTTP headers.

mnot commented 6 years ago

The terminology that Jeff proposed in RFC 3230 was never adopted in HTTP, so that spec probably needs to be revised. The closest thing to "instance" in current HTTP is the selected representation. If that's the semantics you're looking for (i.e., you can send a Digest header with MICE for the "whole" response on a 304 or a 206 and it still makes sense), you should be fine.

N.B. Content-Encoding is a property of the representation.

jyasskin commented 6 years ago

If we re-do Digest to cover the "selected representation", I think my worry above goes away. It's just an additional yak to shave. 😜

I can ask this on the list once I'm back from leave, but do you know offhand whether anyone else is interested in updating RFC 3230, or whether we'd only be doing it to support MICE?

mnot commented 6 years ago

Well, Content-MD5 is deprecated, and Digest is kind of the go-to now. It probably could use a refresh.

The only issue (besides finding time) is untangling it from delta encoding; I'm not sure how hard it would be to make it compatible with both modern HTTP and delta.

Until then, I don't think it does harm to use it.

ioggstream commented 6 years ago

@mnot:

mnot commented 6 years ago

It's not widely used. Revising it is a matter of specification work.

I don't think it's a big deal if you go ahead and use Digest without revising it; we'll get to it eventually.

ioggstream commented 5 years ago

@mnot @martinthomson @jyasskin @LPardue and I are trying to refresh Digest so that it references RFC 7231, and to make a cleaner RFC with some examples. The algorithms may reference mi-sha256 too.

Here's a gdoc. Feedback welcome! https://docs.google.com/document/d/1p8KBR_dQKfh7PgLTOYXg_htMAME8nnEblKRC0DSH5_o/edit?ts=5cb5aff0