I don't like the ObjectHash option because it appears to require parsing untrusted data before signature verification.
Something like ObjectHash is more useful when you wish to serialize data in multiple formats. I would suggest using a more off-the-shelf scheme if you intend to deal only with JSON.
Maybe I'm overlooking something, but I fail to see how ObjectHash authenticates the data.
ObjectHash computes a nested hash tree somewhat analogous to a Merkle tree. Included in the calculation of the "Merkle root" for any particular nested object graph are the digests of all of the data in the object graph (sans the structural elements of e.g. JSON or proto serialization).
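For intuition, here is a minimal Python sketch of that idea. It is loosely modeled on objecthash and omits most of the actual spec (tags for numbers, null, sets, redaction, etc.), so treat it as an illustration, not the algorithm itself:

import hashlib

def obj_hash(obj):
    # Sketch only: each value is hashed with a one-byte type tag, and a
    # dict hashes the sorted concatenation of its (key-hash || value-hash)
    # pairs, so encoding details like key order and whitespace never
    # enter the digest.
    h = hashlib.sha256()
    if isinstance(obj, str):
        h.update(b'u' + obj.encode('utf-8'))
    elif isinstance(obj, list):
        h.update(b'l' + b''.join(obj_hash(x) for x in obj))
    elif isinstance(obj, dict):
        pairs = sorted(obj_hash(k) + obj_hash(v) for k, v in obj.items())
        h.update(b'd' + b''.join(pairs))
    else:
        raise TypeError('unsupported type: %r' % type(obj))
    return h.digest()

# Key order, a structural artifact of the encoding, does not matter:
assert obj_hash({'a': 'x', 'b': 'y'}) == obj_hash({'b': 'y', 'a': 'x'})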
I noted CBOR mentioned on the "[tuf] C Implementation of the TUF on client side" thread.
One of the great failings of CBOR, IMO, is that JOSE and COSE signatures are completely distinct as they include the encoding as part of the data to be signed, and therefore do not permit transcoding of the data between formats while preserving the same signature.
This is, of course, the primary advantage of using something like ObjectHash.
IMO, the only freedom you gain from ObjectHash is the format to store the retrieved data in, as long as you are able to convert it to the same ObjectHash (that can be verified using the original signature). But, I agree, this can be a compelling argument for using ObjectHash.
Though, in practise, I'm mostly inclined to "just" store the original files as-is since I already need the ability to verify those signatures anyway. So I'm wondering how valuable it is to have the same signature for different representations of the same data.
Hey guys. I'm just jumping in on this to point out that we are experiencing headaches of sorts with canonical JSON too.
From the TUF spec (emphasis mine):
4.1. Metaformat
All documents use a subset of the JSON object format, with floating-point numbers omitted. When calculating the digest of an object, we use the "canonical JSON" subdialect as described at http://wiki.laptop.org/go/Canonical_JSON
Canonical JSON is absolutely not a subdialect, in that it allows \n (a literal newline) in strings, among other non-escaped characters, and its "strings" are just byte arrays that don't have to be valid UTF-8. So far this has only been annoying when we are dealing with RSA keys. For example, an RSA key in JSON format would be:
{"keyvalue":"-----BEGIN RSA PUBLIC KEY-----\n.........\n-----END RSA PUBLIC KEY-----"}
The above is technically valid CJSON as well, but the correct way to express the above payload as CJSON would be:
{"keyvalue":"-----BEGIN RSA PUBLIC KEY-----
.........
-----END RSA PUBLIC KEY-----"}
This has led to problems of inconsistent encodings/decodings across language/library boundaries, especially in the case where TUF defines keyid = sha256(cjson(pub_key_string)). This means that in JSON land the hash is applied over a \n (a newline character) at each line break, but in CJSON land it is applied over a backslash followed by an n, giving different key IDs.
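To make the divergence concrete, here is a toy Python snippet (the PEM body is a placeholder, not a real key):

import hashlib

pem = '-----BEGIN RSA PUBLIC KEY-----\n.........\n-----END RSA PUBLIC KEY-----'

# In the JSON file the newline is stored as the two characters '\' 'n';
# canonical JSON stores a literal newline byte instead.
json_bytes = pem.replace('\n', '\\n').encode()
cjson_bytes = pem.encode()

print(hashlib.sha256(json_bytes).hexdigest())
print(hashlib.sha256(cjson_bytes).hexdigest())  # a different digest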
Another thing is that it seems developers are abusing Content-Type: application/json for CJSON-encoded data, which is incorrect. If CJSON is kept around, I'd like to see the spec mention something like "developers SHOULD NOT use application/json and SHOULD use application/canonical-json for HTTP responses" to remind everyone how they should actually encode/decode data.
I know you guys are working on ASN.1 as the encoding format, but I would like to cast my vote for fully ripping out CJSON as this has led to a number of pain points and still requires parsing of untrusted data.
Haven't looked into it closely, but what does everyone on this thread think about JWS?
Cc: @JustinCappos
Another very important consideration is backwards-compatibility, which is a big deal for clients who are already using TUF in production...
From the looks of it, a usable TUF implementation wouldn't work with JWS, since you just concatenate (simplification) b64(payload) || '.' || b64(sig_value). The sig isn't structured, and any solution here would need to be more structured.
I imagine the ASN.1 would be:
Metadata ::= SEQUENCE {
    signatures   SEQUENCE OF Signature,
    signed       OCTET STRING
}
And then after the validation of the signed portion, the application parses the bytes into whatever structure it's supposed to be. This gets around the problem of parsing untrusted data since we're only treating the signed body as a byte stream until we know it's what we want.
For example:
import requests

# asn1_parse, verify, targets_parse, root, and snapshot are illustrative
raw = requests.get('http://localhost:8080/targets.json').content
parsed_meta = asn1_parse(raw)  # parse only the outer envelope
if not verify(parsed_meta, root=root, snapshot=snapshot):
    raise VerificationError('oh dear')
# only after verification are the signed bytes parsed into structure
targets_meta = targets_parse(parsed_meta.signed)
return targets_meta
Yeah, unfortunately this doesn't address backwards compatibility, but that's what major version changes are for. :)
I'd just like to point out that technically TUF doesn't follow its own spec: when JSON is parsed, it uses this
https://github.com/secure-systems-lab/securesystemslib/blob/master/securesystemslib/util.py#L844
and that just does json.load, which doesn't respect CJSON encodings.
Are you referring to this part of the specification?
When calculating the digest of an object, we use the "canonical JSON" subdialect as described at http://wiki.laptop.org/go/Canonical_JSON.
I think we do indeed canonicalize JSON metadata prior to generating digests. create_signature() canonicalizes the metadata here.
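For illustration, a condensed sketch of that flow, using securesystemslib names as I understand them; metadata and key_dict stand in for a loaded signable dict and its key dict, and this is an approximation of the order of operations, not the exact reference code:

import securesystemslib.formats
import securesystemslib.keys

# The digest/signature is computed over the canonical form of the
# 'signed' object, not over the bytes of the JSON file on disk.
canonical = securesystemslib.formats.encode_canonical(metadata['signed'])
ok = securesystemslib.keys.verify_signature(
    key_dict, metadata['signatures'][0], canonical.encode('utf-8'))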
It seems I've missed the discussion about ASN.1; I only see some excerpts of it mentioned a few years back. Is this something that is seriously being considered for TUF?
Anyways, the use of canonical JSON is still troublesome as mentioned again by @heartsucker. I think the biggest problem is the fact that the RSA (or whatever) keys are embedded in a format that does not play nice when used in combination with JSON. We could either replace the current public key format with something like JWK, base64 encode it once more, or define yet another custom encoding.
Needless to say, I think the cleanest approach would be to use JWKs :)
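For illustration, a JWK-shaped RSA public key keeps all key material in base64url fields (structure per RFC 7517; values elided here), so there are no newlines to escape:

# Hypothetical JWK for an RSA public key: every field is a plain
# base64url string, so it survives any JSON dialect unchanged.
jwk = {
    'kty': 'RSA',
    'n': '...',   # modulus, base64url (elided)
    'e': 'AQAB',  # public exponent 65537
}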
What I mean is that this function is used in tuf.client.Updater._get_metadata_file when it loads metadata. It is treating the loaded metadata as JSON and not as CJSON, which means that if you parse RSA keys, you're going to run into the newline-versus-backslash-and-n problem I mentioned above.
For example, the file tests/repository_data/repository/metadata/1.root.json is JSON and not CJSON, and as far as I can tell, there isn't anywhere in the code that decodes CJSON. So the tests are passing because you're turning the \n into a literal newline, but if you parsed the file as CJSON, all the \n's would be treated as a backslash followed by an n, which would break your hashes.
Loading JSON metadata happens separately from verifying signatures. We verify signatures in the expected CJSON format here, which should match what occurs when the signature is generated (linked in my previous comment).
I'm not against JWK, only explaining the approach used in the reference implementation.
I'm a big fan of objecthash for this purpose, especially preserving the structure of data and avoiding the whole can of worms that is canonicalization entirely:
https://github.com/benlaurie/objecthash
However, it's a bit ill-specified at the moment.
@tarcieri: I'm still pondering how ObjectHash would solve the line-ending issue of the public key format.
To solve that particular problem, I would suggest getting rid of the PEM wrapper and doing a simple Base64 encoding of the raw DER contained within the PEM.
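A sketch of what that could look like in Python (pem_to_b64_der is a hypothetical helper, standard library only):

import base64

def pem_to_b64_der(pem):
    # Drop the PEM armor lines and return a single base64 string of
    # the raw DER bytes, which contains no newlines at all.
    body = [line for line in pem.splitlines()
            if line and not line.startswith('-----')]
    der = base64.b64decode(''.join(body))
    return base64.b64encode(der).decode('ascii')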
On 19 April 2017 at 11:12, Jan Willem Janssen wrote:
It seems I've missed the discussion about ASN.1; I only see some excerpts of it mentioned a few years back. Is this something that is seriously being considered for TUF?
Yes, we're planning to support multiple data formats (e.g., JSON, ASN.1/DER, etc.). ASN.1/DER was primarily developed for @uptane. @awwad can say more about it...
@vladimir-v-diaz Ok, yes I see what you mean now. My mistake. I guess I was going down the wrong path because I couldn't calculate the same key IDs that are in the test cases:
$ jq -r '.signed.keys."5602f4df0cd26b2112f0833b1ce8d5fcbb595754961d3a04f37b9815e2ced503".keyval.public' tests/repository_data/repository/metadata/1.root.json | sha256sum
f592cc46408c6c29124203998fee89553889c02e50c94065d5f05f1dbf594871 -
$ jq '.signed.keys."5602f4df0cd26b2112f0833b1ce8d5fcbb595754961d3a04f37b9815e2ced503".keyval.public' tests/repository_data/repository/metadata/1.root.json | sed -e 's/"//g' | sha256sum
0e07775f044772a860647f827942df85370b0aa387a49df545a3dc0d26eebc9e -
and also
import hashlib, canonicaljson, json

with open('tests/repository_data/repository/metadata/1.root.json') as f:
    j = json.loads(f.read())

b = canonicaljson.encode_canonical_json(
    j['signed']['keys']['5602f4df0cd26b2112f0833b1ce8d5fcbb595754961d3a04f37b9815e2ced503']['keyval']['public'])
h = hashlib.sha256()
h.update(b)
h.hexdigest()
# '32b01395a271d0f34ab8c7c47da3eaf0ae25e76a9ef44cd7b10257e473961529'
On 19 April 2017 at 11:59, heartsucker wrote:
@vladimir-v-diaz Ok, yes I see what you mean now. My mistake. I guess I was going down the wrong path because I couldn't calculate the same key IDs that are in the test cases.
I suppose the least we can do now is to improve our documentation (with runnable code) on how to compute / verify signatures...
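In that spirit, a hedged sketch of computing a keyid the way the reference implementation appears to: as I read it, the digest is taken over the canonical JSON of the whole key object, not just the public key string, which would explain why the attempts above don't match. Whether this reproduces the stored keyid depends on the exact key dict the implementation hashes:

import hashlib, json
import securesystemslib.formats

with open('tests/repository_data/repository/metadata/1.root.json') as f:
    root = json.load(f)

keyid = '5602f4df0cd26b2112f0833b1ce8d5fcbb595754961d3a04f37b9815e2ced503'
key = root['signed']['keys'][keyid]  # the full key dict, not just 'public'
canonical = securesystemslib.formats.encode_canonical(key).encode('utf-8')
print(hashlib.sha256(canonical).hexdigest())  # compare against keyid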
I'm jumping in late here, but want to mention we're certainly moving TUF so that it doesn't require a specific format (e.g., canonical-json). TAP 7 https://github.com/theupdateframework/taps/blob/tap7/tap7.md talks about some of our early thoughts in this area.
For conformance testing, you'll just have to write code that plugs into our test harness to translate into your format. The tests should all be able to run from there.
Note: folks on the Docker team and also some of our collaborators in the automotive industry are encouraging us to accept TAP 7 soon. We will likely be moving forward fairly quickly, unless issues come up during discussion of the TAP.
Closing since DSSE is on the horizon
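For context on why DSSE addresses this thread's core complaint: it signs the raw payload bytes wrapped in a pre-authentication encoding, so the payload is never canonicalized or parsed before verification. A minimal sketch of DSSE's PAE (the payload type and body below are hypothetical):

def pae(payload_type, payload):
    # DSSE v1 pre-authentication encoding: decimal ASCII lengths,
    # fields separated by single spaces. The signature covers this
    # byte string; the payload stays opaque until verified.
    return b'DSSEv1 %d %s %d %s' % (
        len(payload_type), payload_type, len(payload), payload)

sig_input = pae(b'application/vnd.tuf+json', b'{"signed": "..."}')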