JSON signing not appropriate for interoperability

matrix-org / matrix-spec

The Matrix protocol specification

Apache License 2.0

182 stars 94 forks source link

JSON signing not appropriate for interoperability #1248

Open neilalexander opened 1 year ago

neilalexander commented 1 year ago

We currently attempt to hash and/or sign Canonical JSON in a number of places. Implementations (both clients and servers) are expected to re-canonicalise JSON that they receive in order to verify those hashes or signatures, but there are a number of places where JSON implementations might vary in their behaviours, i.e.

numeric representations — #1244
duplicate key ordering — #1246
unicode — #1245

As such, unless every implementation is also able/willing to implement their own JSON encoders/decoders to match very specific differences in behaviour without also introducing new bugs, we run into a problem where different implementations written in different languages are going to arrive at different ideas of what a "canonicalised" JSON blob looks like.

These differences in encoding will result in different hashes or signatures, causing genuine things to be rejected as invalid. Guaranteeing 100% interoperability between any two implementations written in different languages is practically impossible as a result.

We have already seen this happen quite often in Dendrite/GMSL because of how GMSL specifically does not round-trip values when canonicalising JSON (instead just stripping whitespace and sorting keys), but this differs to the full round-tripping of the entire JSON that Synapse does, so we end up with unexpected signature errors.

Go's JSON library does not provide equivalents to sort_keys or ensure_ascii either, and it isn't clear that it would demonstrate the same behaviours as Python even if it did. The same types of problems will exist in other implementations or language standard libraries.

Ideally we need one of two things:

Easy (well, easier?): To change the definition of "Canonical JSON" to not round-trip/interpret/reformat values at all, but instead to only strip whitespace and sort keys while preserving value bytes as-is — this is currently what GMSL does;
Hard: To not use JSON at all and instead move to something where we can preserve the wire representation exactly, have signatures out-of-band and not need to canonicalise in order to verify them.

(created from #1232)

jplatte commented 1 year ago

I think going the GMSL way is also going to be hard in other languages. I can imagine one way to do it in Rust without writing my own JSON parser, but it's really not at all straight-forward and I don't expect Python's JSON library to provide anything that lets you even strip out individual fields while keeping the exact format of others (and it might get harder if you want to strip whitespace inside object / array literals). I agree that this solution might be easier than the second though.

neilalexander commented 1 year ago

I think going the GMSL way is also going to be hard in other languages. I can imagine one way to do it in Rust without writing my own JSON parser

It's hard in GMSL too — we had that pain in the past, implemented by hand because Go's JSON library provides nothing like Python's sort_keys or ensure_ascii:

jplatte commented 1 year ago

Maybe you should update the "Easy:" in the issue description then 😉

richvdh commented 1 year ago

Easy (well, easier?): To change the definition of "Canonical JSON" to not round-trip/interpret/reformat values at all, but instead to only strip whitespace and sort keys while preserving value bytes as-is — this is currently what GMSL does;

I think what you're proposing here is: rather than taking the received JSON, decoding it, and then re-encoding it as Canonical JSON, assume it is already (more-or-less) in the format in which it was signed, so just check the signature against the received bytes.

It's not clear to me that we actually need to specify a "canonical" format at all in that case. We're just signing bytes, right? (Glossing over the fact that the signature is embedded in the JSON, which complicates things.) Either way, it's a completely different approach.

richvdh commented 1 year ago

I think what you're proposing here is: rather than taking the received JSON, decoding it, and then re-encoding it as Canonical JSON, assume it is already (more-or-less) in the format in which it was signed, so just check the signature against the received bytes.

Also: doesn't this still present the problem that different implementations might interpret the same JSON differently (eg: in the case of duplicate keys, some impls might honour the first key, and some the last)?

neilalexander commented 1 year ago

I think what you're proposing here is: rather than taking the received JSON, decoding it, and then re-encoding it as Canonical JSON, assume it is already (more-or-less) in the format in which it was signed, so just check the signature against the received bytes.

I think we will still need the sorting of keys and the stripping of whitespace as a minimum. We still have to be able to modify the JSON to strip or insert the "unsigned" and "signatures" keys and know that we can still agree on a "correct" ordering after having done so. The main difference would be that we should leave the values well and truly alone — the wire format that a value is received in should be exactly what the signature covers, not the round-tripped value.

I'm mostly thinking this because, in Go for example, we can decode JSON without parsing the values (using json.RawMessage), but doing so can still change the order of the keys even if it leaves the values alone. When encoding, the order of the keys in the JSON obeys the struct field ordering, not lexicographical ordering. As a result we always have to sort keys ourselves manually after encoding as a separate step because we have no sort_keys equivalent when encoding.

I would fully expect plenty other JSON implementations to have their own unique ordering behaviours too which will need to be worked around by a post-encoding sort step either way.

Also: doesn't this still present the problem that different implementations might interpret the same JSON differently (eg: in the case of duplicate keys, some impls might honour the first key, and some the last)?

If you were to rely on round-tripping via a JSON implementation to sort keys and strip whitespace then possibly (although round-tripping is admittedly what got us into this mess to begin with).

GMSL for example takes a different approach to that by not round-tripping and instead manipulating JSON by hand to sort keys and strip whitespace, because that was the only thing we could do to make up for the lack of sorting in encoding/json.

richvdh commented 1 year ago

I think we will still need the sorting of keys and the stripping of whitespace as a minimum. We still have to be able to modify the JSON to strip or insert the "unsigned" and "signatures" keys and know that we can still agree on a "correct" ordering after having done so.

Do we? I can imagine an algorithm in which we strip out those keys without properly decoding and reencoding the JSON.

If you were to rely on round-tripping via a JSON implementation to sort keys and strip whitespace then possibly

I'm confused, on two fronts.

Doesn't your comment above about needing to specify the sorting of keys imply that you are relying on round-tripping via a JSON implementation?
Even if you don't rely on that, you could still have a situation where two implementations consider the same event to have very different meanings.

neilalexander commented 1 year ago

Do we? I can imagine an algorithm in which we strip out those keys without properly decoding and reencoding the JSON.

That's perhaps true but it may not be very future-proof. If an implementation wanted to shortcut in that way then it shouldn't matter as long as the end result is still right.

Doesn't your comment above about needing to specify the sorting of keys imply that you are relying on round-tripping via a JSON implementation?

GMSL can and does canonicalise JSON without round-tripping it through the encoding/json encoder.

The SortJSON function works by scanning the structure of the JSON (the gjson package works effectively by fancy string operations) and allowing us to swap keys around into the correct order. Then CompactJSON strips the whitespace using character-by-character manipulation afterwards.

The encoding/json package doesn't have an option like sort_keys, so all round-tripping would do for us in Go is to put it into the order that the fields appear in the struct, which isn't any more helpful.

Even if you don't rely on that, you could still have a situation where two implementations consider the same event to have very different meanings.

That's really just a spec precision issue. If the canonicalisation has to be implemented by hand to make it possible for all of us to agree on something, then we just need to be incredibly precise on what we're agreeing on — what constitutes the correct lexicographical ordering, the precedence for duplicate keys, that values are to be left alone, that very specific whitespace characters are to be stripped and so on.

richvdh commented 1 year ago

Do we? I can imagine an algorithm in which we strip out those keys without properly decoding and reencoding the JSON.

That's perhaps true but it may not be very future-proof. If an implementation wanted to shortcut in that way then it shouldn't matter as long as the end result is still right.

Doesn't your comment above about needing to specify the sorting of keys imply that you are relying on round-tripping via a JSON implementation?

GMSL can and does canonicalise JSON without round-tripping it through the encoding/json encoder.

The SortJSON function works by scanning the structure of the JSON (the gjson package works effectively by fancy string operations) and allowing us to swap keys around into the correct order. Then CompactJSON strips the whitespace using character-by-character manipulation afterwards.

mmkay, though I'd argue that the difference between this approach and a "real" round-trip is vanishingly small. Effectively you're reading the json with gjson and writing it out again with sortJSONObject. Sure you're not using the go standard library for it, but that's not relevant to the discussion.

Even if you don't rely on that, you could still have a situation where two implementations consider the same event to have very different meanings.

That's really just a spec precision issue.

Well, I guess so, but then it's unclear what problem you're hoping to solve in this issue. If we still have to spec everything incredibly tightly, what do we gain by replacing json signing with something else?