How does signature verification work in pre-v6 rooms

Benjamin-L commented 2 weeks ago

Link to problem area:

Signing JSON, from the v1.12 spec.

Issue

The spec says

Events in room versions 1, 2, 3, 4, and 5 might not be fully compliant with these restrictions. Servers SHOULD be capable of handling JSON which is considered invalid by these restrictions where possible.

The most notable consideration is that integers might not be in the range specified above.

Then the "signing details" subsection says

JSON is signed by encoding the JSON object without signatures or keys grouped as unsigned, using the canonical encoding described above.

Then the "checking for a signature" subsection says

Encodes the remainder of the JSON object using the Canonical JSON encoding.

Checks the signature bytes against the encoded object using the verification key. If this fails then the check fails. Otherwise the check succeeds.

I read this as "there is no requirement that canonical JSON encoding is used when signing events in pre-v6 rooms, and servers must be able to verify events that were originally signed using any JSON encoding". This is a problem, because a compliant server would need to iterate over all allowed encodings when checking signatures and can only reject an event after testing all of them. There are an unlimited number of encoding allowed by the current spec.

My outside-of-the-spec-text understanding is that the real requirement on encoding for pre-v6 rooms is "exactly what synapse does", which is something based on python's stdlib json encoder. I do not know whether this encoding is consistent between synapse versions, python versions, or pre-v6 room versions.

richvdh commented 2 weeks ago

To be clear: the problem with v5 rooms and earlier is that they allow JSON fields in events to contain values which are numbers outside the range [-(2**53)+1, (2**53)-1], or which have a floating-point component. Synapse accepts such values, and converts them back to JSON (following most of the canonicaljson rules, but forgetting about the limitations on numbers) to calculate the signature. This affects the signature on the event itself, as well as the signature on any federation requests which carry the event in the request payload (in particular, PUT /_matrix/federation/v1/send/{txnId}).

We don't currently have a good name for this pseudo-canonicaljson that Synapse produces; in any case, the question here is what exactly that format is.

My outside-of-the-spec-text understanding is that the real requirement on encoding for pre-v6 rooms is "exactly what synapse does", which is something based on python's stdlib json encoder.

That's about right, unfortunately, though it's not to say we can't make an attempt at documenting the behaviour, if we can figure out what that behaviour is.

It's not hard to figure out the behaviour with nothing more complicated than a Python REPL. For example:

python
>>> import json
>>>
>>> # Very big integer
>>> json.dumps(json.loads("10000000000000000000000000000000000000001"))
'10000000000000000000000000000000000000001'
>>>
>>> # float
>>> json.dumps(json.loads("1.0"))
'1.0'
>>>
>>> # big (ish) float
>>> json.dumps(json.loads("10000000000000000.0"))
'1e+16'
>>>
>>> # limit of precision in float
>>> json.dumps(json.loads("0.10000000000000001"))
'0.1'

So:

I do not know whether this encoding is consistent between synapse versions, python versions, or pre-v6 room versions.

Nor do we, really, but I believe it is consistent between Synapse versions and pre-v6 room versions. (Synapse used to accept other values like NaN and Infinity, but that hasn't been true for a long time.)

It's possible that some very old python versions did weird things with floats (particularly, I don't think the transition to exponential format always happened at 1e+16). Again, I don't think we need to worry about them.

clokep commented 2 weeks ago

It isn't exactly the same, but we have documented some historical oddities due to Synapse implementation. This could require delving a bit into the Python implementation to see what it does in different situations, unfortunately.

Benjamin-L commented 2 weeks ago

It's very good to hear that the deviation from canonical json is fairly limited! A couple clarifications:

It's possible that some very old python versions did weird things with floats (particularly, I don't think the transition to exponential format always happened at 1e+16). Again, I don't think we need to worry about them.

"don't think we need to worry about them" meaning that we would fully specify the encoding format for pre-v6 rooms, and if some synapse servers have different behavior due to an old python version, they would be non-compliant?

It isn't exactly the same, but we have documented some historical oddities due to Synapse implementation. This could require delving a bit into the Python implementation to see what it does in different situations, unfortunately.

The issue mentioned in that link wouldn't affect signature validation, right? The choice of string/integer would be preserved when encoding for validation. Are you saying that there may be other oddities that do affect signatures that we would need to read through the implementation to find?

richvdh commented 2 weeks ago

"don't think we need to worry about them" meaning that we would fully specify the encoding format for pre-v6 rooms, and if some synapse servers have different behavior due to an old python version, they would be non-compliant?

Yes.

clokep commented 2 weeks ago

The issue mentioned in that link wouldn't affect signature validation, right? The choice of string/integer would be preserved when encoding for validation. Are you saying that there may be other oddities that do affect signatures that we would need to read through the implementation to find?

I'm saying there is precedent for adding this information into the spec. What I linked to is not related to signature validation (as long as you round-trip the encoding properly).

matrix-org / matrix-spec

How does signature verification work in pre-v6 rooms #1988