Open Benjamin-L opened 2 weeks ago
To be clear: the problem with v5 rooms and earlier is that they allow JSON fields in events to contain values which are numbers outside the range [-(2**53)+1, (2**53)-1]
, or which have a floating-point component. Synapse accepts such values, and converts them back to JSON (following most of the canonicaljson rules, but forgetting about the limitations on numbers) to calculate the signature. This affects the signature on the event itself, as well as the signature on any federation requests which carry the event in the request payload (in particular, PUT /_matrix/federation/v1/send/{txnId}
).
We don't currently have a good name for this pseudo-canonicaljson that Synapse produces; in any case, the question here is what exactly that format is.
My outside-of-the-spec-text understanding is that the real requirement on encoding for pre-v6 rooms is "exactly what synapse does", which is something based on python's stdlib json encoder.
That's about right, unfortunately, though it's not to say we can't make an attempt at documenting the behaviour, if we can figure out what that behaviour is.
It's not hard to figure out the behaviour with nothing more complicated than a Python REPL. For example:
python
>>> import json
>>>
>>> # Very big integer
>>> json.dumps(json.loads("10000000000000000000000000000000000000001"))
'10000000000000000000000000000000000000001'
>>>
>>> # float
>>> json.dumps(json.loads("1.0"))
'1.0'
>>>
>>> # big (ish) float
>>> json.dumps(json.loads("10000000000000000.0"))
'1e+16'
>>>
>>> # limit of precision in float
>>> json.dumps(json.loads("0.10000000000000001"))
'0.1'
So:
I do not know whether this encoding is consistent between synapse versions, python versions, or pre-v6 room versions.
Nor do we, really, but I believe it is consistent between Synapse versions and pre-v6 room versions. (Synapse used to accept other values like NaN
and Infinity
, but that hasn't been true for a long time.)
It's possible that some very old python versions did weird things with floats (particularly, I don't think the transition to exponential format always happened at 1e+16). Again, I don't think we need to worry about them.
It isn't exactly the same, but we have documented some historical oddities due to Synapse implementation. This could require delving a bit into the Python implementation to see what it does in different situations, unfortunately.
It's very good to hear that the deviation from canonical json is fairly limited! A couple clarifications:
It's possible that some very old python versions did weird things with floats (particularly, I don't think the transition to exponential format always happened at 1e+16). Again, I don't think we need to worry about them.
"don't think we need to worry about them" meaning that we would fully specify the encoding format for pre-v6 rooms, and if some synapse servers have different behavior due to an old python version, they would be non-compliant?
It isn't exactly the same, but we have documented some historical oddities due to Synapse implementation. This could require delving a bit into the Python implementation to see what it does in different situations, unfortunately.
The issue mentioned in that link wouldn't affect signature validation, right? The choice of string/integer would be preserved when encoding for validation. Are you saying that there may be other oddities that do affect signatures that we would need to read through the implementation to find?
"don't think we need to worry about them" meaning that we would fully specify the encoding format for pre-v6 rooms, and if some synapse servers have different behavior due to an old python version, they would be non-compliant?
Yes.
The issue mentioned in that link wouldn't affect signature validation, right? The choice of string/integer would be preserved when encoding for validation. Are you saying that there may be other oddities that do affect signatures that we would need to read through the implementation to find?
I'm saying there is precedent for adding this information into the spec. What I linked to is not related to signature validation (as long as you round-trip the encoding properly).
Link to problem area:
Signing JSON, from the v1.12 spec.
Issue
The spec says
Then the "signing details" subsection says
Then the "checking for a signature" subsection says
I read this as "there is no requirement that canonical JSON encoding is used when signing events in pre-v6 rooms, and servers must be able to verify events that were originally signed using any JSON encoding". This is a problem, because a compliant server would need to iterate over all allowed encodings when checking signatures and can only reject an event after testing all of them. There are an unlimited number of encoding allowed by the current spec.
My outside-of-the-spec-text understanding is that the real requirement on encoding for pre-v6 rooms is "exactly what synapse does", which is something based on python's stdlib json encoder. I do not know whether this encoding is consistent between synapse versions, python versions, or pre-v6 room versions.