Open richvdh opened 4 years ago
There is a bunch of code inside synapse which tries to add an unspecced `age` field to events (see matrix-org/matrix-spec-proposals#684, matrix-org/matrix-spec-proposals#2685).
AFAICS this is specced. See https://spec.matrix.org/v1.2/rooms/v1/#event-format through https://spec.matrix.org/v1.2/rooms/v9/#event-format. The `UnsignedData` type definition mentions an (optional?) `age` field.
Some spec history:

- `age` was added as an example here: https://github.com/matrix-org/matrix-spec/commit/5027a9a59a4ca257f6b71f78f6f0fe3811940400 but in a way that makes it look like a formal part of the PDU spec.
- `unsigned` in the PDU definition but kept the `age`.

> AFAICS this is specced. See https://spec.matrix.org/v1.2/rooms/v1/#event-format through https://spec.matrix.org/v1.2/rooms/v9/#event-format. The `UnsignedData` type definition mentions an (optional?) `age` field.
Hrm, it seems like it might have got a bit more specced since this issue was first opened. Note that https://spec.matrix.org/v1.1/client-server-api/ doesn't mention `age` outside of examples.

But yes, it seems pretty clear now.
FTR, this caused a bug on Android, where verification requests were discarded as their age couldn't be determined (no `age` field over federation).
It's not really clear if the field is mandatory. As per spec a `timestamp` field is mandatory for verification requests over `to_devices`:

> Required when sent as a to-device message. The POSIX timestamp in milliseconds for when the request was made. If the request is in the future by more than 5 minutes or more than 10 minutes in the past, the message should be ignored by the receiver.

Can't find any reference for verification via room messages, but the code was expected to use the `age` from `unsigned` to check for validity.
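The window quoted from the spec above can be sketched as a small check. This is only an illustration: the function and constant names are hypothetical, and it assumes both values are POSIX timestamps in milliseconds taken from the receiver's own clock.

```python
# Tolerances taken from the quoted spec text (values in milliseconds).
FUTURE_TOLERANCE_MS = 5 * 60 * 1000
PAST_TOLERANCE_MS = 10 * 60 * 1000


def is_request_timestamp_acceptable(timestamp_ms: int, now_ms: int) -> bool:
    """Return False if the request should be ignored by the receiver.

    Hypothetical helper: `timestamp_ms` is the request's `timestamp` field,
    `now_ms` is the receiver's current time.
    """
    if timestamp_ms - now_ms > FUTURE_TOLERANCE_MS:
        return False  # more than 5 minutes in the future
    if now_ms - timestamp_ms > PAST_TOLERANCE_MS:
        return False  # more than 10 minutes in the past
    return True
```

Note that this relies on the receiver's clock agreeing with the sender's, which is exactly the problem a relative `age` is meant to sidestep.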
> It's not really clear if the field is mandatory.

The spec does not mark `age` as Required, so you should treat it as optional. (Though I sympathise and would like it if the spec said "Optional" explicitly.)
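For a client, treating `age` as optional might look like the sketch below. The helper name is hypothetical, and what a caller does when it gets `None` back is deliberately left open; the only point is that a missing field is not treated as an error.

```python
from typing import Any, Dict, Optional


def read_age_ms(event: Dict[str, Any]) -> Optional[int]:
    """Return `unsigned.age` in milliseconds, or None if it is absent or malformed.

    Hypothetical helper, not taken from any SDK.
    """
    unsigned = event.get("unsigned")
    if not isinstance(unsigned, dict):
        return None
    age = unsigned.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(age, bool) or not isinstance(age, int):
        return None
    return age
```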
> As per spec a timestamp field is mandatory for verification requests over to_devices. Can't find any reference for verification via room messages
I don't think it would make sense to do this on a room-by-room basis?
Also, it's really confusing that we have both an age and a timestamp.
I think I'm leaning towards "we should remove this from the spec and Synapse", unless anyone can explain how this is useful for clients?
Here is some information about where this gets touched meaningfully (AFAICT). FTR, I couldn't tease this behavior out of Complement anywhere during transmission; however, I was able to find it exposed just before a `simple_upsert`, so I believe it's possible this is being written to the database (more below). Once it is corrected, I don't believe it will need a regression test specifically in Complement/Sytest or in unit tests. Fun fact: the bit of code that causes this is almost 9 years old.

- `FederationServer._handle_pdus_in_txn()` here
- `EventBase.get_pdu_json()` here
- `create_local_event_from_event_dict()` here
- `TransactionManager.send_new_transaction().json_data_cb()` here

The ones marked with checks are the ones I suspect. The rest are easy to discount, as they look to be doing what they are 'supposed' to do (provided that `age_ts` is meant to be in `unsigned` and not `internal_metadata`).
(RECEIVING)
In `FederationServer._handle_pdus_in_txn()`, it appears that a PDU comes in for processing and:

- if an `unsigned.age` field is found:
  - `age` is copied out of `unsigned` to the top level
- if `age` at top-level is found:
  - `age_ts` is created from it
  - `age` is then deleted.

`age_ts` is created here, and none of this occurs if the `unsigned.age` field isn't found to begin with, as a top-level `age` is otherwise never created (as far as I could find/tell).

(SENDING)
In `TransactionManager.send_new_transaction().json_data_cb()`:

- if `age_ts` is found at top-level:
  - `age_ts` is mutated/copied from top-level into `unsigned.age` (potentially overwriting any existing value, however I don't think it does)
  - `age_ts` is then deleted.

`json_data_cb()` gets the data it operates on from `EventBase.get_pdu_json()`. (For context, when `EventBase.get_pdu_json()` is called from `TransactionManager.send_new_transaction()`, no 'timestamp' is passed in, so this conditional never gets called, which is why I don't believe it ever gets overwritten. Two ships passing in the night.) There:

- if `unsigned.age_ts` exists:
  - `unsigned.age` is formed
  - `unsigned.age_ts` is deleted

Here is a log line demonstrating the `age_ts` key just before it's passed into the SQL transaction (`insert_received_event_to_staging()`):

I was able to format it with `json.dumps()` for easier staring at. It goes into `insert_received_event_to_staging` if you wish to find this in an existing log line.
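The three rewrites described in this comment can be sketched roughly as follows. This is not Synapse's code: the function names are hypothetical, events are plain dicts, and the send side is assumed to write the elapsed milliseconds into `unsigned.age`.

```python
from typing import Any, Dict, Optional

Pdu = Dict[str, Any]


def receive_rewrite(pdu: Pdu, request_time_ms: int) -> Pdu:
    """Receive side: a relative `age` becomes an absolute top-level `age_ts`."""
    pdu = dict(pdu)  # shallow copy so the caller's dict is not mutated
    unsigned = pdu.get("unsigned", {})
    if "age" in unsigned:
        pdu["age"] = unsigned["age"]  # copied up; `unsigned` itself is left alone
    if "age" in pdu:
        pdu["age_ts"] = request_time_ms - int(pdu["age"])
        del pdu["age"]
    return pdu


def send_rewrite(pdu: Pdu, now_ms: int) -> Pdu:
    """Send side: a top-level `age_ts` becomes a relative `unsigned.age`."""
    pdu = dict(pdu)
    if "age_ts" in pdu:  # only the top level is inspected
        unsigned = dict(pdu.get("unsigned", {}))
        unsigned["age"] = now_ms - int(pdu["age_ts"])
        pdu["unsigned"] = unsigned
        del pdu["age_ts"]
    return pdu


def pdu_json(pdu: Pdu, time_now_ms: Optional[int] = None) -> Pdu:
    """Serialisation step: rewrites `unsigned.age_ts` only if a time is passed in."""
    pdu = dict(pdu)
    unsigned = dict(pdu.get("unsigned", {}))
    if time_now_ms is not None and "age_ts" in unsigned:
        unsigned["age"] = int(time_now_ms - unsigned["age_ts"])
        del unsigned["age_ts"]
    pdu["unsigned"] = unsigned
    return pdu


received = receive_rewrite({"type": "m.room.message", "unsigned": {"age": 1000}}, 50_000)
resent = send_rewrite(received, 50_500)

# An event carrying `unsigned.age_ts`, serialised without a time, then "sent".
local = pdu_json({"type": "m.room.message", "unsigned": {"age_ts": 49_000}})
untouched = send_rewrite(local, 50_500)
```

In this sketch an event whose `age_ts` sits inside `unsigned` goes through `pdu_json()` (no time passed) and `send_rewrite()` (top level only) without either one touching it, which is the "two ships passing in the night" situation.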
There is a bunch of code inside synapse which tries to add an ~~unspecced~~ `age` field to events.

The intention of such a field is to try to mitigate problems with incorrectly-set clocks. Rather than saying "this event was created at 12:00", we can say "this event was created 10 minutes ago"; for certain applications (notably VoIP signalling), that information is more useful.

It would be more useful still if it was actually specced, but that's a side-issue.

Considering a flow from client to client over federation, I think the intention of the implementation is this:

1. The origin server stamps the event with an `age_ts`.
2. When sending the event over federation, the origin server removes `age_ts` from the event and replaces it with a raw `age` field giving the number of milliseconds that have elapsed since `age_ts`.
3. The receiving server replaces `age` with a new `age_ts` based on the time it received the transaction.
4. When passing the event on to a client, the receiving server replaces `age_ts` with `age`.

Assuming that a given server's clock is consistently inaccurate, it is kinda plausible, though obviously it neglects any delays between formatting transactions and them being received at the destination.
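A small arithmetic sketch of that intended flow shows why a consistent clock error cancels out, and what gets neglected. All names and numbers here are made up for illustration; each server's clock is modelled as true time plus a fixed skew.

```python
def simulate_age_flow(
    created_ms: int,      # true time the event was created
    origin_skew_ms: int,  # origin clock error (constant)
    dest_skew_ms: int,    # destination clock error (constant)
    origin_hold_ms: int,  # time the event waits on the origin before sending
    transit_ms: int,      # time between formatting the transaction and receipt
    dest_hold_ms: int,    # time the event waits on the destination before a client gets it
):
    """Return (age reported to the client, true age), both in milliseconds."""
    # Origin stamps the event with its own (skewed) clock.
    age_ts_origin = created_ms + origin_skew_ms

    # Origin replaces age_ts with a relative age when sending.
    send_true = created_ms + origin_hold_ms
    age_on_wire = (send_true + origin_skew_ms) - age_ts_origin

    # Destination turns age back into an age_ts on its own clock.
    recv_true = send_true + transit_ms
    age_ts_dest = (recv_true + dest_skew_ms) - age_on_wire

    # Destination replaces age_ts with age when serving the client.
    serve_true = recv_true + dest_hold_ms
    age_to_client = (serve_true + dest_skew_ms) - age_ts_dest

    return age_to_client, serve_true - created_ms


reported, actual = simulate_age_flow(
    created_ms=1_000_000,
    origin_skew_ms=600_000,   # origin clock is 10 minutes fast
    dest_skew_ms=-120_000,    # destination clock is 2 minutes slow
    origin_hold_ms=2_000,
    transit_ms=300,
    dest_hold_ms=5_000,
)
```

With these inputs `reported` is 7000 and `actual` is 7300: both skews drop out, and the only error left is the 300 ms spent in transit.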
However, I suspect it doesn't actually work. The C-S code in synapse stores `age_ts` in the `unsigned` property of the event (which is ~~probably correct~~ no, we have `internal_metadata` for this sort of thing); however, the logic in the federation sender which tries to replace `age_ts` actually looks for the property in the top level of the event rather than in `unsigned`. It therefore leaves `age_ts` in place in `unsigned`. The federation receiver logic will accept an `age` in either `unsigned` or at the top level, and replaces it with an `age_ts` at the top level. The C-S code doesn't know to strip it out of there, which means it leaks out to the C-S API (see https://github.com/matrix-org/matrix-doc/issues/2685).

This has clearly been broken for ages, and nobody has really noticed.

Given none of it is specced, I am inclined to say that we should strip it all out.

(Update 2023/01/17: it's specced nowadays.)