Clarify atomicity of endpoints

ShadowJonathan commented 2 years ago

Many endpoints in Matrix update resources; however, the spec is currently silent on how servers should behave when the same resource is queried while, or shortly after, it is being updated.

The key question: when a server sends the HTTP response to a request that updates a resource (i.e. PUT/DELETE), does it guarantee that any subsequent query for that resource (GET) will reflect that change?
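To make the question concrete, the following Python sketch models a hypothetical homeserver split into a writer worker and a reader worker, where the writer acknowledges a PUT before the reader has replicated it. `WriterWorker`, `ReaderWorker` and `replicate()` are invented names for illustration, not any real homeserver's API.

```python
from typing import Optional

# Hypothetical two-worker model; names are illustrative, not a real homeserver API.


class WriterWorker:
    """Handles PUT requests and appends each change to a replication log."""

    def __init__(self) -> None:
        self.log: list[tuple[str, str]] = []  # (room_id, room_name)

    def put_room_name(self, room_id: str, name: str) -> int:
        self.log.append((room_id, name))
        return 200  # acknowledged immediately, before any reader has caught up


class ReaderWorker:
    """Serves GET requests from its own copy of state, updated only by replicate()."""

    def __init__(self, writer: WriterWorker) -> None:
        self.writer = writer
        self.applied = 0  # number of log entries applied so far
        self.names: dict[str, str] = {}

    def replicate(self) -> None:
        # Apply every log entry this worker has not seen yet.
        for room_id, name in self.writer.log[self.applied:]:
            self.names[room_id] = name
        self.applied = len(self.writer.log)

    def get_room_name(self, room_id: str) -> Optional[str]:
        return self.names.get(room_id)


writer = WriterWorker()
reader = ReaderWorker(writer)
status = writer.put_room_name("!abc:example.org", "New name")
before = reader.get_room_name("!abc:example.org")  # stale: the 200 OK arrived first
reader.replicate()
after = reader.get_room_name("!abc:example.org")
print(status, before, after)  # 200 None New name
```

Whether the `before` read is allowed to return stale data is exactly what the spec currently leaves open.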

ShadowJonathan commented 2 years ago

I think it would also be useful to clarify that atomicity would only be guaranteed on the local server, and that over federation any delay is possible.

This issue would have consequences for multi-process homeservers, but I think it would be immensely useful to clarify whether all of those processes have to be "synced up" before the server returns a response, so that if a subsequent query is handled by a different worker, that worker returns an up-to-date version of the resource.
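One possible way to get the "synced up" behaviour without making every worker wait on every write is sketched below, assuming the writer can hand out a monotonically increasing stream position: the write hypothetically returns that position, and a worker serving a later read blocks until it has applied at least that position. The class names, `min_position` and the in-process threading are all hypothetical; a real multi-process homeserver would replicate over the network.

```python
import threading
from typing import Optional


class Writer:
    """Hypothetical writer worker: records changes and hands out stream positions."""

    def __init__(self) -> None:
        self.log: list[tuple[str, str]] = []

    def put_room_name(self, room_id: str, name: str) -> int:
        self.log.append((room_id, name))
        return len(self.log)  # stream position of this write


class Reader:
    """Hypothetical reader worker that can block until it has caught up."""

    def __init__(self, writer: Writer) -> None:
        self.writer = writer
        self.position = 0
        self.names: dict[str, str] = {}
        self.cond = threading.Condition()

    def replicate(self) -> None:
        # Normally driven by a replication stream; called explicitly here.
        with self.cond:
            for room_id, name in self.writer.log[self.position:]:
                self.names[room_id] = name
            self.position = len(self.writer.log)
            self.cond.notify_all()

    def get_room_name(self, room_id: str, min_position: int,
                      timeout: float = 5.0) -> Optional[str]:
        with self.cond:
            # Wait until this worker has applied the caller's write, or give up.
            if not self.cond.wait_for(lambda: self.position >= min_position, timeout):
                raise TimeoutError(f"worker at {self.position}, need {min_position}")
            return self.names.get(room_id)


writer = Writer()
reader = Reader(writer)
pos = writer.put_room_name("!abc:example.org", "New name")
# Replication happens shortly after the write, on another thread.
threading.Timer(0.05, reader.replicate).start()
name = reader.get_room_name("!abc:example.org", min_position=pos)
print(pos, name)  # 1 New name
```

The trade-off is that reads can now block (or time out) instead of returning stale data; a spec-level read-after-write guarantee would likely require something along these lines for split deployments.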

kegsay commented 2 years ago

To add to this from #complement:

The problem is that Matrix endpoints don't specify when data has propagated. Take, for example, a request to update the room name: you do a PUT to update the room name, then check it with a GET to /state/m.room.name. If you do the GET too early, you may not see the updated room name. Naively, many, many, many tests assume that if the PUT/POST request returns 200 OK, then all GET endpoints will return the updated data. This intuitively makes sense, but may not be true in a distributed architecture like Dendrite, where the GET endpoints are serviced by a different component from the PUT endpoints: you need to wait until the event has propagated through the system. This is the source of the majority of integration test raciness. Ideally the specification would be explicit about the guarantees a 200 OK makes, to enforce that homeservers are in an agreed-upon state, but currently the spec is silent on this, which means implementations can deviate.

It's worth noting that Dendrite is actually fine for this example of updating the room name; it's purely illustrative. This wasn't always the case, though: Dendrite used to have a currentstateserver which serviced GET requests for the current room name, and which raced with the clientapi, which services PUT requests.
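In the absence of such a guarantee, test suites tend to retry the GET until it reflects the write. A minimal, generic retry helper is sketched below; `poll_until` is a hypothetical name (not Complement's API), and the `fetch` callable stands in for whatever issues the GET to /state/m.room.name, whose response body is the event content, e.g. `{"name": ...}`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], T], done: Callable[[T], bool],
               timeout: float = 5.0, interval: float = 0.1) -> T:
    """Call fetch() until done(result) is true; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not met within {timeout}s; last result: {result!r}")
        time.sleep(interval)


# Usage with a fake GET that only sees the new name on the third call.
responses = iter([{"name": "Old name"}, {"name": "Old name"}, {"name": "New name"}])
state = poll_until(lambda: next(responses),
                   lambda body: body["name"] == "New name",
                   timeout=2.0, interval=0.01)
print(state)  # {'name': 'New name'}
```

The cost of this approach is that every such assertion needs a timeout: one that is too short reintroduces flakiness, while one that is too long slows down genuinely failing tests.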

KitsuneRal commented 2 years ago

Tangentially, I wonder what would be a good enough point at which it can reasonably be assumed (from the testing perspective in particular) that the change has trickled through. Seeing the change in the /sync response?

ShadowJonathan commented 2 years ago

That is currently the approach Complement is taking; however, some callers might race by calling multiple endpoints one after the other, or may not want to call /sync due to its (relatively) heavy processing requirements.

I'm not asking for the spec to make endpoints atomic; I am asking for it to clarify their current status, and whether /sync should be used as the signal that a resource change has finally come through.
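For the /sync approach, the check itself amounts to scanning each sync response for the updated state event. The hypothetical helper below does that for a /sync response body that has already been parsed into a dict, looking under `rooms.join.<room_id>` in both the `state` and `timeline` sections; it performs no HTTP requests, and the sample body is made up.

```python
from typing import Any


def sync_shows_room_name(sync_body: dict[str, Any], room_id: str, expected: str) -> bool:
    """Return True if this /sync response has an m.room.name event with the expected name."""
    room = sync_body.get("rooms", {}).get("join", {}).get(room_id)
    if room is None:
        return False
    # The event may arrive in either the state block or the timeline.
    events = (room.get("state", {}).get("events", [])
              + room.get("timeline", {}).get("events", []))
    return any(
        ev.get("type") == "m.room.name"
        and ev.get("state_key") == ""
        and ev.get("content", {}).get("name") == expected
        for ev in events
    )


sample = {
    "next_batch": "s72595_4483_1934",
    "rooms": {"join": {"!abc:example.org": {
        "timeline": {"events": [{
            "type": "m.room.name", "state_key": "", "event_id": "$example",
            "sender": "@alice:example.org", "content": {"name": "New name"},
        }]},
    }}},
}
print(sync_shows_room_name(sample, "!abc:example.org", "New name"))  # True
print(sync_shows_room_name(sample, "!abc:example.org", "Other"))     # False
```

A caller would repeat the /sync request, passing the previous `next_batch` as `since`, until this returns true or a deadline passes.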

reivilibre commented 2 years ago

The word for this is atomicity, fwiw :)

matrix-org/matrix-spec issue #938