matrix-org / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://matrix-org.github.io/synapse
Apache License 2.0
11.83k stars 2.13k forks source link

Event creation rate limiting happens very late in a worker environment #16481

Open richvdh opened 1 year ago

richvdh commented 1 year ago

When event persistence happens in a separate worker process to event creation, we don't end up applying the ratelimit until the event reaches the event persister.

This means we end up doing lots of work (including calculating the push actions and inserting into event_push_actions_staging) for an event that gets thrown away.

There is code to attempt to apply the ratelimiter upfront (https://github.com/matrix-org/synapse/blob/v1.93.0/synapse/handlers/message.py#L1001-L1002), but it doesn't work in a worker environment because the count is never updated.

erikjohnston commented 1 year ago

One option I think is to add the "full" ratelimit check (with update=True) just before we do the HTTP replication call to the event persister. c.f. below for the call that happens on the persisters

https://github.com/matrix-org/synapse/blob/166ffc0f23419bc99d9597fe95deaae3bbee7caf/synapse/handlers/message.py#L1715-L1717

richvdh commented 1 year ago

One option I think is to add the "full" ratelimit check (with update=True) just before we do the HTTP replication call to the event persister. c.f. below for the call that happens on the persisters

Yeah, that would work I guess.

Can we just use a separate ratelimiter for the two checks rather than trying to be "clever" ?

(ideally, request rate would be shared between workers (via redis?) which would solve this problem, as well as the fact that, for requests that get round-robinned between workers, you need to tune your ratelimits for the number of workers. But that's a bunch of work.)

erikjohnston commented 1 year ago

Can we just use a separate ratelimiter for the two checks rather than trying to be "clever" ?

I don't follow this? Do you mean not having the update=False thing? Though TBH I can't remember the context of why we only want to actually ratelimit quite late in the day?

richvdh commented 1 year ago

I don't follow this? Do you mean not having the update=False thing?

Yes, in short. We have two instances of the request ratelimiter, one of which we check (and update) when the request first arrives; the other we check and update later.

Though TBH I can't remember the context of why we only want to actually ratelimit quite late in the day?

No, nor me. We should probably find out before changing things here.