Fizzadar opened 2 years ago
Related to #11491
AIUI we'd lose the ability to have foreign keys between the device tables and the rest of the database, which is unfortunate (not that we seem to have any).
Do you know if the database IO for those tables is mostly reads or writes? If they're mostly reads I'd be in favour of adding support for read replicas.
It’s a pretty even mix of reads and writes. Read replicas would certainly be very helpful (across all of synapse) but I suspect they would require significant changes, both in code and operationally, to be supported.
The reason I suggested splitting out devices is that it seems like a logically separate group of tables (like state groups) that needn’t have any in-db joins even in the future.
Full context: as part of this same investigation I was considering how the synapse events/rooms tables could eventually be sharded by room id, which would in theory provide near-infinite scale. Separating non-room-related tables is kind of a first step towards that.
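To make the shard-by-room-id idea concrete, here is a minimal, hypothetical sketch (not Synapse code; `shard_for_room` and the shard count are illustrative) of how queries could be routed so that all of a room's rows land on one database:

```python
import hashlib

# Hypothetical sketch: route each room's event rows to one of N event
# databases by hashing the room ID, keeping all rows for a room together.
NUM_EVENT_SHARDS = 4

def shard_for_room(room_id: str, num_shards: int = NUM_EVENT_SHARDS) -> int:
    """Map a room ID to a stable shard index using a stable hash."""
    digest = hashlib.sha256(room_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Every query touching a room's events would go to shard_for_room(room_id);
# cross-room joins are avoided by construction.
```

Because the hash is stable, the same room always resolves to the same shard, which is the property that makes per-room queries shard-local.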
Hi @Fizzadar, this is something we discussed a bit as a team. We think it's totally feasible, we just have a couple of reservations:
So, to move forwards here could you share more about what you're seeing? Ideally we'd like to know exactly which queries are using the IO, but we're not sure how granular your data is.
Thanks for looking into this @erikjohnston! I pulled some DB stats on the highest read tables, combining both table + index blocks together to get the following rates (prom query here also):
```promql
(sum by (relname) (rate(pg_statio_all_tables_heap_blks_read{app="synapse-postgres-exporter"} [7d])))
+ (sum by (relname) (rate(pg_statio_all_tables_idx_blks_read{app="synapse-postgres-exporter"} [7d])))
```
| table | total | data | index |
|---|---|---|---|
| e2e_room_keys | 1270 | 8.50 | 1261 |
| room_memberships | 2433 | 1229 | 1204 |
| events | 1591 | 649 | 942 |
| event_auth_chains | 1135 | 465 | 670 |
| user_ips | 614 | 0.0322 | 614 |
| state_groups_state | 1064 | 623 | 440 |
| event_auth_chain_links | 778 | 469 | 309 |
| event_json | 388 | 133 | 255 |
| event_edges | 206 | 15.9 | 191 |
| state_group_edges | 681 | 517 | 164 |
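The aggregation behind this table is just "heap (data) block reads plus index block reads per table", mirroring the PromQL above. A small sketch, using sample numbers from the table (dict names are illustrative):

```python
# Per-table block-read rates: heap (data) reads plus index reads,
# mirroring the PromQL sum over heap_blks_read and idx_blks_read.
heap_blks_read = {"e2e_room_keys": 8.50, "room_memberships": 1229, "events": 649}
idx_blks_read = {"e2e_room_keys": 1261, "room_memberships": 1204, "events": 942}

def total_read_rates(heap, idx):
    """Combine heap and index block-read rates per table, highest first."""
    tables = set(heap) | set(idx)
    totals = {t: heap.get(t, 0) + idx.get(t, 0) for t in tables}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(total_read_rates(heap_blks_read, idx_blks_read))
# → [('room_memberships', 2433), ('events', 1591), ('e2e_room_keys', 1269.5)]
```

Note that `e2e_room_keys` ranks highest by *index* reads even though its total is lower than `room_memberships`, which is why it stood out in the per-index charts.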
This aligns with other charts indicating that the `e2e_room_keys` table is pretty heavy, so it would be the biggest gain in terms of splitting out read performance for us here. A large number of these were count queries that we've actually disabled (perhaps temporarily) as the results were unused in all the clients (this commit & this commit).
In regards to the general issue here though - our aim is ultimately to shard the `events` & `event_json` tables, plus whatever is needed to facilitate those. I think splitting out the device storage could be a small iteration towards that goal as a logical group of independent tables/APIs. It may make sense to bring additional tables/storage classes in as well.
So this morning I added a new index on `room_id` to the `e2e_room_keys` table, which has all but stopped the reads on that table. Still think separating the datastores is useful for other reasons, but the IO issue we started with is fixed by the index (will make a PR for that when I have the time).
Description:
We are currently experimenting with different ways to scale out the synapse database, in particular whether it is possible to divide tables amongst separate database instances, much like the state tables/datastore class.
Based on my analysis it should be possible to extract the following device/e2e related stores into a separate datastore instance:
- `Device*Store` (`stores/main/devices.py`)
- `DeviceInbox*Store` (`stores/main/deviceinbox.py`)
- `EndToEndKey*Store` (`stores/main/end_to_end_keys.py`)
- `ClientIp*Store` (`stores/main/client_ips.py`)

I picked these because they're fairly small overall/low inter-dependency and represent a high percentage of database IO on our instance (currently a single database with all tables).
Note: the one interdependency this misses is the `populate_monthly_active_users` call in `client_ips.py`, which could become `self.hs.get_datastores().device.populate_monthly_active_users(user_id)`.

Is there any appetite for this? We can commit engineering time to implement this if so. Also keen to discuss any other groups of stores that may be suitable candidates.
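To illustrate what that cross-datastore call could look like after the split, here is a hypothetical sketch (class names and the in-memory set are illustrative, not Synapse's actual APIs) of a registry exposing separate `main` and `device` datastores:

```python
# Hypothetical sketch: a datastore registry with the device stores split
# out, so client_ips.py calls into the device store explicitly.
class DeviceStore:
    def __init__(self):
        self.monthly_active_users = set()

    def populate_monthly_active_users(self, user_id: str) -> None:
        # Stand-in for the real upsert into monthly_active_users.
        self.monthly_active_users.add(user_id)

class MainStore:
    pass  # everything not moved to the device datastore

class Datastores:
    def __init__(self):
        self.main = MainStore()
        self.device = DeviceStore()

class HomeServer:
    def __init__(self):
        self._datastores = Datastores()

    def get_datastores(self) -> Datastores:
        return self._datastores

hs = HomeServer()
# The call from client_ips.py would then take the proposed shape:
hs.get_datastores().device.populate_monthly_active_users("@alice:example.com")
```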