matrix-org / rust-synapse-compress-state

A tool to compress some state in a Synapse instance's database
https://pypi.org/project/synapse-auto-compressor/
Apache License 2.0
142 stars 32 forks

Meaning of panic messages "Missing {}" in `collapse_state_maps` is unclear #79

Open waclaw66 opened 2 years ago

waclaw66 commented 2 years ago

Can you please add more debug info to the message "Missing {}"? E.g. which room is affected. How can this issue be fixed? If the affected room no longer exists, the state events can be removed, right?

reivilibre commented 2 years ago

which room is related

The compressor only seems to do one specified room at once, so I would have thought it'd be the room you specified. (In a future where we might process multiple rooms: perhaps the code could be modernised to not panic, instead using anyhow and the .with_context function to add the room ID as context to any potential error?)
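A minimal, hypothetical sketch of that idea (names are made up for illustration; this uses only std, whereas with the anyhow crate the same context could be attached with a one-line `.with_context(...)` call):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the state-group map used in collapse_state_maps.
// Instead of `panic!("Missing {}", id)`, the lookup returns an error that
// carries the room ID as context, so the caller knows which room failed.
fn lookup_state_group(
    groups: &HashMap<i64, Vec<String>>,
    id: i64,
    room_id: &str,
) -> Result<Vec<String>, String> {
    groups
        .get(&id)
        .cloned()
        .ok_or_else(|| format!("Missing state group {} (room {})", id, room_id))
}

fn main() {
    let groups: HashMap<i64, Vec<String>> = HashMap::new();
    match lookup_state_group(&groups, 90660, "!example:server.org") {
        Ok(_) => println!("found"),
        Err(e) => println!("{}", e),
    }
}
```

With an error type like this bubbling up, the binary could report the room and keep going rather than aborting the whole run.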

How to fix this issue?

That's more of a question. "Missing {}" appears in two places, seemingly when the compressor can't find a state group that it expects to find. I don't fully understand the logic here though. I would suspect it's either a bug in the state compressor or corruption (missing rows?) in your database.

If affected room no longer exists, state events can be removed, right?

I don't think that this error necessarily means the room is missing.

waclaw66 commented 2 years ago

Forgot to mention, I'm using synapse_auto_compressor, so there is no room specified. Therefore I don't know the origin of the missing state group. If the database is corrupted, I'm looking for a way to fix it.

thread 'unnamed' panicked at 'Missing 90660': src/lib.rs:651
Traceback (most recent call last):
  File "/root/matrix/./auto-compress", line 5, in <module>
    synapse_auto_compressor.compress_state_events_table(
pyo3_runtime.PanicException: Missing 90660
reivilibre commented 2 years ago

Oh I see, I wasn't aware of synapse_auto_compressor; that's a new addition since I last used this repo. Having that information in the error message seems like it would be useful, so it looks like it should be added.

In the meantime, you could find out which room a state group belongs to using SQL:

SELECT room_id FROM state_groups WHERE id = 90660;

As for figuring out whether it's a bug in the compressor or database corruption on your end, it would need investigation.

waclaw66 commented 2 years ago

Unfortunately it doesn't find any room with that state group :(

bjo81 commented 2 years ago

Got a similar issue:

[2022-01-07T09:21:41Z INFO  synapse_auto_compressor] synapse_auto_compressor started
[2022-01-07T09:21:41Z INFO  synapse_auto_compressor::manager] Running compressor on room !qZFigcWoZoRODhzUlw:matrix.org with chunk size 500
[2022-01-07T09:21:41Z ERROR panic] thread 'main' panicked at 'Missing 5817': src/lib.rs:666
schildbach commented 2 years ago

I've got this issue too:

[2022-06-16T02:02:00Z INFO  synapse_auto_compressor] synapse_auto_compressor started
[2022-06-16T02:02:00Z INFO  synapse_auto_compressor::manager] Running compressor on room !sgvHsecpjxfyPEWmiK:fosdem.org with chunk size 500
[2022-06-16T02:02:00Z ERROR panic] thread 'main' panicked at 'Missing 376285': src/lib.rs:666
nycterent commented 2 years ago

Same here:

synapse_auto_compressor -p postgresql:// -c 500 -n 200
[2022-07-24T07:43:11Z INFO synapse_auto_compressor] synapse_auto_compressor started
[2022-07-24T07:43:11Z INFO synapse_auto_compressor::manager] Running compressor on room !GtIfdsfQtQIgbQSxwJ:archlinux.org with chunk size 500
[2022-07-24T07:43:16Z ERROR panic] thread 'main' panicked at 'Missing 37272': src/lib.rs:665
uriesk commented 2 years ago

Rather than just having a descriptive error message, I would prefer it if synapse_auto_compressor continued with the other rooms.

uriesk commented 2 years ago

Unfortunately it doesn't find any room with that state group :(

You can run it with RUST_LOG=debug ./synapse_auto_compressor ...

Which gives you some error like:

[2022-07-27T11:32:18Z DEBUG tokio_postgres::prepare] preparing query s29: 
            SELECT target.prev_state_group, source.prev_state_group, state.type, state.state_key, state.event_id
            FROM state_group_edges AS target
            LEFT JOIN state_group_edges AS source ON (target.prev_state_group = source.state_group)
            LEFT JOIN state_groups_state AS state ON (target.prev_state_group = state.state_group)
            WHERE target.prev_state_group = ANY($1)

[2022-07-27T11:32:18Z DEBUG tokio_postgres::query] executing statement s29 with parameters: [[75275]]
[2022-07-27T11:32:18Z ERROR panic] thread 'main' panicked at 'Missing 77986': src/lib.rs:665

The 75275 here is the id to use in select room_id from state_groups where id = 75275; you can then delete the compressor's saved state for that room with

delete from state_compressor_state where room_id = '[roomid]';
delete from state_compressor_progress where room_id = '[roomid]';

which should make it work again.

maltee1 commented 1 year ago

I got the following:

[2022-10-24T20:32:52Z DEBUG tokio_postgres::prepare] preparing query s11: 
            SELECT m.id, prev_state_group, type, state_key, s.event_id
            FROM state_groups AS m
            LEFT JOIN state_groups_state AS s ON (m.id = s.state_group)
            LEFT JOIN state_group_edges AS e ON (m.id = e.state_group)
            WHERE m.room_id = $1 AND m.id <= $2
         AND m.id > $3
[2022-10-24T20:32:52Z DEBUG tokio_postgres::query] executing statement s11 with parameters: ["!dlPhxKlwObeRkhUZCm:maltee.de", 969, 963]
[2022-10-24T20:32:52Z DEBUG synapse_compress_state::database] Got initial state from database. Checking for any missing state groups...
[2022-10-24T20:21:34Z DEBUG tokio_postgres::prepare] preparing query s12: 
            SELECT target.prev_state_group, source.prev_state_group, state.type, state.state_key, state.event_id
            FROM state_group_edges AS target
            LEFT JOIN state_group_edges AS source ON (target.prev_state_group = source.state_group)
            LEFT JOIN state_groups_state AS state ON (target.prev_state_group = state.state_group)
            WHERE target.prev_state_group = ANY($1)

[2022-10-24T20:21:34Z DEBUG tokio_postgres::query] executing statement s12 with parameters: [[968]]
[2022-10-24T20:21:34Z ERROR panic] thread 'main' panicked at 'Missing 963': src/lib.rs:673

but

synapse=# select room_id from state_groups where id = 968;
 room_id 
---------
(0 rows)

How should I proceed?

edit: I deleted the room !dlPhxKlwObeRkhUZCm:maltee.de with the statements from the comment above; that worked.

MacLemon commented 1 year ago

Similar failure here.

[2022-12-04T16:58:16Z DEBUG tokio_postgres::query] executing statement s10 with parameters: [[889284, 889151, 884628]]
[2022-12-04T16:58:16Z DEBUG tokio_postgres::prepare] preparing query s11:
            SELECT m.id, prev_state_group, type, state_key, s.event_id
            FROM state_groups AS m
            LEFT JOIN state_groups_state AS s ON (m.id = s.state_group)
            LEFT JOIN state_group_edges AS e ON (m.id = e.state_group)
            WHERE m.room_id = $1 AND m.id <= $2
         AND m.id > $3
[2022-12-04T16:58:16Z DEBUG tokio_postgres::query] executing statement s11 with parameters: ["!TdAwENXmXuMrCrFEFX:maunium.net", 990391, 889284]
[2022-12-04T16:58:16Z DEBUG synapse_compress_state::database] Got initial state from database. Checking for any missing state groups...
[2022-12-04T16:58:16Z DEBUG tokio_postgres::prepare] preparing query s12:
            SELECT target.prev_state_group, source.prev_state_group, state.type, state.state_key, state.event_id
            FROM state_group_edges AS target
            LEFT JOIN state_group_edges AS source ON (target.prev_state_group = source.state_group)
            LEFT JOIN state_groups_state AS state ON (target.prev_state_group = state.state_group)
            WHERE target.prev_state_group = ANY($1)

[2022-12-04T16:58:16Z DEBUG tokio_postgres::query] executing statement s12 with parameters: [[889469, 890123, 890652, 890744, 891152, 893184, 893508, 893555, 893600, 894390, 895215, 896391, 896488, 897344, 899304, 899360, 899667, 899840, 903408, 903438, 903552, 904000, 904256, 908545, 909088, 909216, 912452, 912512, 912528, 912944, 913272, 916512, 917384, 917414, 917470, 917477, 917488, 917717, 919072, 919200, 921216, 921280, 921340, 921440, 921920, 922584, 922944, 924192, 925477, 925760, 926432, 926592, 926824, 927540, 927616, 927840, 928820, 929429, 930520, 930897, 931218, 931341, 931424, 931760, 931936, 932440, 932992, 933488, 934016, 934256, 934424, 934520, 934560, 936032, 936728, 936832, 937016, 937478, 937601, 937625, 937756, 937804, 938016, 938219, 938296, 938688, 938872, 938936, 939200, 939352, 939680, 940194, 940211, 942672, 942928, 942987, 943055, 943073, 943174, 943178, 943232, 943318, 943674, 943786, 943808, 943896, 943940, 943980, 943984, 944000, 944130, 944331, 944388, 944425, 944431, 944458, 944496, 944512, 944596, 944698, 944736, 944889, 944896, 945039, 945107, 945240, 945331, 945344, 945360, 945428, 945568, 945802, 946464, 946952, 947200, 947286, 947328, 947392, 947520, 948096, 948524, 949514, 949890, 949952, 950336, 950604, 951168, 951960, 953464, 953560, 953776, 953784, 954045, 956064, 958752, 960192, 961200, 962976, 963430, 963520, 963604, 963808, 964352, 964528, 964601, 964648, 964670, 964677, 965600, 966869, 967914, 967984, 968000, 968399, 969984, 970088, 972364, 972720, 972768, 972792, 973225, 974065, 975842, 975936, 977432, 977560, 979368, 979720, 983695, 984464, 984480, 984753, 984760, 984768, 984863, 984878, 985728, 986424, 986648, 986703, 987069, 987212, 987316, 989916]]
[2022-12-04T16:58:16Z ERROR panic] thread 'main' panicked at 'Missing 889151': src/lib.rs:665

Using synapse_auto_compressor release tagged v0.1.3. (Built with cargo 1.65.0 on FreeBSD 13.1-RELEASE-p5.)

verymilan commented 1 year ago

Is it safe for the functionality of the auto compressor to just drop those tables?

By the way: it usually seems to be the room from the last INFO log line, so there is no need to look it up by that number.

Ralith commented 1 year ago

I encountered this crash while deleting a room concurrently with a run of synapse_auto_compressor. https://github.com/matrix-org/rust-synapse-compress-state/issues/79#issuecomment-1196618569 allows the compressor to resume. Perhaps that should be done automatically?
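A rough sketch of that, assuming a hypothetical per-room compress_room function: as long as the library panics rather than returning a Result, a caller could contain the panic per room with std::panic::catch_unwind and move on to the next one (clearing the saved compressor state for the failed room as described above).

```rust
use std::panic;

// Hypothetical per-room compression step that panics like the current code
// does when a state group is missing.
fn compress_room(room_id: &str) -> usize {
    if room_id == "!deleted:example.org" {
        panic!("Missing 90660");
    }
    42 // pretend number of rows saved
}

fn main() {
    let rooms = ["!ok:example.org", "!deleted:example.org", "!also-ok:example.org"];
    let mut done = 0;
    for room in rooms {
        // Contain a panic from one room so the run can continue with the rest.
        match panic::catch_unwind(|| compress_room(room)) {
            Ok(_) => done += 1,
            Err(_) => eprintln!("skipping {}: compressor panicked", room),
        }
    }
    println!("compressed {} of {} rooms", done, rooms.len());
}
```

Returning Result from the library (as suggested earlier in this thread) would be cleaner than catching unwinds, but this shows the "skip and continue" behaviour is possible even today.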

ZhenyaPav commented 1 year ago

Using

DELETE FROM state_compressor_state;
DELETE FROM state_compressor_progress;

fixed the issue for me.

Ralith commented 1 year ago

This seems to happen any time a room is deleted, not just when the deletion is concurrent.