matrix-org / rust-synapse-compress-state

A tool to compress some state in a Synapse instance's database
https://pypi.org/project/synapse-auto-compressor/
Apache License 2.0
142 stars 32 forks source link

synapse_auto_compressor panics on start after using the history delete api to remove remote events #78

Open MTRNord opened 2 years ago

MTRNord commented 2 years ago

Describe the bug

Nov 25 00:25:08 nordgedanken.dev systemd[1]: Starting State Compress Synapse...
Nov 25 00:25:08 nordgedanken.dev synapse_auto_compressor[3600348]: [2021-11-24T23:25:08Z INFO  synapse_auto_compressor] synapse_auto_compressor started
Nov 25 00:25:08 nordgedanken.dev synapse_auto_compressor[3600348]: [2021-11-24T23:25:08Z INFO  synapse_auto_compressor::manager] Running compressor on room !OGEhHVWSdvArJzumhm:matrix.org with chunk size 1000
Nov 25 00:25:10 nordgedanken.dev synapse_auto_compressor[3600348]: [2021-11-24T23:25:10Z ERROR panic] thread 'main' panicked at 'Missing 13719240': src/lib.rs:666
Nov 25 00:25:10 nordgedanken.dev systemd[1]: synapse-state-compress.service: Main process exited, code=exited, status=101/n/a
Nov 25 00:25:10 nordgedanken.dev systemd[1]: synapse-state-compress.service: Failed with result 'exit-code'.
Nov 25 00:25:10 nordgedanken.dev systemd[1]: Failed to start State Compress Synapse.

To Reproduce Steps to reproduce the behavior:

  1. Run purge_history on a room
  2. start synapse_auto_compressor

Expected behavior It runs or atleast ignores the invalid rooms

DMRobertson commented 2 years ago

Could you run this with the environment variable RUST_BACKTRACE=1 set, to have a better idea of where the panic comes from?

MTRNord commented 2 years ago

While I would love to it seems that is all I am getting even with the env variable set and in a debug binary:

➜  synapse_auto_compressor git:(main) export RUST_BACKTRACE=1
➜  synapse_auto_compressor git:(main) RUST_BACKTRACE=1 synapse_auto_compressor -p postgresql:// -c 1000 -n 10000000
[2021-11-25T18:23:14Z INFO  synapse_auto_compressor] synapse_auto_compressor started
[2021-11-25T18:23:14Z INFO  synapse_auto_compressor::manager] Running compressor on room !QQpfJfZvqxbCfeDgCj:matrix.org with chunk size 1000
[2021-11-25T18:23:14Z ERROR panic] thread 'main' panicked at 'Missing 13719470': src/lib.rs:666
DMRobertson commented 2 years ago

Thanks for trying that---I wonder what's happened there?

maranda commented 2 years ago

I'm not sure if this is helpful, but deleting the relevant room entries into state_compressor_state and state_compressor_progress. At least made the auto compressor resume working for me.

MTRNord commented 2 years ago

By the way I now tried the workaround of deleting it from the table. However that just makes it fail on missing some other position :/

Also I now know that for me the affected room is TWIM !QQpfJfZvqxbCfeDgCj:matrix.org (its however now changing the rooms it fails at)

maranda commented 2 years ago

Anytime that happens just repeat the previous step to delete references of the room from both tables and the compressor will resume.

You may have to do that multiple times.

MTRNord commented 2 years ago

Anytime that happens just repeat the previous step to delete references of the room from both tables and the compressor will resume.

You may have to do that multiple times.

Yeah but thats kinda annoying to do on an automated setup :) I will instead now have a look at a) changing to logging to get useful backtraces and b) take a look at handling this or atleast do the cleanup automatically instead of doing a panic here. :)

MTRNord commented 2 years ago

Ok after basically ripping out all the logging and replacing it with tracing I got some error backtrace thats actually useful:

Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]: The application panicked (crashed).
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]: Message:  Missing 13719624
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]: Location: src/lib.rs:664
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:                                 ⋮ 7 frames hidden ⋮
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:    8: synapse_compress_state::collapse_state_maps::hb54811a6b298273a
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at <unknown source file>:<unknown line>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:    9: synapse_compress_state::compressor::Compressor::create_new_tree::hb58a7e827165ecc1
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at <unknown source file>:<unknown line>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   10: synapse_compress_state::compressor::Compressor::compress_from_save::he9f32ebae7ad0814
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at <unknown source file>:<unknown line>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   11: synapse_compress_state::continue_run::h5d2aef7600ce0d7e
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at <unknown source file>:<unknown line>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   12: synapse_auto_compressor::manager::compress_chunks_of_database::hcd09dc2919f42bcc
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at <unknown source file>:<unknown line>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   13: synapse_auto_compressor::main::h824a83cfc8787e04
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at <unknown source file>:<unknown line>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   14: std::sys_common::backtrace::__rust_begin_short_backtrace::h6d40e1be24292234
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at <unknown source file>:<unknown line>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   15: std::rt::lang_start::{{closure}}::h4f8c9309335fe60f
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at <unknown source file>:<unknown line>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   16: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hbf4780e8c8411565
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at /rustc/ec77f252434a532fdb5699ae4f21a3072d211edd/library/core/src/ops/function.rs:280
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   17: std::panicking::try::do_call::h99aa9194bd8ee1c4
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at /rustc/ec77f252434a532fdb5699ae4f21a3072d211edd/library/std/src/panicking.rs:492
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   18: std::panicking::try::h8770f2322ba2c827
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at /rustc/ec77f252434a532fdb5699ae4f21a3072d211edd/library/std/src/panicking.rs:456
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   19: std::panic::catch_unwind::ha42b10472f01a333
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at /rustc/ec77f252434a532fdb5699ae4f21a3072d211edd/library/std/src/panic.rs:137
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   20: std::rt::lang_start_internal::{{closure}}::hd9207fca072adbba
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at /rustc/ec77f252434a532fdb5699ae4f21a3072d211edd/library/std/src/rt.rs:128
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   21: std::panicking::try::do_call::ha914fc4c35b287ee
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at /rustc/ec77f252434a532fdb5699ae4f21a3072d211edd/library/std/src/panicking.rs:492
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   22: std::panicking::try::h64ca443a8c434489
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at /rustc/ec77f252434a532fdb5699ae4f21a3072d211edd/library/std/src/panicking.rs:456
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   23: std::panic::catch_unwind::h91c34d32b5cf2663
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at /rustc/ec77f252434a532fdb5699ae4f21a3072d211edd/library/std/src/panic.rs:137
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   24: std::rt::lang_start_internal::he544a68b08a56a71
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at /rustc/ec77f252434a532fdb5699ae4f21a3072d211edd/library/std/src/rt.rs:128
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   25: main<unknown>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at <unknown source file>:<unknown line>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   26: __libc_start_main<unknown>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at <unknown source file>:<unknown line>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:   27: _start<unknown>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]:       at <unknown source file>:<unknown line>
Apr 18 14:36:57 nordgedanken.dev synapse_auto_compressor[1283991]: Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.

This is running on https://github.com/MTRNord/rust-synapse-compress-state/tree/c9d71bba353f0d68d5818164db2b117b4f07dcd7 which is a dirty fix to get useful logs out of this.

Maybe this helps debugging this further.

MTRNord commented 2 years ago

(See hidden comment for minor more info)

Proper debug stack on where it crashes:

Apr 18 14:41:18 nordgedanken.dev synapse_auto_compressor[1461640]: 2022-04-18T12:41:18.278082Z  INFO synapse_auto_compressor::manager: Running compressor on room !zXfJBqSUvXySmsZMtB:jki.re with chunk size 1000
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]: The application panicked (crashed).
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]: Message:  Missing 13719624
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]: Location: src/lib.rs:664
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:                                 ⋮ 7 frames hidden ⋮
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:    8: synapse_compress_state::collapse_state_maps::h44a9bbb9616014eb
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:       at /root/.cargo/git/checkouts/rust-synapse-compress-state-68477a18a6ccb974/95206e2/src/lib.rs:664
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        662 │ fn collapse_state_maps(map: &BTreeMap<i64, StateGroupEntry>, state_group: i64) -> StateMap<Atom> {
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        663 │     if !map.contains_key(&state_group) {
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        664 >         panic!("Missing {}", state_group);
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        665 │     }
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        666 │
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:    9: synapse_compress_state::compressor::Compressor::get_delta::he6c004a548606169
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:       at /root/.cargo/git/checkouts/rust-synapse-compress-state-68477a18a6ccb974/95206e2/src/compressor.rs:267
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        265 │         let mut prev_state_map;
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        266 │         'outer: loop {
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        267 >             prev_state_map = collapse_state_maps(self.original_state_map, prev_sg);
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        268 │             for (t, s) in prev_state_map.keys() {
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        269 │                 if !state_map.contains_key(t, s) {
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:   10: synapse_compress_state::compressor::Compressor::create_new_tree::h46dc928a5f4d15ab
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:       at /root/.cargo/git/checkouts/rust-synapse-compress-state-68477a18a6ccb974/95206e2/src/compressor.rs:228
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        226 │             } else {
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        227 │                 self.stats.state_groups_changed += 1;
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        228 >                 self.get_delta(prev_state_group, state_group)
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        229 │             };
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        230 │
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:   11: synapse_compress_state::compressor::Compressor::compress_from_save::hb4323518edfb5fa4
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:       at /root/.cargo/git/checkouts/rust-synapse-compress-state-68477a18a6ccb974/95206e2/src/compressor.rs:169
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        167 │         };
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        168 │
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        169 >         compressor.create_new_tree();
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        170 │         compressor
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        171 │     }
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:   12: synapse_compress_state::continue_run::hdaabfe372f1e3a79
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:       at /root/.cargo/git/checkouts/rust-synapse-compress-state-68477a18a6ccb974/95206e2/src/lib.rs:570
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        568 │
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        569 │     // Now we actually call the compression algorithm.
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        570 >     let compressor = Compressor::compress_from_save(&state_group_map, level_info);
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        571 │     let new_state_group_map = &compressor.new_state_group_map;
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        572 │
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:   13: synapse_auto_compressor::manager::run_compressor_on_room_chunk::h1a04765da5e58a53
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:       at /root/.cargo/git/checkouts/rust-synapse-compress-state-68477a18a6ccb974/95206e2/synapse_auto_compressor/src/manager.rs:61
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:         59 │
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:         60 │     // run the compressor on this chunk
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:         61 >     let option_chunk_stats = continue_run(start, chunk_size, db_url, room_id, &level_info);
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:         62 │
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:         63 │     if option_chunk_stats.is_none() {
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:   14: synapse_auto_compressor::manager::compress_chunks_of_database::h12736a1fee37efa6
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:       at /root/.cargo/git/checkouts/rust-synapse-compress-state-68477a18a6ccb974/95206e2/synapse_auto_compressor/src/manager.rs:171
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        169 │
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        170 │         let work_done =
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        171 >             run_compressor_on_room_chunk(db_url, &room_to_compress, chunk_size, default_levels)?;
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        172 │
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        173 │         if let Some(ref chunk_stats) = work_done {
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:   15: synapse_auto_compressor::main::ha895a792595d0dd7
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:       at /root/.cargo/git/checkouts/rust-synapse-compress-state-68477a18a6ccb974/95206e2/synapse_auto_compressor/src/main.rs:174
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        172 │     // call compress_largest_rooms with the arguments supplied
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        173 │     // panic if an error is produced
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        174 >     manager::compress_chunks_of_database(db_url, chunk_size, &default_levels.0, number_of_chunks)?;
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        175 │
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:        176 │     info!("synapse_auto_compressor finished");
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:   16: core::ops::function::FnOnce::call_once::hf6e65f2765cf1ff7
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:       at /rustc/ec77f252434a532fdb5699ae4f21a3072d211edd/library/core/src/ops/function.rs:248
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]:                                 ⋮ 15 frames hidden ⋮
Apr 18 14:42:05 nordgedanken.dev synapse_auto_compressor[1461640]: Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.

This is using this forked version (replacing the logging functions mainly: https://github.com/MTRNord/rust-synapse-compress-state/tree/95206e295599973bc676c32ec4639852830ef4c0 )

MTRNord commented 2 years ago

Also, https://github.com/MTRNord/rust-synapse-compress-state/commit/940aba6d52b9171da65787af5cee1f6707e36a07 contains experimental error handling for this. It will essentially just skip the room in case of this error entirely. It won't do any cleanup currently, but at least not panic. Furthermore, it's not tested and potentially breaking stuff at this time. Hence, the warning below.

The potentially incorrect and problematic changes are here: https://github.com/MTRNord/rust-synapse-compress-state/commit/940aba6d52b9171da65787af5cee1f6707e36a07#diff-aecc1fa342e82a5188eb5d3fce02ecc1adc1c82ae413ed814a6f9f362d5cba55R229-R236

As I am not sure if it is safe to break there.

DO NOT USE THE ABOVE COMMIT! (It may very well break your synapse as it changes stuff in the actual logic of the algo)

Edit: There indeed was a bug, and it still would have committed the new (now broken) state to the db. This is fixed in https://github.com/MTRNord/rust-synapse-compress-state/commit/427a37f4de4b8f8b345b332ad7306cc23c3ae1e3

Edit:

Initial testing seems to be working after some choking:

INFO synapse_auto_compressor::manager: Running compressor on room !zXfJBqSUvXySmsZMtB:jki.re with chunk size 1000
ERROR synapse_compress_state::compressor: Failed to get delta for state group 13719847: Missing state_goup: 13719624
INFO synapse_compress_state: Checking that state maps match...
WARN synapse_compress_state: Invalid state found: expected state to match: Missing state_goup: 13719847
WARN synapse_auto_compressor::manager: The compressor tried to increase the number of rows in !zXfJBqSUvXySmsZMtB:jki.re between Some(13719624) and 13926124. Skipping...
INFO synapse_auto_compressor::manager: Running compressor on room !ping-v6:maunium.net with chunk size 1000
ERROR synapse_compress_state::compressor: Failed to get delta for state group 13719872: Missing state_goup: 13719793
INFO synapse_compress_state: Checking that state maps match...
WARN synapse_compress_state: Invalid state found: expected state to match: Missing state_goup: 13720144
WARN synapse_auto_compressor::manager: The compressor tried to increase the number of rows in !ping-v6:maunium.net between Some(13719793) and 13722699. Skipping...
INFO synapse_auto_compressor::manager: Running compressor on room !fudRWyBUmBnUrIuyXq:matrix.org with chunk size 1000
INFO synapse_compress_state: Checking that state maps match...
INFO synapse_compress_state: New state map matches old one

It now just prints the error and not modifies the state for that room.

speatzle commented 1 year ago

I just ran into the same Issue. Where the Compressor fails on a Room that got purged in the Past:

[2023-01-30T08:29:49Z INFO  synapse_auto_compressor] synapse_auto_compressor started
[2023-01-30T08:29:49Z INFO  synapse_auto_compressor::manager] Running compressor on room !YTvKGNlinIzlkMTVRl:matrix.org with chunk size 1000
[2023-01-30T08:30:31Z ERROR panic] thread 'main' panicked at 'Missing 88328': src/lib.rs:666
FrankNagel commented 1 year ago

I'm always calling the following script in preparation of a compressor run:


#!/bin/bash

# Clean up synapse_auto_compressor state tables.
#
# The compressor doesn't take into account
# - deleted rooms
# - state groups which got deleted either as unreferenced 
#   or due to retention time
#
# Procedure:
# 1. Delete progress related to deleted rooms
# 2. Delete progress for rooms where one of the referenced state groups
#    no longer exist
# 3. Replicate changes from state_compressor_state to state_compressor_progress

set -e -u -o pipefail

export PGHOST=$1
export PGDATABASE=$2
export PGUSER=$3
export PGPASSWORD=$4

SCRIPT=$(basename $0)

echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ') INFO ${SCRIPT}]" Starting

psql <<_EOF
BEGIN;

DELETE
FROM state_compressor_state AS scs
WHERE NOT EXISTS
    (SELECT *
     FROM rooms AS r
     WHERE r.room_id = scs.room_id);

DELETE
FROM state_compressor_state AS scs
WHERE scs.room_id in
    (SELECT DISTINCT room_id
     FROM state_compressor_state AS scs2
     WHERE scs2.current_head IS NOT NULL
       AND NOT EXISTS
         (SELECT *
          FROM state_groups AS sg
          WHERE sg.id = scs2.current_head));

DELETE
FROM state_compressor_progress AS scp
WHERE NOT EXISTS
    (SELECT *
     FROM state_compressor_state AS scs
     WHERE scs.room_id = scp.room_id);

COMMIT;
_EOF

echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ') INFO ${SCRIPT}]" Finished
speatzle commented 1 year ago

@FrankNagel running this script fixes my issues

realtyem commented 1 year ago

It would be great if this could be incorporated into this program, as a nice simple way to do this recovery after a purge.

doobry-systemli commented 8 months ago

Is there any plans to tackle this issue? It seems like a major usability bug to me that the tool always crashes after a room has been deleted. We ran into the same issues and https://github.com/matrix-org/rust-synapse-compress-state/issues/78#issuecomment-1409932869 indeed fixes the issue (until next time a room got deleted).