matrix-org / rust-synapse-compress-state

A tool to compress some state in a Synapse instance's database
https://pypi.org/project/synapse-auto-compressor/
Apache License 2.0
143 stars 33 forks source link

Panic at 'attempt to subtract with overflow' when using the -m option when compression increases row count #31

Closed TeknikalDomain closed 3 years ago

TeknikalDomain commented 3 years ago

(I really didn't know a concise way to explain that one.)

Sometimes, the compression algorithm may, for lack of a better term, 'fail,' and produce an output row set that is larger than the input (#7). If this happens when the -m flag is used to suppress output if savings are not large enough, then it will panic. Here's an example. I doubt that it matters since it seems that every case like this causes a panic, but this run was done with
-m 200:

Fetching state from DB for room '!UBhRLVEYYvUWnsljRs:matrix.org'...
  [1s] 3951 rows retrieved
Got initial state from database. Checking for any missing state groups...
No missing state groups
Number of state groups: 610
Number of rows in current table: 3945
Compressing state...
[00:00:00] ████████████████████ 610/610 state groups
Number of rows after compression: 7598 (192.60%)
Compression Statistics:
  Number of forced resets due to lacking prev: 18
  Number of compressed rows caused by the above: 3701
  Number of state groups changed: 69
thread 'main' panicked at 'attempt to subtract with overflow', src/main.rs:232:22
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
jo-so commented 3 years ago

@erikjohnston How about this change?

diff --git i/src/main.rs w/src/main.rs
index 9f276da..71f85e1 100644
--- i/src/main.rs
+++ w/src/main.rs
@@ -229,7 +229,7 @@ fn main() {
     );

     if let Some(min) = min_saved_rows {
-        let saving = (original_summed_size - compressed_summed_size) as i32;
+        let saving = original_summed_size.saturating_sub(compressed_summed_size);
         if saving < min {
             println!(
                 "Only {} rows would be saved by this compression. Skipping output.",
erikjohnston commented 3 years ago

LGTM!

TeknikalDomain commented 3 years ago

Just out of curiosity, the numbers nerd in me wonders if it's worth checking this case to output a specific error message?

This compression would actually increase the row count. Skipping output.

Probably as simple to implement as checking if compressed_summed_size > original_summed_size, though it would add an additional branch to the code path (that skips the conditional check that causes this in the first place...), if anyone cares.

jo-so commented 3 years ago

Just out of curiosity, the numbers nerd in me wonders if it's worth checking this case to output a specific error message?

You can get this information from the line: “Number of rows after compression: 7598 (192.60%)” And you can find there the amount of increase.