spacejam / sled

the champagne of beta embedded databases
Apache License 2.0

Re-keying a database ended up using considerably more disk storage (~50%) #1061

Open · crusty-dave opened this issue 4 years ago

crusty-dave commented 4 years ago

    expected result     2.26 GB
    actual result       3.46 GB
    sled version        0.31.0
    rustc version       1.40.0
    operating system    Windows 10

Started with the following statistics (the key was derived from one field in the JSON data):

    db_identity      .\cfg\db_identity
        disk usage   2.26 GB
        domains      9
        groups       138835
        users        768227
        devices      768055

After re-keying the database, the disk storage increased by a considerable amount, despite the key size being reduced:

    db_identity      .\cfg\db_identity
        disk usage   3.46 GB
        domains      6
        groups       138641
        users        768135
        devices      768040

Note that some data had been duplicated due to mixed keys; the re-keying removed those entries.

The re-keying was done in a loop using the following algorithm:

            let mut batch = sled::Batch::default();
            for iter in t.tree.iter() {
                let kv = match iter {
                    Ok(kv) => { kv }
                    Err(_e) => {
                        errors += 1;
                        continue;
                    }
                };
                if kv.1.is_empty() {
                    errors += 1;
                    continue;
                }
                let v_string = String::from_utf8_lossy(&kv.1);
                let val = match json::parse(&v_string) {
                    Ok(val) => { val }
                    Err(_e) => {
                        errors += 1;
                        //  process next entry in for loop
                        continue;
                    }
                };
                let key = self.create_key(&val, false).into_bytes();
                re_keyed += 1;
                //  remove using the old key
                batch.remove(kv.0);
                //  insert with the new key
                batch.insert(key, kv.1);
            } // end for

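            //  apply the accumulated removes and inserts atomically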
            match t.tree.apply_batch(batch) {
                Ok(()) => {}
                Err(e) => {
                    errors += re_keyed;
                    status = StatusCode::INTERNAL_SERVER_ERROR;
                    let detail = format! {"{} failed to re-key {} entries", fn_name, re_keyed};
                    result = object! {
                        JSON_TAG_STATUS => format!{"{}", status},
                        JSON_TAG_RE_KEYED => 0,
                        JSON_TAG_ERRORS => errors,
                        JSON_TAG_ERROR => format!{"{:?}", e},
                        JSON_TAG_DETAIL => detail,
                    };
                }
            }

Note that no errors were detected.

I realize that performance and storage optimization are still under active development, but I thought you might be interested in this data point. I don't currently see any tools for reclaiming the lost space.
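
One way to watch whether space eventually gets reclaimed is to sample the database size from inside the process. A minimal sketch, assuming sled 0.31 exposes `Db::size_on_disk()` and `Db::flush()` (the path is the one from the statistics above):

    fn report_disk_usage() -> sled::Result<()> {
        let db = sled::open(r".\cfg\db_identity")?;

        println!("size before flush: {} bytes", db.size_on_disk()?);

        // flush() forces dirty buffers to disk; space held by stale segments
        // is only reclaimed later, as sled's background GC rewrites them.
        db.flush()?;

        println!("size after flush:  {} bytes", db.size_on_disk()?);
        Ok(())
    }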

Perhaps using a batch transaction was the wrong approach for this?

divergentdave commented 4 years ago

Using batches does result in higher storage overhead until later disk GC happens. From a recent conversation on the Discord:

Batches in sled are very slightly less efficient because they must communicate additional atomic recovery metadata, but this is just 15 extra bytes in the log. However, due to the batch being atomic, the on-disk segments that are being written to during a batch may not be garbage collected until the batch completes. For huge batches that may explain some extra space usage

sled aggressively batches writes anyway so you don't really gain any perf by using them. They exist purely to communicate atomicity in the presence of crashes
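
For comparison, here is a sketch of the same re-keying pass using plain per-entry `remove`/`insert` calls instead of one large batch; it gives up the batch's crash atomicity, but older segments can be reclaimed while the pass is still running. `re_key` and `make_key` are illustrative names (`make_key` stands in for the issue's `create_key`), and the error counting is simplified:

    fn re_key<F>(tree: &sled::Tree, make_key: F) -> sled::Result<usize>
    where
        F: Fn(&[u8]) -> Option<Vec<u8>>,
    {
        let mut re_keyed = 0;
        for item in tree.iter() {
            let (old_key, value) = item?;
            // Skip entries that yield no key or already use the new key; the
            // latter also makes it harmless if the iterator later encounters
            // an entry re-inserted by this same pass (make_key is assumed to
            // be deterministic).
            let new_key = match make_key(&value[..]) {
                Some(k) if k.as_slice() != &old_key[..] => k,
                _ => continue,
            };
            // Individual remove/insert pairs are not atomic across a crash,
            // but they avoid pinning log segments for the whole pass.
            tree.remove(&old_key[..])?;
            tree.insert(new_key, value)?;
            re_keyed += 1;
        }
        Ok(re_keyed)
    }

A middle ground would be to apply modestly sized batches periodically, keeping per-chunk atomicity while still letting segments retire between chunks.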