spacejam / sled

the champagne of beta embedded databases
Apache License 2.0

Re-keying a database ended up using considerably more disk storage (~50%) #1061

Open · crusty-dave opened this issue 4 years ago

crusty-dave commented 4 years ago

    expected result     2.26 GB
    actual result       3.46 GB
    sled version        0.31.0
    rustc version       1.40.0
    operating system    Windows 10

Started with the following statistics (the key was derived from one field in the JSON data):

    db_identity      .\cfg\db_identity
        disk usage   2.26 GB
        domains      9
        groups       138835
        users        768227
        devices      768055

After re-keying the database, the disk storage increased by a considerable amount, despite the key size being reduced:

    db_identity      .\cfg\db_identity
        disk usage   3.46 GB
        domains      6
        groups       138641
        users        768135
        devices      768040

Note that some data had been duplicated due to mixed keys; the re-keying removed those entries.

The re-keying was done in a loop using the following algorithm:

            let mut batch = sled::Batch::default();
            for iter in t.tree.iter() {
                let kv = match iter {
                    Ok(kv) => { kv }
                    Err(_e) => {
                        errors += 1;
                        continue;
                    }
                };
                if kv.1.is_empty() {
                    errors += 1;
                    continue;
                }
                let v_string = String::from_utf8_lossy(&kv.1);
                let val = match json::parse(&v_string) {
                    Ok(val) => { val }
                    Err(_e) => {
                        errors += 1;
                        //  process next entry in for loop
                        continue;
                    }
                };
                let key = self.create_key(&val, false).into_bytes();
                re_keyed += 1;
                //  remove using the old key
                batch.remove(kv.0);
                //  insert with the new key
                batch.insert(key, kv.1);
            } // end for

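            //  apply the accumulated removes and inserts atomically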
            match t.tree.apply_batch(batch) {
                Ok(()) => {}
                Err(e) => {
                    errors += re_keyed;
                    status = StatusCode::INTERNAL_SERVER_ERROR;
                    let detail = format! {"{} failed to re-key {} entries", fn_name, re_keyed};
                    result = object! {
                        JSON_TAG_STATUS => format!{"{}", status},
                        JSON_TAG_RE_KEYED => 0,
                        JSON_TAG_ERRORS => errors,
                        JSON_TAG_ERROR => format!{"{:?}", e},
                        JSON_TAG_DETAIL => detail,
                    };
                }
            }

Note that no errors were detected.

I realize that performance and storage optimization are still under active development, but I thought you might be interested in this data point. I don't currently see any tools for reclaiming the lost space.
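
One way to watch whether space eventually gets reclaimed is to sample the database size from inside the process. A minimal sketch, assuming sled 0.31 exposes `Db::size_on_disk()` and `Db::flush()` (the path is the one from the statistics above):

    fn report_disk_usage() -> sled::Result<()> {
        let db = sled::open(r".\cfg\db_identity")?;

        println!("size before flush: {} bytes", db.size_on_disk()?);

        // flush() forces dirty buffers to disk; space held by stale segments
        // is only reclaimed later, as sled's background GC rewrites them.
        db.flush()?;

        println!("size after flush:  {} bytes", db.size_on_disk()?);
        Ok(())
    }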

Perhaps using a batch transaction was the wrong approach for this?

divergentdave commented 4 years ago

Using batches does result in higher storage overhead until later disk GC happens. From a recent conversation on the Discord:

Batches in sled are very slightly less efficient because they must communicate additional atomic recovery metadata, but this is just 15 extra bytes in the log. However, due to the batch being atomic, the on-disk segments that are being written to during a batch may not be garbage collected until the batch completes. For huge batches that may explain some extra space usage

sled aggressively batches writes anyway so you don't really gain any perf by using them. They exist purely to communicate atomicity in the presence of crashes
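
For comparison, here is a sketch of the same re-keying pass using plain per-entry `remove`/`insert` calls instead of one large batch; it gives up the batch's crash atomicity, but older segments can be reclaimed while the pass is still running. `re_key` and `make_key` are illustrative names (`make_key` stands in for the issue's `create_key`), and the error counting is simplified:

    fn re_key<F>(tree: &sled::Tree, make_key: F) -> sled::Result<usize>
    where
        F: Fn(&[u8]) -> Option<Vec<u8>>,
    {
        let mut re_keyed = 0;
        for item in tree.iter() {
            let (old_key, value) = item?;
            // Skip entries that yield no key or already use the new key; the
            // latter also makes it harmless if the iterator later encounters
            // an entry re-inserted by this same pass (make_key is assumed to
            // be deterministic).
            let new_key = match make_key(&value[..]) {
                Some(k) if k.as_slice() != &old_key[..] => k,
                _ => continue,
            };
            // Individual remove/insert pairs are not atomic across a crash,
            // but they avoid pinning log segments for the whole pass.
            tree.remove(&old_key[..])?;
            tree.insert(new_key, value)?;
            re_keyed += 1;
        }
        Ok(re_keyed)
    }

A middle ground would be to apply modestly sized batches periodically, keeping per-chunk atomicity while still letting segments retire between chunks.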