[Open] winstonewert opened this issue 4 years ago
This also causes an OOM kill (before the disk is exhausted) if left running long enough.
extern crate bincode;

fn main() -> sled::Result<()> {
    let db = sled::open("db")?;
    let t1 = db.open_tree("t1")?;
    let t2 = db.open_tree("t2")?;

    // bincode 1.x config builder; big-endian keys keep lexicographic ordering
    let mut bincode = bincode::config();
    bincode.big_endian();

    // This approximates somewhat my workload...
    let infinite_data = (0u64..).flat_map(|i| (0u64..100).map(move |j| (i, j)));
    for (i, j) in infinite_data {
        let key1 = bincode.serialize(&(i, j)).expect("can serialize");
        let key2 = bincode.serialize(&(j, i)).expect("can serialize");
        t1.insert(key1, vec![])?; // empty values; only the keys matter here
        t2.insert(key2, vec![])?;
    }
    Ok(())
}
Using sled 0.31 (and also latest master) on Ubuntu 18 and Rust 1.45 (nightly 2020-05-12). C'mon, man! I can try to live with sled using tons of space, but this is a deal breaker in my case (and in basically any big-data use case). Hope it's easy to fix, though.
Yo, found a mitigation which might also help solve the problem... I came across this comment:
cache_capacity is currently a bit messed up as it uses the on-disk size of things instead of the larger in-memory representation. So, 1gb is not actually the default memory usage, it's the amount of disk space that items loaded in memory will take, which will result in a lot more space in memory being used, at least for smaller keys and values. So, play around with setting it to a much smaller value.
https://github.com/spacejam/sled/issues/986#issuecomment-592950100
Which led me to fiddle with the cache_capacity knob, setting it to 100_000 instead of the default 1_000_000_000. What I have found (qualitatively):
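For reference, lowering the cache target looks roughly like this. This is a minimal sketch against sled 0.31's Config builder API; the path "db" and the 100_000 figure are just the values from my experiment, not recommendations:

```rust
// Sketch: open sled with a much smaller cache target.
// Note that cache_capacity is budgeted in *on-disk size*, so actual
// resident memory can be considerably larger (see the quoted comment above).
fn open_small_cache() -> sled::Result<sled::Db> {
    sled::Config::new()
        .path("db")              // same path as the repro above
        .cache_capacity(100_000) // bytes of on-disk size; default is 1_000_000_000
        .open()
}
```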
As you'll see in my code, I set the cache capacity as low as I could, and it still didn't resolve my problem. So I wonder if we are hitting different issues.
Good question... I was reluctant to open a new issue, though. For context, my issue is loading a big dataset, so it's a write problem, not a read problem. @spacejam has been in contact and told me it's a known issue, it seems.
I'm currently looking into this approach for handling this issue: https://github.com/spacejam/sled/issues/1093
@spacejam That ticket references many inserts, which wasn't my issue. My issue is simply traversing a large database. They could be related issues for all I know, but I wanted to make sure.
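To be concrete, by "traversing" I mean a plain full scan of a tree, something like this sketch (the function name and the count are just illustration, not my exact code):

```rust
// Sketch: full iteration over one sled tree. Tree::iter() yields
// Result<(IVec, IVec)> pairs; this read-only scan is what (per this
// issue) drives memory up to several gigabytes on a large database.
fn scan(tree: &sled::Tree) -> sled::Result<u64> {
    let mut count = 0u64;
    for kv in tree.iter() {
        let (_key, _value) = kv?; // propagate any I/O error
        count += 1;
    }
    Ok(count)
}
```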
sled 0.31.0
rustc 1.42.0 (b8cedc004 2020-03-09)
Ubuntu 19.10
Code:
Expected outcome: uses a small amount of memory.
Actual outcome: uses several gigabytes of memory and gets killed.
The database is 4.7 GB.
Some debug logs: https://pastebin.com/e49teW5m