Clarification on storage behaviour and space reclamation

Licenser commented 3 years ago

Hi, we're struggling a bit to figure out what to expect from sled storage-wise when it comes to reclaiming space from deleted items. We've scoured the BW-Tree paper and the source code, but neither gives us a clear understanding of how exactly sled approaches garbage collection, when it is triggered, or what tunables exist, if any.

There are a few bugs in the tracker mentioning the GC failing to trigger, but it's hard to say whether we're running into any of them without understanding the conditions under which we should expect it to run.

Would it be possible to add a section to the documentation about the GC: its conditions, its guarantees, tunables, and expected behaviour?

mfelsche commented 3 years ago

We ran a small experiment that mirrors how we use sled.

The code snippet we used is in this gist: https://gist.github.com/mfelsche/5ff37bb316254a7d0054474384b83c59

In the experiment we create a unique id using db.generate_id() and then use it as the key for a random 10 KB value. We exaggerated the payload size a little to see the effect quickly. On every iteration of the loop we insert a new document and remove the previous one. Our use case is a write-ahead log, so we expect only very few records to be present in our db at any time. Every 1000 records we check the size on disk and print whether it shrank.
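For readers who want to reproduce the size check without relying on the gist, here is a minimal sketch of measuring a database directory's on-disk footprint using only the Rust standard library. The helper names `dir_size` and `shrank` and the demo directory name are hypothetical and not taken from the gist or from sled. The helper sums apparent file lengths, so it may differ from what the filesystem actually allocates (for example with sparse files).

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively sums the apparent sizes (in bytes) of all files under `path`.
/// Hypothetical helper: not part of sled or of the gist.
fn dir_size(path: &Path) -> io::Result<u64> {
    let meta = fs::metadata(path)?;
    if meta.is_file() {
        return Ok(meta.len());
    }
    let mut total = 0u64;
    for entry in fs::read_dir(path)? {
        total += dir_size(&entry?.path())?;
    }
    Ok(total)
}

/// True if the footprint decreased between two samples.
fn shrank(previous: u64, current: u64) -> bool {
    current < previous
}

fn main() -> io::Result<()> {
    // Stand-in for a database directory; the name is an assumption.
    let dir = std::env::temp_dir().join("size_check_demo");
    let _ = fs::remove_dir_all(&dir);
    fs::create_dir_all(&dir)?;

    // Write a 10 KB file, sample the size, delete the file, sample again.
    let file = dir.join("payload.bin");
    fs::write(&file, vec![0u8; 10 * 1024])?;
    let before = dir_size(&dir)?;
    fs::remove_file(&file)?;
    let after = dir_size(&dir)?;

    println!("before: {before} bytes, after: {after} bytes, shrank: {}", shrank(before, after));
    fs::remove_dir_all(&dir)
}
```

In an experiment like the one described, a sample would be taken every 1000 records and compared with the previous sample.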

We know that the storage layer of sled is append-only. What we wanted to find out is if there is a "GC" mechanism which kicks in, overwrites old pages and truncates the db file.

Running the gist above on Linux or OSX showed that the db file only grew. The db.generate_id() mechanism consumes some space on disk, as do all the inserts and removes. The db file quickly grew to more than 11GB.

When opening the db with sled::Config::default().temporary(true).open().unwrap() on OSX and Linux, the disk size does shrink from time to time and seems to stabilize at around 5MB.

Is a disk GC mechanism missing or can we trigger it ourselves to control disk size (e.g. regular snapshotting)? Or would we have to contribute such a mechanism?

Licenser commented 3 years ago

@spacejam any chance of adding something about this to the docs? Happy to do the write-up if we get some idea of how space reclamation works.

jeromegn commented 2 years ago

I ran this script and observed the same results.

However, I figured out why. The test inserts constantly, using all CPU resources, and sled doesn't "have the time" to GC. I added a 1s sleep and some println! in the loop every 1000 items and then observed the size growing, shrinking and staying the same at various points in time, never exceeding 200MB (in release mode) and often shrinking down to ~50MB.
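The throttled variant described above can be sketched as a small harness that pauses after every batch of writes and records whether the sampled size grew, shrank, or stayed the same. This is an illustration only: `run_throttled`, `classify`, and `Trend` are hypothetical names, the write and size-sampling steps are passed in as closures instead of calling sled, and the sizes in `main` are made-up stand-ins, not measurements.

```rust
use std::cmp::Ordering;
use std::thread;
use std::time::Duration;

/// Direction of change between two size samples.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Trend {
    Grew,
    Shrank,
    Unchanged,
}

fn classify(previous: u64, current: u64) -> Trend {
    match current.cmp(&previous) {
        Ordering::Greater => Trend::Grew,
        Ordering::Less => Trend::Shrank,
        Ordering::Equal => Trend::Unchanged,
    }
}

/// Calls `write` once per iteration. After every `batch` writes it samples
/// the size, records the trend against the previous sample, and sleeps for
/// `pause` so background work gets a chance to run.
fn run_throttled(
    iterations: u64,
    batch: u64,
    pause: Duration,
    mut write: impl FnMut(u64),
    mut sample_size: impl FnMut() -> u64,
) -> Vec<(u64, Trend)> {
    assert!(batch > 0, "batch must be non-zero");
    let mut previous = sample_size();
    let mut log = Vec::new();
    for i in 1..=iterations {
        write(i);
        if i % batch == 0 {
            let current = sample_size();
            log.push((current, classify(previous, current)));
            previous = current;
            thread::sleep(pause);
        }
    }
    log
}

fn main() {
    // Made-up size samples standing in for real on-disk measurements.
    let mut samples = vec![100u64, 250, 250, 80].into_iter();
    let log = run_throttled(
        3000,
        1000,
        Duration::from_millis(10), // the comment above used a 1s pause
        |_i| { /* insert the new record and remove the previous one here */ },
        || samples.next().unwrap_or(0),
    );
    for (size, trend) in log {
        println!("{size} bytes: {trend:?}");
    }
}
```

Passing the write and sampling steps as closures keeps the harness independent of any particular store, so the same loop can be pointed at a real database or at a fake one for testing.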

I assume given more time to GC, under normal operation, it might've shrunk more.

It's worth noting I'm running main and not a stable release.

spacejam / sled

Clarification on storage behaviour and space reclamation #1333