Stebalien opened this issue 7 years ago
GC also needs to be called on badger instance, we might want to expose this too.
I can do batching as part of https://github.com/ipfs/go-ipfs/pull/4149/ . go-ds-flatfs doesn't have real delete batching (it just queued the deletes up and performed them all at the end), which is why it was never used there.
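The "queue the deletes and perform them all at the end" behavior described above can be sketched as a small wrapper; the Datastore interface and batch type here are simplified stand-ins for illustration, not the real go-datastore or go-ds-flatfs API.

```go
package main

import "fmt"

// Datastore is a minimal stand-in for the go-datastore interface.
type Datastore interface {
	Delete(key string) error
}

// mapstore is a toy in-memory datastore used only for this sketch.
type mapstore map[string]string

func (m mapstore) Delete(key string) error {
	delete(m, key)
	return nil
}

// batch queues deletes and applies them all in one Commit call,
// loosely mirroring the flatfs behavior described above.
type batch struct {
	target  Datastore
	deletes []string
}

func (b *batch) Delete(key string) {
	b.deletes = append(b.deletes, key)
}

func (b *batch) Commit() error {
	for _, k := range b.deletes {
		if err := b.target.Delete(k); err != nil {
			return err
		}
	}
	b.deletes = b.deletes[:0]
	return nil
}

func main() {
	store := mapstore{"a": "1", "b": "2", "c": "3"}
	b := &batch{target: store}
	b.Delete("a")
	b.Delete("b")
	if err := b.Commit(); err != nil { // queued deletes applied here
		panic(err)
	}
	fmt.Println(len(store)) // 1
}
```

A real backend would turn Commit into one write transaction, which is where the win over per-key deletes comes from.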
@Kubuxu Sounds like a good idea (although it can be a separate PR if it adds too much code, large PRs are a pain to review and/or rebase).
It shouldn't add too much code, but making it a separate PR is a good idea either way.
@Stebalien
GC is 4x slower with the badger datastore as it actually has to write data, not just delete files.
Yes, and even more expensive than the rewrites themselves is Badger's scan of every key in each value log file it checks, done to decide whether the file exceeds the discard threshold that triggers a rewrite. Do you have a test that demonstrates that 4x performance impact?
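The threshold decision mentioned above can be modeled in a few lines; this is a toy illustration of the idea behind the discard ratio passed to Badger's value-log GC, not Badger's actual implementation, and the function name and sampling numbers are made up for this sketch.

```go
package main

import "fmt"

// shouldRewrite models the decision described above: a value log file
// is only worth rewriting when the fraction of discardable (stale)
// data in it meets the configured discard ratio.
func shouldRewrite(discardedBytes, totalBytes int64, discardRatio float64) bool {
	if totalBytes == 0 {
		return false
	}
	return float64(discardedBytes)/float64(totalBytes) >= discardRatio
}

func main() {
	// A 1 GB value log with 300 MB stale at a 0.5 ratio: not enough
	// garbage, so scanning it was pure overhead and no rewrite happens.
	fmt.Println(shouldRewrite(300<<20, 1<<30, 0.5)) // false
	// With 600 MB stale, the ratio is met and the file is rewritten.
	fmt.Println(shouldRewrite(600<<20, 1<<30, 0.5)) // true
}
```

This is why the scans can dominate: every checked file pays the scan cost, but only files over the threshold pay the rewrite cost.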
We may need to better batch/parallelize deletes (probably want a DeleteBlocks method).
I don't understand how parallelization would help. Is GC called after every block deletion?
So, trying to reduce my own noise: indeed GC is slower with syncWrites enabled (4x sounds about right). Badger's creator suggested turning it off during GC (not sure if that is possible) and also parallelizing deletes as mentioned here (or, alternatively, running bs.DeleteBlock(k) concurrently in multiple goroutines).
I can confirm (from simple tests) that GC with syncWrites disabled has pretty much the same performance as flatfs, and also that Badger's own GC (triggered when there is more than one value log file, i.e., more than 1 GB of data in the repo) has a running time not much longer than flatfs's (1.25-1.5x). More tests are needed.