ipfs / go-ds-pebble

A datastore implementation backed by https://github.com/cockroachdb/pebble (experimental)

pebble+kubo setup instructions needed #29

Open pyromaniac3010 opened 1 year ago

pyromaniac3010 commented 1 year ago

Hello,

We are about to set up a multi-terabyte IPFS server with kubo and would love to try the pebble datastore directly. I know it is still experimental, but running kubo with several terabytes of data on flatfs is a big pain because garbage collection is not working at all. Unfortunately, I cannot find any documentation on how to set this up. It would be great if you could share setup instructions for using go-ds-pebble with kubo.

Thanks a lot in advance!

Dreamacro commented 9 months ago

kubo doesn't provide a pebble store. I have a custom IPFS implementation that depends on boxo. I chose pebble as the datastore and flatfs as the blockstore (which stores the actual file blocks).

After switching the leveldb store to pebble, GC became at least dozens of times faster. So it's leveldb, not flatfs, that is to blame for slow GC. I also tried leveldb and pebble as the blockstore, but the CPU and memory usage was unacceptable.

lidel commented 9 months ago

FWIW I've filed https://github.com/ipfs/kubo/issues/10347 to track bundling the pebbleds plugin with kubo, just like we do with legacy badgerds. Documentation + profiles are part of that.

@Dreamacro mind sharing more information about your experience with pebble? Did you end up using it only for pins, or do you also use it for blocks (flatfs replacement)? What is the size of your repo (in GiB, and number of pins/blocks)?

Dreamacro commented 9 months ago

@lidel Sure, we use two modes in different situations:

  1. On Windows, we use only filestore to store raw files; each machine stores around 50 GiB to 300 GiB. We often add or delete files, so we need to run GC frequently. Since filestore stores only file references under /blocks/ (not the block data itself), we ended up using the pebble store alone. We tested a variety of stores (including leveldb, badger3/4, and pebble), and pebble is amazingly good at GC.
  2. On Linux, we use the old-school layout, with pebble plus flatfs for /blocks/. All nodes store less than 1TB, and we clean up and run GC when the disk is almost full. pebble wins the test again.

> did you end up using it only for pins, or do you also use it for blocks (flatfs replacement)?

When raw file blocks are not stored at all, as with filestore, pebble can be used exclusively, without flatfs. Otherwise, just use pebble+flatfs instead of the current leveldb+flatfs.