optakt / flow-dps

Flow Data Provisioning Service
Apache License 2.0

Performance: Tweak badger options #497

Open Ullaakut opened 2 years ago

Ullaakut commented 2 years ago

Description

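// Import paths assume Badger v2, the major version that exposes these option setters.
import (
    "github.com/dgraph-io/badger/v2"
    "github.com/dgraph-io/badger/v2/options"
)

// DefaultOptions returns the default Badger options preferred by the DPS for its index database.
func DefaultOptions(dir string) badger.Options {
    return badger.DefaultOptions(dir).
        WithMaxTableSize(256 << 20).             // 256 MiB per table
        WithValueLogFileSize(64 << 20).          // 64 MiB per value log file
        WithTableLoadingMode(options.FileIO).    // read tables with plain file I/O, not mmap
        WithValueLogLoadingMode(options.FileIO). // same for the value log
        WithNumMemtables(1).                     // keep at most one memtable in memory
        WithKeepL0InMemory(false).               // do not hold level 0 tables in memory
        WithCompactL0OnClose(false).             // skip level 0 compaction on close
        WithNumLevelZeroTables(1).               // start compacting at one level 0 table
        WithNumLevelZeroTablesStall(2).          // stall writes at two level 0 tables
        WithLoadBloomsOnOpen(false).             // do not load bloom filters when opening tables
        WithIndexCacheSize(2000 << 20).          // 2000 MiB index cache
        WithBlockCacheSize(0).                   // block cache disabled
        WithLogger(nil)                          // no Badger logging
}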

Tweak these options to increase performance, ideally while using almost exactly 128GB.
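For a sense of scale, the size-related values in DefaultOptions can be converted to readable units and compared with the 128GB goal. This is only a minimal sketch of the arithmetic, not a model of how much memory Badger actually uses. Treating 128GB as 128 GiB is an assumption, and the helper names `mib` and `shareOfTarget` are made up for illustration.

```go
package main

import "fmt"

// Size-related values copied from DefaultOptions.
const (
	maxTableSize     = 256 << 20
	valueLogFileSize = 64 << 20
	indexCacheSize   = 2000 << 20
	blockCacheSize   = 0
)

// targetBytes is the 128GB goal, interpreted here as 128 GiB (an assumption).
const targetBytes = 128 << 30

// mib converts a byte count to mebibytes.
func mib(n int64) float64 {
	return float64(n) / (1 << 20)
}

// shareOfTarget returns n as a percentage of targetBytes.
func shareOfTarget(n int64) float64 {
	return 100 * float64(n) / float64(targetBytes)
}

func main() {
	fmt.Printf("MaxTableSize:     %.0f MiB\n", mib(maxTableSize))
	fmt.Printf("ValueLogFileSize: %.0f MiB\n", mib(valueLogFileSize))
	fmt.Printf("IndexCacheSize:   %.0f MiB\n", mib(indexCacheSize))
	fmt.Printf("BlockCacheSize:   %.0f MiB\n", mib(blockCacheSize))
	fmt.Printf("index cache share of target: %.2f%%\n", shareOfTarget(indexCacheSize))
}
```

Under that assumption, the explicitly configured index cache accounts for roughly 1.5% of the target, so the cache sizes alone leave a lot of room to grow.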

Ullaakut commented 2 years ago

Options and their performance impact

These tests were run using a localnet dataset of a few gigabytes. A baseline indexing run on my machine takes, on average, 1m46s to complete. Some of the results of this benchmark might not hold for larger datasets, but unfortunately that is difficult for me to test, since it takes days to index a complete spork.

It seems like the only option that produces a noticeable performance improvement is leaving TableLoadingMode at its default value, options.MemoryMap. However, I will need to double-check whether this is also the case with a real-life data sample. It may be faster on a short run with localnet data but have the opposite effect with real data.

Ullaakut commented 2 years ago

Unfortunately, I'm unable to test it with real data at the moment, since my machine does not have enough RAM to run the live indexer and the remote machine I have access to has no storage left.

EDIT 25/10: Will be able to test that today or tomorrow.