Open Ullaakut opened 2 years ago
These tests were run using a localnet dataset of a few gigabytes. A baseline indexing run on my machine takes 1m46s on average. Some of these results might not hold for larger datasets, but unfortunately that is difficult for me to test, since indexing a complete spork takes days.
- `WithTableLoadingMode(options.MemoryMap)` / `WithValueLogLoadingMode(options.MemoryMap)`: 136% performance improvement.
- `WithSyncWrites(false)`: Negligible/no impact.
- `WithMaxTableSize(2000 << 20)`: Negligible/no impact.
- `WithValueLogFileSize(2000 << 20)`: Negligible/no impact.
- `WithDetectConflicts(false)`: Negligible/no impact.
- `WithBlockSize(10MB)`: Negligible/no impact.
- `WithBloomFalsePositive(0)`: Negligible/no impact.
- `WithNumCompactors(16)`: Negligible/no impact.
- `WithMaxLevels(10)`: Performance decrease with more levels, no noticeable improvement with fewer levels.
- `WithNumMemtables(256)`: Slight, negligible performance decrease.
- `WithKeepL0InMemory(true)`: Negligible/no impact.
- `WithBypassLockGuard(true)`: Negligible/no impact.

It seems that the only option that produces a noticeable performance improvement is having `TableLoadingMode` set to its default value, `options.MemoryMap`. I still need to double-check whether this is also the case with a real-life data sample. It may be faster on a short run with localnet data but have the opposite effect with real data.
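For reference, here is a minimal sketch of how the one setting that made a difference could be applied. It assumes the options above are Badger v2 options (`github.com/dgraph-io/badger/v2`), which this thread does not state explicitly. The temporary directory and the `run` helper are hypothetical and only there to make the example self-contained.

```go
package main

import (
	"log"
	"os"

	// Assumption: the options discussed here come from Badger v2.
	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

// run opens a database with memory-mapped tables and value log, then closes it.
// It is a hypothetical helper used only for this illustration.
func run() error {
	// Hypothetical scratch directory; a real run would point at the index path.
	dir, err := os.MkdirTemp("", "badger-options")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	// Start from the defaults and set both loading modes explicitly.
	opts := badger.DefaultOptions(dir).
		WithTableLoadingMode(options.MemoryMap).
		WithValueLogLoadingMode(options.MemoryMap)

	db, err := badger.Open(opts)
	if err != nil {
		return err
	}
	// The indexing work being benchmarked would run here.
	return db.Close()
}

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}
```

The other options from the list chain onto `opts` in the same way, so each one can be toggled individually between benchmark runs.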
Unfortunately I'm unable to test it with real data at the moment since my machine does not have enough RAM to run the live indexer, and the remote machine I have access to has no storage left.
EDIT 25/10: Will be able to test that today or tomorrow.
Description
Try to tweak these options to ideally use almost exactly 128GB and increase performance.