markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

Sqlite cache size is too conservative #283

Closed · bbappserver closed this issue 1 year ago

bbappserver commented 2 years ago

SQLite uses only 2 MB of cache by default, which starts to make a real difference on your indexes once you have a few million files. The memory available to pretty much any system running this tool is significantly in excess of that. Even a Raspberry Pi Model 1 A has 128 MB of working memory, and its storage medium would be less than ideal for running btrfs anyway.

The default `PRAGMA cache_size` should be bumped to at least 64 MiB. That doesn't sound like much of a boost, but it lets you traverse the SQLite index structures much faster, both to fetch data and to find insertion points.

Ideally, if you make it 1/10th of whatever the hash file grows to, you get essentially instant access to leaves (relative to disk speed). As an example, my hash file is about 3 GiB, so giving SQLite 256 MiB would be plenty.
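The "1/10th of the hash file" heuristic can be sketched like this. This is an illustrative Python sketch, not duperemove's actual C code; `set_adaptive_cache`, the `ratio` parameter, and the floor value are assumptions, with the floor set to SQLite's usual 2000 KiB default:

```python
import os
import sqlite3
import tempfile

def set_adaptive_cache(conn, db_path, ratio=10, floor_kib=2000):
    # Size the page cache at roughly 1/ratio of the database file,
    # never below floor_kib (SQLite's usual default is ~2 MiB).
    size_kib = os.path.getsize(db_path) // 1024
    cache_kib = max(size_kib // ratio, floor_kib)
    # Negative PRAGMA cache_size values are interpreted as kibibytes.
    conn.execute(f"PRAGMA cache_size = -{cache_kib}")
    return cache_kib

# Example: a throwaway database standing in for duperemove's hash file.
path = os.path.join(tempfile.mkdtemp(), "hashes.db")
conn = sqlite3.connect(path)
requested = set_adaptive_cache(conn, path)
applied = conn.execute("PRAGMA cache_size").fetchone()[0]
conn.close()
```

A 3 GiB hash file would yield a request of roughly 300 MiB of cache, in line with the 256 MiB figure above.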

JackSlateur commented 1 year ago

Hello,

I've pushed the change (https://github.com/markfasheh/duperemove/commit/56f4490d3c4c17ec741c6418423de5568821bbef). According to the SQLite documentation, the cache size is a hint and is not always fully allocated.

This is why I do not think implementing an adaptive cache size is required

darthShadow commented 1 year ago

Just FYI, a negative value for `cache_size` means the size is given in kibibytes (reference: https://www.sqlite.org/pragma.html#pragma_cache_size), so 256000000 per the commit implies 256 GiB of cache rather than 256 MiB.

JackSlateur commented 1 year ago

@darthShadow Ha yes :)