Closed jbruchon closed 8 years ago
Added a --fileblocksize option to make it easier to tune and test various sizes.
The optimum size will likely vary based on hardware and file set. I did some testing with my primary test set of files, and the sweet spot there appears to be 128K.
Changed the default to 128K based on the above runs. The value is now configurable, so it'll be easy to experiment more.
Here's an easy performance boost: in `filecompare.c`, increase `FC_BLOCK_SIZE` from 8192 to a much larger number like 1048576 (1 MiB). I have been heavily benchmarking `jdupes` vs. `dupd`, and on a 900,000+-file data set with an average file size of ~527 KB, the increased read block size in `jdupes` makes a huge difference due to greatly reduced disk thrashing. `jdupes` processed those 900,000+ files in ~7800 seconds; `dupd` processed them in ~9800 seconds, which is what ultimately led me to the read block size difference.

The obvious downsides of this change would be increased memory usage and possible excess data reads during file comparisons, in cases where the smaller block size would allow for earlier comparison termination. Still, it's incredibly low-hanging fruit for significantly increased performance. It would also significantly reduce calls to `read()` and `memcmp()` and the associated call overhead due to performing far fewer read loop iterations (128x fewer iterations per MiB of file data, since 1048576 / 8192 = 128).
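
To make the iteration-count argument concrete, here is a minimal sketch of a block-wise comparison loop with a tunable block size. It is not dupd's actual `filecompare.c` code: the names `read_full` and `compare_files_blockwise` are hypothetical, and it assumes a POSIX system and does not retry on `EINTR`. It only illustrates how the block size controls the number of `read()`/`memcmp()` rounds.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes, looping over short reads.
 * Returns the number of bytes read (less than len only at EOF), or -1 on error. */
static ssize_t read_full(int fd, unsigned char *buf, size_t len)
{
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, buf + total, len - total);
        if (n < 0) return -1;   /* read error */
        if (n == 0) break;      /* end of file */
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Hypothetical sketch: compare two files block_size bytes at a time.
 * Returns 1 if identical, 0 if different, -1 on error.
 * *iterations receives the number of compare rounds performed. */
static int compare_files_blockwise(const char *path_a, const char *path_b,
                                   size_t block_size, long *iterations)
{
    int result = -1;
    int fd_a = open(path_a, O_RDONLY);
    int fd_b = open(path_b, O_RDONLY);
    unsigned char *buf_a = malloc(block_size);
    unsigned char *buf_b = malloc(block_size);

    *iterations = 0;
    if (fd_a >= 0 && fd_b >= 0 && buf_a && buf_b) {
        for (;;) {
            ssize_t na = read_full(fd_a, buf_a, block_size);
            ssize_t nb = read_full(fd_b, buf_b, block_size);
            if (na < 0 || nb < 0) break;                   /* error: result stays -1 */
            if (na == 0 && nb == 0) { result = 1; break; } /* both at EOF: identical */
            (*iterations)++;
            /* A mismatch ends the comparison early; a larger block means
             * more data may already have been read by this point. */
            if (na != nb || memcmp(buf_a, buf_b, (size_t)na) != 0) {
                result = 0;
                break;
            }
        }
    }
    free(buf_a);
    free(buf_b);
    if (fd_a >= 0) close(fd_a);
    if (fd_b >= 0) close(fd_b);
    return result;
}
```

For two identical 20,000-byte files, calling `compare_files_blockwise(a, b, 8192, &n)` performs three rounds, while a 1 MiB block size performs one. The early-termination trade-off is visible in the same loop: if the files differ in the first few bytes, the 1 MiB variant has still read up to 1 MiB from each file before `memcmp()` reports the difference.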