jmacd / xdelta

open-source binary diff, delta/differential compression tools, VCDIFF/RFC 3284 delta compression
http://xdelta.org

Single-threaded #187

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
>What steps will reproduce the problem?
1. Use xdelta3
2. Watch all cores idle, except one

>What is the expected output? What do you see instead?
This program seems like a natural candidate for multithreading

>What version of the product are you using? On what operating system?
3.0 on debian

>Please provide any additional information below.
Pretty please?

Original issue reported on code.google.com by matt...@kregert.se on 28 Jul 2014 at 1:36

GoogleCodeExporter commented 9 years ago
Yes, it is difficult to say no to this request.
I think a few other issues are more important to voters: 3.x reading 1.x 
patches is probably #1, 64-bit source-window (i.e., -B above 2GB) maybe #2.

Original comment by josh.mac...@gmail.com on 15 Oct 2014 at 5:57

Magmatrix commented 9 years ago

Just to show the impact of multithreading...

I made a script that slices up the source and input files and runs xdelta3 in parallel, like this:

for part in $(seq 1 $parts); do
  # dd's skip counts whole blocks, so part N starts at block N-1
  xdelta3 -e -s <(dd if=srcFile bs=$(((srcSize+parts-1)/parts)) skip=$((part-1)) count=1) \
  <(dd if=inFile bs=$(((inSize+parts-1)/parts)) skip=$((part-1)) count=1) vcdiff.$part &
done
wait
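One subtlety worth calling out in scripts like this: dd's skip counts whole blocks from offset zero, so with $(seq 1 $parts) the slice for part N has to skip N-1 blocks, or the first block is silently dropped and the last dd reads nothing. A minimal, self-contained check of the slicing arithmetic (no xdelta3 involved; the file names are throwaway temp files):

```shell
#!/bin/bash
# Cut a file into $parts ceil-sized blocks with dd and verify that the
# slices, concatenated back in order, reproduce the original byte for byte.
set -e
parts=5
srcFile=$(mktemp)
rebuilt=$(mktemp)
head -c 1000003 /dev/urandom > "$srcFile"   # odd size, so the last block is short
srcSize=$(wc -c < "$srcFile")
blockSize=$(((srcSize + parts - 1) / parts))
: > "$rebuilt"
for part in $(seq 1 $parts); do
  # skip=$((part - 1)): part 1 starts at block 0
  dd if="$srcFile" bs=$blockSize skip=$((part - 1)) count=1 2>/dev/null >> "$rebuilt"
done
cmp -s "$srcFile" "$rebuilt" && echo "slices cover the file exactly"
```

With an odd file size the last slice comes out short, which is exactly what the ceil-divide block size is there to allow.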

I have set the minimum part size to 2GB minus 4KB. This is because of some limitation in dd (probably an OS setting).

The source file is 8.4GB, the input file is 8.4GB. With 2 cores the time goes down to 1/2, with 3 cores to 1/3, with 4 cores to 1/4, ... and on top of that I get a little extra if I double the number of parts (thanks to Hyper-Threading).

On an E3-1241 v3 (4 physical cores, 8 logical), I get these numbers:

STATS: Create delta: standard, 917313470 bytes, 162 seconds  <== non-parallel, ordinary xdelta3
STATS: Create delta: 5 parts, 908647719 bytes, 49 seconds
STATS: Create delta: 6 parts, 908798187 bytes, 44 seconds
STATS: Create delta: 7 parts, 909067432 bytes, 44 seconds
STATS: Create delta: 8 parts, 908983811 bytes, 40 seconds  <== Speed limit here, at 8 threads
STATS: Create delta: 9 parts, 909027821 bytes, 42 seconds
STATS: Create delta: 10 parts, 909073885 bytes, 39 seconds
STATS: Create delta: 11 parts, 909149269 bytes, 41 seconds
STATS: Create delta: 12 parts, 909029373 bytes, 38 seconds
STATS: Create delta: 13 parts, 909060791 bytes, 38 seconds
STATS: Create delta: 14 parts, 909271720 bytes, 40 seconds
STATS: Create delta: 15 parts, 909311852 bytes, 39 seconds
STATS: Create delta: 16 parts, 908997499 bytes, 39 seconds
STATS: Create delta: 17 parts, 909169705 bytes, 42 seconds
STATS: Create delta: 18 parts, 909110027 bytes, 41 seconds
STATS: Create delta: 19 parts, 909186985 bytes, 40 seconds
STATS: Create delta: 20 parts, 909187773 bytes, 41 seconds
STATS: Create delta: 21 parts, 909201742 bytes, 39 seconds
STATS: Create delta: 22 parts, 909365994 bytes, 40 seconds
STATS: Create delta: 23 parts, 909393406 bytes, 41 seconds
STATS: Create delta: 24 parts, 909380756 bytes, 41 seconds
STATS: Create delta: 25 parts, 909092782 bytes, 41 seconds
STATS: Create delta: 26 parts, 909811869 bytes, 40 seconds
STATS: Create delta: 27 parts, 909175496 bytes, 40 seconds
STATS: Create delta: 28 parts, 909849172 bytes, 40 seconds
STATS: Create delta: 29 parts, 908955653 bytes, 41 seconds
STATS: Create delta: 30 parts, 909365273 bytes, 41 seconds

As you can see, the number of logical processors sets the limit, but adding a lot more threads doesn't create any problems.

Also, note that the delta size is a bit smaller when running in parallel. This might just be a coincidence, or maybe it is easier for xdelta to work with smaller files.

If only xdelta could do this automatically, so I wouldn't have to handle multiple delta files...
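Until then, the multiple delta files can at least be reassembled safely by making the order explicit. A hedged sketch, reusing the part numbering from the encode script above (the real per-part decode would be xdelta3 -d -s <source-slice> vcdiff.$part out.$part; the stand-in files below just simulate decoded slices so the ordering logic can be seen on its own):

```shell
#!/bin/bash
# Concatenate decoded slices in numeric order. A plain `cat out.*`
# sorts lexically (out.10 before out.2), which would corrupt the
# result as soon as there are 10 or more parts.
set -e
parts=12
for part in $(seq 1 $parts); do
  printf 'slice-%02d' "$part" > out.$part   # stand-in for a decoded slice
done
for part in $(seq 1 $parts); do cat out.$part; done > restored
```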

Magmatrix commented 4 years ago

Hmm, has this project been abandoned? 😕

JustMyGithub commented 2 years ago

Keep in mind that this only holds for SSDs. HDDs under parallel access incur seek times and will not scale as well.

Magmatrix commented 2 years ago

> Keep in mind that this only holds for SSDs. HDDs under parallel access incur seek times and will not scale as well.

No, it does not only hold for SSDs. I ran my tests on a 30 TB RAID-5 array (softraid) with very slow HDDs. Even that slow storage had enough bandwidth to feed 4 instances of xdelta.

JustMyGithub commented 2 years ago

> No, it does not only hold for SSDs. I ran my tests on a 30 TB RAID-5 array (softraid) with very slow HDDs. Even that slow storage had enough bandwidth to feed 4 instances of xdelta.

Up to some point it may hold for HDDs, and especially for RAIDs, as well, but "adding a lot more threads doesn't create any problems" is not true for HDDs: at some point it will create problems. Exactly where that point lies depends on how fast the CPU can process the data. That was what I intended to say.