Open poettering opened 5 years ago
On my laptop:
$ rm -rf /var/tmp/default.castr /var/tmp/ffox.* ; casync --digest=default --compression=gzip --without=privileged make /var/tmp/ffox.caidx ~/Downloads/firefox/
(before)
4fe6372c31710ac3f3c2af54f56a896c5b30dfab97988abf2aedd3b145e378ad
29.95s user 1.82s system 96% cpu 32.915 total
29.85s user 1.89s system 97% cpu 32.691 total
29.87s user 1.80s system 98% cpu 32.087 total
(after)
36.57s user 1.52s system 232% cpu 16.405 total
36.60s user 1.48s system 233% cpu 16.324 total
36.90s user 1.52s system 211% cpu 18.167 total
So there's both a slow-down in total CPU time (user time ~30s → ~37s) and growth in CPU utilization, but wall-clock time roughly halves.
On rpi3, I see a decent speedup, 19s→11–14s, on /usr/lib/modules/4.19.2-300.fc29.aarch64/kernel/drivers/, and a similar speedup on f30-test.fedorainfracloud.org (a 2-vCPU cloud instance). I also need to test on a "beefy" machine, but I don't have one at hand right now.
Fails with a corrupt stack here:
casync: ../src/castore.c:274: worker_thread: Assertion `store->worker_thread_socket[1] >= 0' failed.
Core was generated by `build/casync make /tmp/archive.caidx /usr/lib64/kde3'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007efcc399353f in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7efcc37df700 (LWP 32593))]
(gdb) bt
#0 0x00007efcc399353f in raise () from /lib64/libc.so.6
#1 0x00007efcc397d895 in abort () from /lib64/libc.so.6
#2 0x00007efcc397d769 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3 0x00007efcc398b9f6 in __assert_fail () from /lib64/libc.so.6
#4 0x00007ffd3e60dbbe in ?? ()
#5 0x00007ffd3e60dbbf in ?? ()
#6 0x000000000041f94b in worker_thread (p=<error reading variable: Cannot access memory at address 0xffffffffffffff98>) at ../src/castore.c:274
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(This is 100% repeatable.)
On a machine with 12 cores:
$ rm -rf /tmp/default.castr /tmp/archive.* && time build/casync make /tmp/archive.caidx /usr/lib64/firefox
ccf3d08f975b7be1fc274d798e81293ee3e12deb1922e12b59118beee46cac28
(before)
10.39s user 0.32s system 99% cpu 10.763 total
10.48s user 0.28s system 99% cpu 10.822 total
10.58s user 0.31s system 99% cpu 10.957 total
(after)
10.82s user 0.45s system 123% cpu 9.113 total
10.77s user 0.49s system 123% cpu 9.113 total
11.21s user 0.49s system 128% cpu 9.127 total
Again, a moderate wall-clock speedup (~10.8s → ~9.1s).
I guess we could merge this once the crash is figured out.
This gives a ~10% speed improvement. (Not more, unfortunately, since this only parallelizes the zstd work; it's the sha512-256 logic that costs the most CPU time, and parallelizing that is much harder.)