systemd / casync

Content-Addressable Data Synchronization Tool
1.49k stars 117 forks source link

Failure when doing large extract operation #172

Open johnp789 opened 5 years ago

johnp789 commented 5 years ago

I'm hosting a 592 MB chunk store on a remote machine using Python 3.7.1's http.server. A casync extract on a Raspberry Pi 3 fails with a broken pipe error.

$ uname -a
Linux alarmpi 4.14.78-2-ARCH #1 SMP Sun Oct 28 17:05:07 UTC 2018 armv7l GNU/Linux
$ sudo strace -ff -o /tmp/trace casync extract -v --store=http://pedro:8000/default.castr /tmp/usr.caidx /mnt/home/john/usr
[...]
Acquiring http://pedro:8000/default.castr/05f6/05f6b0f31d8f3d0f209dfadceff29655e7ccde2edeadaa3201df660ed1262180.cacnk...
Acquiring http://pedro:8000/default.castr/c83a/c83af52286b19cb26e89af2fbfa677dc360214ab0e9d2feea13f11b4a7d13461.cacnk...
Extracted bin/mke2fs
Extracting bin/mkfifo
Extracted bin/mkfifo
Extracting bin/mkfs
Extracted bin/mkfs
Extracting bin/mkfs.bfs
Extracted bin/mkfs.bfs
Extracting bin/mkfs.cramfs
Acquiring http://pedro:8000/default.castr/c571/c571bf8f795d5e769e34866e73ce399d7c026e7cc605c7534233ae60ee2d6e79.cacnk...
Extracted bin/mkfs.cramfs
Extracting bin/mkfs.ext2
Extracted bin/mkfs.ext2
Extracting bin/mkfs.ext3
Failed to acquire http://pedro:8000/default.castr/c571/c571bf8f795d5e769e34866e73ce399d7c026e7cc605c7534233ae60ee2d6e79.cacnk
Failed to run synchronizer: Broken pipe

Some of the strace output:

rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, NULL, 8) = 0
poll(NULL, 0, 49)                       = 0 (Timeout)
rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, NULL, 8) = 0
socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP) = 4
setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl64(4, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
connect(4, {sa_family=AF_INET6, sin6_port=htons(8000), inet_pton(AF_INET6, "fe80::4ecc:6aff:fe0a:8fc4", &sin6_addr), sin6_flowinfo=htonl(0), sin6_scope_id=if_nametoindex("eth0")}, 2
8) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=4, events=POLLOUT|POLLWRNORM}], 1, 0) = 1 ([{fd=4, revents=POLLOUT|POLLWRNORM|POLLERR|POLLHUP}])
getsockopt(4, SOL_SOCKET, SO_ERROR, [111], [4]) = 0
close(4)                                = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, NULL, 8) = 0
write(2, "Failed to acquire http://pedro:8"..., 126) = 126
rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, NULL, 8) = 0
rt_sigaction(SIGPIPE, NULL, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x76b3be00}, NULL, 8) = 0
munmap(0x76639000, 200704)              = 0
openat(AT_FDCWD, "/var/tmp/1784b44719c08e1c.carem", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW|O_NOATIME|O_CLOEXEC|O_DIRECTORY) = 4

After that fails, trying to run the same command again gets a bit farther, but then fails with a different error message.

[...]
Seeded bin/mkfs.cramfs
Seeding bin/mkfs.ext2
Seeded bin/mkfs.ext2
Seeded bin
Seeded ./
Acquiring http://pedro:8000/default.castr/b357/b357a4e26a891c027c64be4a85aad150e4fb17fa7aa04a89b62687c9a1a581d7.cacnk...
Extracting ./
Using feature flags: 32bit-uids user-names nsec-time symlinks device-nodes fifos sockets flag-hidden flag-system flag-archive flag-append flag-noatime flag-compr flag-nocow flag-dirsync flag-immutable flag-sync flag-nocomp flag-projinherit subvolume subvolume-ro xattrs acl fcaps quota-projid
Excluding files and directories with chattr(1) -d flag: yes
Excluding submounts: no
Excluding files and directories listed in .caexclude: yes
Digest algorithm: sha512-256
Extracting bin
Extracting bin/2to3
Extracted bin/2to3
Extracting bin/2to3-3.7
Extracted bin/2to3-3.7
Extracting bin/[
Extracted bin/[
Extracting bin/accessdb
Acquiring http://pedro:8000/default.castr/e7cb/e7cbec22f9dd7758b96874060e16438530da61c5dd883458c38fdab87fadb0b4.cacnk...
Acquiring http://pedro:8000/default.castr/b7d8/b7d8b7fc64fd6973e8edd14d60b93b41ff016bf4c001cfc414be14e7b523d29b.cacnk...
\Acquiring http://pedro:8000/default.castr/f6af/f6afe383133395f55d620217b07dd8a5624adaa9f730c1470703f10ba432ab6a.cacnk...
Acquiring http://pedro:8000/default.castr/4285/4285e0c353df2cb6a462d4446672529f0d6259a2164551971fabdaf2c8744320.cacnk...
Extracted bin/accessdb
Extracting bin/aclocal
Extracted bin/aclocal
[...]
Acquiring http://pedro:8000/default.castr/df65/df65131c27f7aee9bab782ec27867e456364e5fa2ce3b9b75bb12460ae1a19c9.cacnk...
Extracted bin/autom4te
Extracting bin/automake
Acquiring http://pedro:8000/default.castr/1a40/1a405edafad73d56f9d90d3a744b861a543fd391e0cfcf12aa38a39ef335519a.cacnk...
Extracted bin/automake
Extracting bin/automake-1.16
Extracted bin/automake-1.16
Failed to run synchronizer: Stale file handle

Some of that strace output:

unlinkat(9, "low-priority/23", 0)       = 0
symlinkat("high-priority/23", 9, "chunks/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45") = 0
symlinkat("ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45", 9, "high-priority/23") = 0
readlinkat(6, "ec7d/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45", 0x20d2788, 255) = -1 ENOENT (No such file or directory)
openat(9, "ec7d/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45", O_RDONLY|O_NOCTTY|O_LARGEFILE|O_NOFOLLOW|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(9, "ec7d/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45.cacnk", O_RDONLY|O_NOCTTY|O_LARGEFILE|O_NOFOLLOW|O_CLOEXEC) = -1 ENOENT (No such file or directory)
readlinkat(9, "chunks/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45", "high-priority/23", 255) = 16
fstatat64(9, "ec7d/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45", 0x7efa66a8, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
fstatat64(9, "ec7d/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45.cacnk", 0x7efa66b0, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
getrandom("\x50\xee\xe8\x6c\x3a\x00\xb9\x9b", 8, GRND_NONBLOCK) = 8
mkdirat(9, "ec7d", 0777)                = 0
openat(9, "ec7d/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45.9bb9003a6ce8ee50.tmp", O_WRONLY|O_CREAT|O_EXCL|O_NOCTTY|O_LARGEFILE|O_CLOEXEC, 0444) = 14
write(14, "(\265/\375\0P\325K\2\252\212NvU\360Lh\233\08&\264~\32\207\227\5O_2\327\350"..., 18819) = 18819
close(14)                               = 0
renameat2(9, "ec7d/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45.9bb9003a6ce8ee50.tmp", 9, "ec7d/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45.
cacnk", RENAME_NOREPLACE) = 0
readlinkat(6, "ec7d/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45", 0x20d2788, 255) = -1 ENOENT (No such file or directory)
openat(9, "ec7d/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45", O_RDONLY|O_NOCTTY|O_LARGEFILE|O_NOFOLLOW|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(9, "ec7d/ec7d392665e3cf17de47f4d469946da30416d9a020fee11786c15f276bb88a45.cacnk", O_RDONLY|O_NOCTTY|O_LARGEFILE|O_NOFOLLOW|O_CLOEXEC) = 14
read(14, "(\265/\375\0P\325K\2\252\212NvU\360Lh\233\08&\264~\32\207\227\5O_2\327\350"..., 65536) = 18819
mmap2(NULL, 1314816, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x763bf000
read(14, "", 65536)                     = 0
munmap(0x763bf000, 1314816)             = 0
close(14)                               = 0
write(12, "ned.\n\t      elsif (! defined $co"..., 32067) = 32067
write(2, "Extracted bin/automake-1.16\n", 28) = 28
openat(4, "bin/automake", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW|O_CLOEXEC) = 14
fstat64(14, {st_mode=S_IFREG|0755, st_size=257949, ...}) = 0
close(14)                               = 0
write(2, "Failed to run synchronizer: Stal"..., 46) = 46
close(3)                                = 0
close(11)                               = 0

Is there a limit I'm running up against here? If so, is there a workaround?

johnp789 commented 5 years ago

Some experiments extracting the same chunk store on an x86_64 machine with plenty of memory show that the extraction requires more memory than the Raspberry Pi 3 has. Extracting on the Pi might be running into some other problem, however, given that the error message is different.

$ sudo systemd-run -t -p MemoryLimit=500M casync --store=http://pedro:8000/default.castr extract http://pedro:8000/usr.caidx /tmp/usr_test
Running as unit: run-u371.service
Press ^] three times within 1s to disconnect TTY.
Failed to run synchronizer: Cannot allocate memory
$ sudo rm -rf usr_test/
$ sudo systemd-run -t -p MemoryLimit=1500M casync --store=http://pedro:8000/default.castr extract http://pedro:8000/usr.caidx /tmp/usr_test
Running as unit: run-u380.service
Press ^] three times within 1s to disconnect TTY.
Failed to run synchronizer: Cannot allocate memory
$ sudo rm -rf usr_test/
$ sudo systemd-run -t -p MemoryLimit=3000M casync --store=http://pedro:8000/default.castr extract http://pedro:8000/usr.caidx /tmp/usr_test
Running as unit: run-u392.service
Press ^] three times within 1s to disconnect TTY.
srd424 commented 4 years ago

Second error here might be same as #240 .. was it running on a reflink-capable filesystem?