Note: I don't need to involve a second zpool at all; I get the same error when transferring to the same pool.
I also found out that not all of my datasets are affected; I just happened to get unlucky when I tested a few at random. I'm still not sure what distinguishes the affected datasets from the unaffected ones, but I have a feeling it may be related to xattrs, ACLs, paxmarks, SELinux contexts or something of that nature, because it only seems to affect sufficiently nontrivial datasets. (I have feature@large_dnode enabled, acltype=posixacl and xattr=sa.)
That said, if I try reproducing it by copying /usr/bin/sudo and /bin/ping into the ‘test’ dataset and paxmarking them, it still works fine. (And I no longer have an SELinux kernel or related userland installed.)
I've also tried dumping 1G of /dev/urandom into the ‘test’ dataset, and it still transmits fine. I'm really not sure what else I should try in order to reproduce it on a clean dataset. I can't find a pattern among the ones that are affected - they're both small and large and contain all sorts of different files, yet some get sent correctly while others fail.
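For what it's worth, the urandom attempt amounts to something like this (a sketch; the pool names and default mountpoint are taken from elsewhere in the thread, the file name is made up):

```sh
# reproduce attempt on a fresh dataset filled with random data
zfs create zarathustra/test
dd if=/dev/urandom of=/zarathustra/test/random.bin bs=1M count=1024
zfs snapshot zarathustra/test@test
zfs send -Le zarathustra/test@test | zfs receive -Fvu backup/test
```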
Edit: Turns out that some of the datasets which failed to be received on a different pool actually work fine on the same pool - although some fail even on the same pool. There also seems to be a difference between -R and non--R here: some fail with -R but work without it, which is in contrast to the behavior I saw in my first post, where it failed even without -R - although that may have been because I was testing on one of the datasets that always seems to fail in my tests (zarathustra/ROOT/gentoo).
I'm suffering from a combinatorial explosion of variables and different cases here, and I'm struggling to even document what I'm observing. I'll continue trying to reproduce the issue on a clean dataset when I get the time/energy.
Problem seems to have solved itself with an update.
Hmm, it still fails when trying to do it across zpools, exact same error. I can copy a dataset from one place on my zpool to another, but doing it from my main pool to my backup pool still fails.
That said, it again seems to work for some datasets but not others.
I have two seemingly identical (as per ‘zfs get all’) datasets in the same location, differing only by name, content and mountpoint - sending one works, and sending the other does not.
(The one that fails is the larger of the two)
@kpande I don't see how that's relevant:
That only invalidates 3., not 1. or 2.
I tested it, and zfs hold / zfs release made no difference. Also, I can reproduce each of these failures consistently - i.e. if a snapshot fails to transfer, it will always fail. It's just which snapshots fail that seems to be arbitrary.
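For reference, a minimal sketch of the kind of hold/release test meant here (the hold tag and target names are assumptions based on names used elsewhere in the thread):

```sh
zfs hold keep zarathustra/ROOT/gentoo@test-2016-09-28
zfs send -Le zarathustra/ROOT/gentoo@test-2016-09-28 | zfs receive -Fvu backup/gentoo
zfs release keep zarathustra/ROOT/gentoo@test-2016-09-28
```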
I'm currently in the process of running a script that tries sending every snapshot one by one (i.e. not recursively), so I can get a list of all that work and all that fail and compare them one by one. I'll report back when it finishes.
It finished. Out of my 48 datasets, only 20 could successfully be received. The rest failed with the same error as described multiple times above.
My invocation for this test was something like:
for x in DATASETS; do
    zfs send -Le ${x}@test-2016-09-28 | zfs receive -Fvu backup/${x//\//-}
done
(The shell replacement was to make sure datasets ended up in a single flat level instead of using a hierarchy, because the parent datasets fail in some cases.)
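(For example, with x=zarathustra/ROOT/gentoo the expansion ${x//\//-} replaces every ‘/’ with ‘-’, so the receive target becomes backup/zarathustra-ROOT-gentoo.)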
I see no discernible pattern among the datasets that survived the zfs receive, except that they tend to be very small. Out of 1.7T of data, only 103G survived. The largest dataset that succeeded was 759M, but the majority were well under 500M. The smallest dataset that failed was 391M, but the majority were well over 1G.
The number of files in the dataset seems to be uncorrelated, as is the type of file (i.e. SELinux tags, ACLs, xattrs, etc. seem to have no effect). Really, the only pattern I see is that large datasets seem to fail.
If I had to guess, I would assume this has something to do with the large_dnode feature, because that is the only feature I am using which has not made it into any stable release. (Incidentally: the enabled feature lists of both pools are identical, as are the properties on the root dataset.)
Is there some easy way I can test whether one of the affected datasets would transfer fine if it was not using large dnodes? I can set dnodesize to something else, but that won't affect already-written data.
Okay, I did a test like this: I copied the affected data into a fresh temporary dataset, created a @test snapshot of it, and tried sending that. It failed.
I then tried repeating this but setting ‘dnodesize=legacy’ when creating the temporary dataset. It still failed, so dnodesize=legacy does not improve the situation.
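A minimal sketch of the kind of test just described (the source path is a placeholder; the dataset names and the @test snapshot are taken from the commands used later in this thread):

```sh
# recreate the data in a temporary dataset with legacy-sized dnodes, then try sending it
zfs create -o dnodesize=legacy zarathustra/MISC/test
rsync -a /path/to/affected/data/ /zarathustra/MISC/test/
zfs snapshot zarathustra/MISC/test@test
zfs send -Le zarathustra/MISC/test@test | zfs receive -Fvu backup/test-bisect
```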
I'll try bisecting the dataset to see if it's a particular file that's causing the failure.
Update: I could not bisect it to any individual file, but I did reach a point where if I start deleting stuff, it will suddenly succeed. (For example, there's one file which, if deleted, will immediately make it succeed again. As soon as I cp that exact file back over, it fails again.)
Here is a find . of my directory structure: https://0x0.st/QwC.txt
Moving ./a/b/c/d to a single top-level location (e.g. mv ./a/b/c/d ./foo) makes it work. As does deleting most/some of the subfolders (0,1,2,3 ...) inside ./a/b/c/d. To give an example of a single file: deleting ./a/b/c/d/D/FA/3366Ad01 makes it work. (This file is 3MB large and contains a regular .gif image.)
Update 2: I've trimmed it down to the point where if I delete the smallest directory inside ./a/b/c/d, it will fail at first and then succeed a few seconds later (and continue succeeding). If I rsync the directory back over, it will fail again.
Could it be the case that the problem here is that there's too much data nested inside a single directory structure, or something like that? (Since mv'ing the directories around makes it work again.) That wouldn't explain why my 759MB dataset succeeded, though - although that dataset has its size split up over multiple top-level directories (and the majority of those 700MB is a single iso image).
Update 3: I could move the ‘d’ folder directly into ‘./a/d’ and still reproduce the issue. I could also remove the (now empty) ‘./a/b/c’ folder. At this point, whether it succeeds or not seems to be somewhat down to chance. Little things like mkdir and rmdir seem to flip it, and it will sometimes revert back to succeeding after waiting a few seconds. (Also, creating/destroying the snapshot itself during the testing process may contribute to this.)
It feels like I've reached some critical point where the ‘size’ is almost at the threshold where it will trigger whatever behavior breaks the zfs receive. Here are my current stats for what it's worth, and it is currently failing at the time of measuring:
zfs get all: https://0x0.st/QxH.txt
find .: https://0x0.st/QxX.txt
zpool get all: https://0x0.st/Qx8.txt and https://0x0.st/QxK.txt (respectively)
And my test command was this:
zfs snapshot zarathustra/MISC/test@test; zfs send -Lev zarathustra/MISC/test@test | zfs receive -Fvu backup/test-bisect && echo SUCCESS && zfs destroy -r backup/test-bisect; zfs destroy zarathustra/MISC/test@test
After I delete an empty dir to make it succeed again, my zfs get all reports no change.
I tried re-running the previous test with -o xattr=off -o dnodesize=legacy -o compression=off -o acltype=off for some extra sanity. (It still failed.)
Deleting some smallish (~1MB) directory still caused it to succeed again, even though the total size on-disk was now several MB larger than before (69 MB instead of 64 MB) because compression was turned off. So at any rate, that doesn't seem to have an effect, whatever it's worth.
I'm at a loss for what else to debug here, or how to get more information about the zfs receive process and what exactly it's doing that fails. Also, just for sanity: if I record a zfs send stream into a file and zfs receive that, my results will always be identical. (In other words, a single send stream will either always work or always fail.)
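For instance, a stream can be captured once and then replayed like this (a sketch reusing names from later in the thread, not the exact commands used):

```sh
# capture the stream to a file, then feed the identical bytes to zfs receive
zfs send -Le zarathustra/MISC/test@test > /mem/test-stream
zfs receive -Fvu backup/test-bisect < /mem/test-stream
```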
Interesting update: It still happens if I NULL out all of the files (while preserving the filesizes). In other words, I can send you a “sanitized” zfs send stream, and I can also compress it.
I've uploaded the compressed zfs send stream here: https://0x0.st/Q36.xz (decompresses to 80 MB)
Trying to zfs receive this stream fails on my end:
λ zfs receive backup/test-bisect < /mem/package-fail
cannot open 'backup/test-bisect': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
It would be great if somebody else could try and reproduce!
In case it helps debugging, here is a second zfs send stream, this time one that works (I deleted a few directories): https://0x0.st/QYh.xz
@haasn I just tried to receive both send streams on a debian box running the current master on a newly created pool and neither of those seems to work for me; can you double-check if your system can actually receive the stream from the second link?
@loli10K It seems to work.. sometimes?
λ wget https://0x0.st/QYh.xz
λ unxz QYh.xz
λ zfs receive -vu backup/test < QYh
receiving full stream of zarathustra/MISC/test@test into backup/test@test
cannot open 'backup/test': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
λ zfs receive -Fvu backup/test < QYh
receiving full stream of zarathustra/MISC/test@test into backup/test@test
received 69.3MB stream in 1 seconds (69.3MB/sec)
λ zfs receive -Fvu backup/test2 < QYh
receiving full stream of zarathustra/MISC/test@test into backup/test2@test
received 69.3MB stream in 1 seconds (69.3MB/sec)
λ zfs receive -vu backup/test2 < QYh
cannot receive new filesystem stream: destination 'backup/test2' exists
must specify -F to overwrite it
λ zfs receive -vu backup/test3 < QYh
receiving full stream of zarathustra/MISC/test@test into backup/test3@test
received 69.3MB stream in 1 seconds (69.3MB/sec)
λ zfs destroy -rv backup/test
will destroy backup/test@test
will destroy backup/test
λ zfs destroy -rv backup/test2
will destroy backup/test2@test
will destroy backup/test2
λ zfs destroy -rv backup/test3
will destroy backup/test3@test
will destroy backup/test3
λ zfs receive -vu backup/test < QYh
receiving full stream of zarathustra/MISC/test@test into backup/test@test
received 69.3MB stream in 1 seconds (69.3MB/sec)
Also, these are the feature flags on the ‘backup’ pool:
backup feature@async_destroy enabled local
backup feature@empty_bpobj active local
backup feature@lz4_compress active local
backup feature@spacemap_histogram active local
backup feature@enabled_txg active local
backup feature@hole_birth disabled local
backup feature@extensible_dataset active local
backup feature@embedded_data active local
backup feature@bookmarks disabled local
backup feature@filesystem_limits enabled local
backup feature@large_blocks enabled local
backup feature@large_dnode active local
@haasn sometimes sounds about right. After a little more testing it seems this issue is more time related than size related (incidentally, if you increase the size of the sent zfs stream it will increase receive time too).
root@debian-zfs:~# zfs send -Le tank/send@issue-5027 > file
root@debian-zfs:~# echo 2 > /sys/module/zfs/parameters/zfs_txg_timeout
root@debian-zfs:~# while true; sleep 2 ; do cat file | zfs receive -Fuv tank/recv ; zfs destroy tank/recv -r ; done
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
^C
root@debian-zfs:~# while true; sleep 1 ; do cat file | zfs receive -Fuv tank/recv ; zfs destroy tank/recv -r ; done
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
cannot open 'tank/recv': dataset does not exist
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
cannot open 'tank/recv': dataset does not exist
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
cannot open 'tank/recv': dataset does not exist
^C
root@debian-zfs:~# while true; sleep 3 ; do cat file | zfs receive -Fuv tank/recv ; zfs destroy tank/recv -r ; done
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
^C
Can you do more testing with different zfs_txg_timeout / sleep values on your system?
EDIT: it's probably both size and time related, meaning that the issue seems to manifest when we can't receive the zfs stream in a single txg (it either gets switched by a timeout or by too much dirty data, I guess?). For instance, the exact same snapshot dumped to a file can be consistently received as long as it is small enough, but it can still be forced into a "cannot receive new filesystem stream: dataset does not exist" condition if we limit the bandwidth at which we receive the data with mbuffer.
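The exact mbuffer invocation isn't shown in the output pasted below; it was presumably something along these lines (the rate and buffer-size flags here are assumptions, not the command actually used):

```sh
# rate-limit the receive so it cannot complete within a single txg
zfs send -Lev tank/send@issue-5027 | mbuffer -m 1M -R 20k | zfs receive -Fuv tank/recv
```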
This is another test with zfs_txg_timeout=10 and a 1M snapshot:
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME SENT SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 17.1 kB/s, out @ 17.1 kB/s, 12.0 kB total, buffer 0% full18:14:12 43.6K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 32.0 kB total, buffer 0% full18:14:13 43.6K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 52.0 kB total, buffer 0% full18:14:14 43.6K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 72.0 kB total, buffer 0% full18:14:15 43.6K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 92.0 kB total, buffer 0% full18:14:16 43.6K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 112 kB total, buffer 0% full18:14:17 172K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 132 kB total, buffer 0% full18:14:18 172K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 152 kB total, buffer 0% full18:14:19 172K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 172 kB total, buffer 0% full18:14:20 172K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 192 kB total, buffer 0% full18:14:21 172K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 212 kB total, buffer 0% full18:14:22 172K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 232 kB total, buffer 0% full18:14:23 300K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 252 kB total, buffer 0% full18:14:24 300K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 272 kB total, buffer 0% full18:14:25 300K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 292 kB total, buffer 0% full18:14:26 300K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 312 kB total, buffer 0% full18:14:27 300K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 332 kB total, buffer 0% full18:14:28 300K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 352 kB total, buffer 0% full18:14:29 300K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 372 kB total, buffer 0% full18:14:30 429K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 392 kB total, buffer 0% full18:14:31 429K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 412 kB total, buffer 0% full18:14:32 429K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 432 kB total, buffer 0% full18:14:33 429K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 452 kB total, buffer 0% full18:14:34 429K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 472 kB total, buffer 0% full18:14:35 429K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 492 kB total, buffer 0% full18:14:36 557K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 512 kB total, buffer 0% full18:14:37 557K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 532 kB total, buffer 0% full18:14:38 557K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 552 kB total, buffer 0% full18:14:39 557K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 572 kB total, buffer 0% full18:14:40 557K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 592 kB total, buffer 0% full18:14:41 557K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 612 kB total, buffer 0% full18:14:42 685K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 632 kB total, buffer 0% full18:14:43 685K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 652 kB total, buffer 0% full18:14:44 685K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 672 kB total, buffer 0% full18:14:45 685K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 692 kB total, buffer 0% full18:14:46 685K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 712 kB total, buffer 0% full18:14:47 685K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 732 kB total, buffer 0% full18:14:48 685K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 752 kB total, buffer 0% full18:14:49 813K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 772 kB total, buffer 0% full18:14:50 813K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 792 kB total, buffer 0% full18:14:51 813K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 812 kB total, buffer 0% full18:14:52 813K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 832 kB total, buffer 0% full18:14:53 813K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 852 kB total, buffer 0% full18:14:54 813K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 872 kB total, buffer 0% full18:14:55 942K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 892 kB total, buffer 0% full18:14:56 942K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 912 kB total, buffer 0% full18:14:57 942K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 932 kB total, buffer 0% full18:14:58 942K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 952 kB total, buffer 0% full18:14:59 942K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 972 kB total, buffer 0% full18:15:00 942K tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 992 kB total, buffer 0% full18:15:01 942K tank/send@issue-5027
in @ 24.0 kB/s, out @ 24.0 kB/s, 1064 kB total, buffer 0% full
summary: 1070 kByte in 53.4 sec - average of 20.0 kB/s
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
KO
cannot open 'tank/recv': dataset does not exist
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME SENT SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 39.9 kB/s, out @ 39.9 kB/s, 28.0 kB total, buffer 0% full18:15:06 43.6K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 68.0 kB total, buffer 0% full18:15:07 43.6K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 108 kB total, buffer 0% full18:15:08 172K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 148 kB total, buffer 0% full18:15:09 172K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 188 kB total, buffer 0% full18:15:10 172K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 228 kB total, buffer 0% full18:15:11 300K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 268 kB total, buffer 0% full18:15:12 300K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 308 kB total, buffer 0% full18:15:13 300K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 348 kB total, buffer 0% full18:15:14 300K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 388 kB total, buffer 0% full18:15:15 429K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 428 kB total, buffer 0% full18:15:16 429K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 468 kB total, buffer 0% full18:15:17 429K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 508 kB total, buffer 0% full18:15:18 557K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 548 kB total, buffer 0% full18:15:19 557K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 588 kB total, buffer 0% full18:15:20 557K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 628 kB total, buffer 0% full18:15:21 685K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 668 kB total, buffer 0% full18:15:22 685K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 708 kB total, buffer 0% full18:15:23 685K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 748 kB total, buffer 0% full18:15:24 813K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 788 kB total, buffer 0% full18:15:25 813K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 828 kB total, buffer 0% full18:15:26 813K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 868 kB total, buffer 0% full18:15:27 942K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 908 kB total, buffer 0% full18:15:28 942K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 948 kB total, buffer 0% full18:15:29 942K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 988 kB total, buffer 0% full18:15:30 942K tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 1048 kB total, buffer 0% full
summary: 1070 kByte in 26.7 sec - average of 40.1 kB/s
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
KO
cannot open 'tank/recv': dataset does not exist
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME SENT SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 57.1 kB/s, out @ 57.1 kB/s, 40.0 kB total, buffer 0% full18:15:32 43.6K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 100 kB total, buffer 0% full18:15:33 172K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 160 kB total, buffer 0% full18:15:34 172K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 220 kB total, buffer 0% full18:15:35 300K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 280 kB total, buffer 0% full18:15:36 300K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 340 kB total, buffer 0% full18:15:37 300K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 400 kB total, buffer 0% full18:15:38 429K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 460 kB total, buffer 0% full18:15:39 429K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 520 kB total, buffer 0% full18:15:40 557K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 580 kB total, buffer 0% full18:15:41 557K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 640 kB total, buffer 0% full18:15:42 685K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 700 kB total, buffer 0% full18:15:43 685K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 760 kB total, buffer 0% full18:15:44 813K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 820 kB total, buffer 0% full18:15:45 813K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 880 kB total, buffer 0% full18:15:46 942K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 940 kB total, buffer 0% full18:15:47 942K tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 1060 kB total, buffer 0% full
summary: 1070 kByte in 17.8 sec - average of 60.1 kB/s
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
KO
cannot open 'tank/recv': dataset does not exist
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME SENT SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 56.0 kB total, buffer 0% full18:15:50 43.6K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 136 kB total, buffer 0% full18:15:51 172K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 216 kB total, buffer 0% full18:15:52 300K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 296 kB total, buffer 0% full18:15:53 300K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 376 kB total, buffer 0% full18:15:54 429K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 456 kB total, buffer 0% full18:15:55 429K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 536 kB total, buffer 0% full18:15:56 557K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 616 kB total, buffer 0% full18:15:57 685K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 696 kB total, buffer 0% full18:15:58 685K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 776 kB total, buffer 0% full18:15:59 813K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 856 kB total, buffer 0% full18:16:00 942K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 936 kB total, buffer 0% full18:16:01 942K tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 1056 kB total, buffer 0% full
summary: 1070 kByte in 13.4 sec - average of 80.1 kB/s
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
KO
cannot open 'tank/recv': dataset does not exist
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME SENT SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 97.0 kB/s, out @ 97.0 kB/s, 68.0 kB total, buffer 0% full18:16:04 43.6K tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s, 168 kB total, buffer 0% full18:16:05 172K tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s, 268 kB total, buffer 0% full18:16:06 300K tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s, 368 kB total, buffer 0% full18:16:07 429K tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s, 468 kB total, buffer 0% full18:16:08 557K tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s, 568 kB total, buffer 0% full18:16:09 557K tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s, 668 kB total, buffer 0% full18:16:10 685K tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s, 768 kB total, buffer 0% full18:16:11 813K tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s, 868 kB total, buffer 0% full18:16:12 942K tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s, 968 kB total, buffer 0% full18:16:13 942K tank/send@issue-5027
in @ 104 kB/s, out @ 104 kB/s, 1020 kB total, buffer 0% full
summary: 1070 kByte in 10.7 sec - average of 100 kB/s
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
KO
cannot open 'tank/recv': dataset does not exist
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME SENT SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 120 kB/s, out @ 120 kB/s, 84.0 kB total, buffer 0% full18:16:15 172K tank/send@issue-5027
in @ 120 kB/s, out @ 120 kB/s, 204 kB total, buffer 0% full18:16:16 300K tank/send@issue-5027
in @ 120 kB/s, out @ 120 kB/s, 324 kB total, buffer 0% full18:16:17 300K tank/send@issue-5027
in @ 120 kB/s, out @ 120 kB/s, 444 kB total, buffer 0% full18:16:18 429K tank/send@issue-5027
in @ 120 kB/s, out @ 120 kB/s, 564 kB total, buffer 0% full18:16:19 557K tank/send@issue-5027
in @ 120 kB/s, out @ 120 kB/s, 684 kB total, buffer 0% full18:16:20 685K tank/send@issue-5027
in @ 120 kB/s, out @ 120 kB/s, 804 kB total, buffer 0% full18:16:21 813K tank/send@issue-5027
in @ 120 kB/s, out @ 120 kB/s, 924 kB total, buffer 0% full18:16:22 942K tank/send@issue-5027
in @ 120 kB/s, out @ 120 kB/s, 1044 kB total, buffer 0% full
summary: 1070 kByte in 8.9 sec - average of 120 kB/s
received 1.05MB stream in 9 seconds (119KB/sec)
OK
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME SENT SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 137 kB/s, out @ 137 kB/s, 96.0 kB total, buffer 0% full18:16:24 172K tank/send@issue-5027
in @ 136 kB/s, out @ 136 kB/s, 236 kB total, buffer 0% full18:16:25 300K tank/send@issue-5027
in @ 136 kB/s, out @ 136 kB/s, 376 kB total, buffer 0% full18:16:26 429K tank/send@issue-5027
in @ 136 kB/s, out @ 136 kB/s, 516 kB total, buffer 0% full18:16:27 557K tank/send@issue-5027
in @ 136 kB/s, out @ 136 kB/s, 656 kB total, buffer 0% full18:16:28 685K tank/send@issue-5027
in @ 136 kB/s, out @ 136 kB/s, 796 kB total, buffer 0% full18:16:29 813K tank/send@issue-5027
in @ 136 kB/s, out @ 136 kB/s, 936 kB total, buffer 0% full18:16:30 942K tank/send@issue-5027
in @ 144 kB/s, out @ 144 kB/s, 1008 kB total, buffer 0% full
summary: 1070 kByte in 7.6 sec - average of 140 kB/s
received 1.05MB stream in 8 seconds (134KB/sec)
OK
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME SENT SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 160 kB/s, out @ 160 kB/s, 112 kB total, buffer 0% full18:16:32 172K tank/send@issue-5027
in @ 160 kB/s, out @ 160 kB/s, 272 kB total, buffer 0% full18:16:33 300K tank/send@issue-5027
in @ 160 kB/s, out @ 160 kB/s, 432 kB total, buffer 0% full18:16:34 429K tank/send@issue-5027
in @ 160 kB/s, out @ 160 kB/s, 592 kB total, buffer 0% full18:16:35 685K tank/send@issue-5027
in @ 160 kB/s, out @ 160 kB/s, 752 kB total, buffer 0% full18:16:36 813K tank/send@issue-5027
in @ 160 kB/s, out @ 160 kB/s, 912 kB total, buffer 0% full18:16:37 942K tank/send@issue-5027
in @ 160 kB/s, out @ 160 kB/s, 992 kB total, buffer 0% full
summary: 1070 kByte in 6.7 sec - average of 160 kB/s
received 1.05MB stream in 7 seconds (153KB/sec)
OK
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME SENT SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 177 kB/s, out @ 177 kB/s, 124 kB total, buffer 0% full18:16:39 172K tank/send@issue-5027
in @ 176 kB/s, out @ 176 kB/s, 304 kB total, buffer 0% full18:16:40 300K tank/send@issue-5027
in @ 176 kB/s, out @ 176 kB/s, 484 kB total, buffer 0% full18:16:41 557K tank/send@issue-5027
in @ 176 kB/s, out @ 176 kB/s, 664 kB total, buffer 0% full18:16:42 685K tank/send@issue-5027
in @ 176 kB/s, out @ 176 kB/s, 844 kB total, buffer 0% full18:16:43 942K tank/send@issue-5027
in @ 176 kB/s, out @ 176 kB/s, 1024 kB total, buffer 0% full
summary: 1070 kByte in 5.9 sec - average of 180 kB/s
received 1.05MB stream in 6 seconds (178KB/sec)
OK
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME SENT SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 200 kB/s, out @ 200 kB/s, 140 kB total, buffer 0% full18:16:45 172K tank/send@issue-5027
in @ 200 kB/s, out @ 200 kB/s, 340 kB total, buffer 0% full18:16:46 429K tank/send@issue-5027
in @ 200 kB/s, out @ 200 kB/s, 540 kB total, buffer 0% full18:16:47 557K tank/send@issue-5027
in @ 200 kB/s, out @ 200 kB/s, 740 kB total, buffer 0% full18:16:48 813K tank/send@issue-5027
in @ 200 kB/s, out @ 200 kB/s, 940 kB total, buffer 0% full18:16:49 942K tank/send@issue-5027
in @ 200 kB/s, out @ 200 kB/s, 1040 kB total, buffer 0% full
summary: 1070 kByte in 5.3 sec - average of 200 kB/s
received 1.05MB stream in 6 seconds (178KB/sec)
OK
Versions tested:
- kernel 4.7.2-hardened-gnu
- sys-kernel/spl-9999 from 05:19:18 2016-08-24
- sys-fs/zfs-kmod-9999 from 05:25:13 2016-08-24
- sys-fs/zfs-9999 from 21:56:57 2016-08-23 as well as 05:32:49 2016-08-26
@haasn I'm not very familiar with gentoo packaging; can you verify if your zfs version includes commit https://github.com/zfsonlinux/zfs/commit/e6d3a843d6ced970cbc74a3f809d744c30a7ec7c (OpenZFS 6393 - zfs receive a full send as a clone)?
On Illumos that change broke zfs recv when the send stream contains the flag DRR_FLAG_FREERECORDS (https://www.illumos.org/issues/6536). I'm not entirely sure this issue is caused by a conflicting flag, as they describe:
> It turns out we added a flag with the same value and now we have a conflict. We will change our value to something else, in the meantime we need a way to generate streams that don't include DRR_FLAG_FREERECORDS.
but while debugging this issue on ZoL I noticed that when this problem occurs we are always in receive_freeobjects().
Actually, the error message "cannot receive new filesystem stream: dataset does not exist" seems to be caused by dmu_object_next() returning ENOENT while we are iterating over freerecords in receive_freeobjects():
Breakpoint 2, dmu_object_next (os=0xfffffffffffffee0, objectp=0x4 <irq_stack_union+4>, hole=B_FALSE, txg=2) at /usr/src/zfs-e6d3a843d6ced970cbc74a3f809d744c30a7ec7c/module/zfs/dmu_object.c:251
251 return (SET_ERROR(error));
(gdb) bt
#0 dmu_object_next (os=0xfffffffffffffee0, objectp=0x4 <irq_stack_union+4>, hole=B_FALSE, txg=2) at /usr/src/zfs-e6d3a843d6ced970cbc74a3f809d744c30a7ec7c/module/zfs/dmu_object.c:251
#1 0xffffffffc07380fa in receive_freeobjects (rwa=0xfffffffffffffee0, drrfo=0x4 <irq_stack_union+4>) at /usr/src/zfs-e6d3a843d6ced970cbc74a3f809d744c30a7ec7c/module/zfs/dmu_send.c:2087
#2 0xffffffffc0738208 in receive_process_record (rrd=<optimized out>, rwa=<optimized out>) at /usr/src/zfs-e6d3a843d6ced970cbc74a3f809d744c30a7ec7c/module/zfs/dmu_send.c:2663
#3 receive_writer_thread (arg=0xffff88003a098600) at /usr/src/zfs-e6d3a843d6ced970cbc74a3f809d744c30a7ec7c/module/zfs/dmu_send.c:2726
#4 0xffffffffc062e265 in thread_generic_wrapper (arg=0xffff8800339d97c0) at /usr/src/spl-spl-0.7.0-rc1/module/spl/spl-thread.c:61
#5 0xffffffff8109ac4f in kthread (_create=0xffff88003aa36940) at kernel/kthread.c:209
#6 0xffffffff815cc3f2 in ret_from_fork () at arch/x86/entry/entry_64.S:406
#7 0x0000000000000000 in ?? ()
(gdb) p error
$1 = 2
(gdb)
dmu_object_next() was modified in https://github.com/zfsonlinux/zfs/commit/50c957f702ea6d08a634e42f73e8a49931dd8055 (Implement large_dnode pool feature), but the issue manifests only since https://github.com/zfsonlinux/zfs/commit/e6d3a843d6ced970cbc74a3f809d744c30a7ec7c (OpenZFS 6393 - zfs receive a full send as a clone): this is because the return code ENOENT was not previously used as an error condition in the for loop in receive_freeobjects(), whereas now it is, via the local variable next_err.
TL;DR
- https://github.com/zfsonlinux/zfs/commit/50c957f702ea6d08a634e42f73e8a49931dd8055 makes dmu_object_next() return ENOENT when receiving freeobjects.
- https://github.com/zfsonlinux/zfs/commit/e6d3a843d6ced970cbc74a3f809d744c30a7ec7c uses that previously harmless ENOENT as an error condition and reports it to userspace when receiving freeobjects.
> @haasn I'm not very familiar with gentoo packaging; can you verify if your zfs version includes commit e6d3a84 (OpenZFS 6393 - zfs receive a full send as a clone)?

The version I am currently testing with is the git tag zfs-0.7.0-rc1 (as well as spl-0.7.0-rc1). According to the webpage you linked, that tag includes the commit in question.
When I made this bug report, I was still on commit https://github.com/zfsonlinux/zfs/commit/9907cc1cc, which also includes the suspect commit.
There are still some questions that need to be answered (why didn't the ZFS test suite catch this? why is this only happening sometimes? why is dmu_object_next() returning ENOENT?): until we get this figured out you can probably try running a previous commit to avoid this bug and still access your large_dnode-enabled data.
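For example, a tree from just before the suspect commit could be checked out roughly like this (a sketch; build and install steps depend on the distro and are omitted):

```sh
git clone https://github.com/zfsonlinux/zfs.git
cd zfs
git checkout e6d3a843^   # parent of "OpenZFS 6393 - zfs receive a full send as a clone"
```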
@loli10K I gave e6d3a84^ a try (with spl on 0.7.0-rc1 still), but it just spammed my kmesg with null pointer dereference errors while trying to boot. Not sure what that was about, but I'm back on zfs-0.7.0-rc1 for now.
I can wait for a proper fix for this issue.
@haasn wow, sorry about that, a slightly older commit shouldn't be doing that ...
I made more tests and I can almost certainly confirm this is a race condition: zfs recv works with large_dnode active as long as we can push all the FREEOBJECT records from the send stream before txg_sync comes in and sets ds->ds_feature_inuse[SPA_FEATURE_LARGE_DNODE]; from then on:
- dmu_object_next() starts executing new code inside the if condition at https://github.com/zfsonlinux/zfs/blame/zfs-0.7.0-rc1/module/zfs/dmu_object.c#L244
- dmu_object_info() returns ENOENT, which is reported back to the caller receive_freeobjects()
- zfs recv fails with "dataset does not exist"
The ZFS test suite probably doesn't catch this because it executes the tests too fast to fail.
Unfortunately I don't know how to fix this properly; we'll have to wait for someone with more experience in the ZFS codebase to pick this up.
What's the status of this? Is there a fix in 0.7.0-rc2 or master that I could test? (Asking since it's part of the 0.7.0 milestone)
I don't really understand if this is an upstream issue (OpenZFS) or just a zfsonlinux issue.
@haasn I'm positive this is a ZoL-only issue since the code hasn't been ported to upstream (yet): https://www.illumos.org/issues/7432.
Not sure if I'm having this exact same issue or not. I'm running the December 1st commits, b0319c1faae5a77e553ae74bf899efbc6629674d, on two machines. One is a Fedora 20 with kernel 3.19.8-100.fc20.x86_64 and the other is Fedora 23 with kernel 4.8.10-100.fc23.x86_64.
All features are enabled on the sender and receivers.
I'm sending snapshots of 4 filesystems: databases/mysql/bin-log, databases/mysql/data, databases/mysql/innodb, and databases/mysql/logs. All of them complete except for "databases/mysql/data" which dies about 1.5G into a 7G transfer with "cannot receive new filesystem stream: dataset does not exist".
Unlike haasn's problem, this is consistent. Scrubs of both pools show they are clean, but I cannot receive the filesystem on any machine that I send it to.
> Unlike haasn's problem, this is consistent. Scrubs of both pools show they are clean, but I cannot receive the filesystem on any machine that I send it to.
My problem is consistent for large datasets too. Since it's some sort of race condition, it depends on the dataset size. 1.5 GB is definitely above the threshold where it would always fail.
I have tried a number of different invocations and permutations of the following yet the result is always the same:
- (Note: backup was mounted with altroot=/mnt/backup, but omitting that parameter when importing does not help.)
- I can try sending them one at a time instead of using -R (see the sketch after this list).
- I can try sending the --head-- filesystem directly, but this fails because zarathustra/ROOT/gentoo is in use.
- I can try many other invocations. (I'll omit the output, but the error is always the exact same.)
- I have gone through all of the zpool features and double-checked that the zpool configuration is the same for both pools, as well as the creation-time properties on the root dataset.
- I can try sending other datasets, but every dataset that contains actual files fails, while the non-mountable datasets (like zarathustra/ROOT) work fine. (These all fail with similar errors, differing only in the name.)
However, confusingly, I can make a new dataset and send it just fine.
I don't really understand what's going on here. I can copy/paste invocations from guides and they simply fail to work on my end.
Versions tested:
I don't know how to create a minimal reproducer, since I can't even reproduce it myself except on the datasets I already have (for whatever reason).
zpool status shows no errors.