openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

zfs receive does not work on some datasets #5027

Closed haasn closed 7 years ago

haasn commented 8 years ago

I have tried a number of different invocations and permutations of the following, yet the result is always the same:

λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -Fsv backup
receiving full stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-0300 into backup@zfs-auto-snap_hourly-2016-08-25-0300
received 44.2KB stream in 1 seconds (44.2KB/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-0400 into backup@zfs-auto-snap_hourly-2016-08-25-0400
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-0500 into backup@zfs-auto-snap_hourly-2016-08-25-0500
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-0600 into backup@zfs-auto-snap_hourly-2016-08-25-0600
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-0700 into backup@zfs-auto-snap_hourly-2016-08-25-0700
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-0800 into backup@zfs-auto-snap_hourly-2016-08-25-0800
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-0900 into backup@zfs-auto-snap_hourly-2016-08-25-0900
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-1000 into backup@zfs-auto-snap_hourly-2016-08-25-1000
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-1100 into backup@zfs-auto-snap_hourly-2016-08-25-1100
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-1200 into backup@zfs-auto-snap_hourly-2016-08-25-1200
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-1300 into backup@zfs-auto-snap_hourly-2016-08-25-1300
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-1400 into backup@zfs-auto-snap_hourly-2016-08-25-1400
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-1500 into backup@zfs-auto-snap_hourly-2016-08-25-1500
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-1600 into backup@zfs-auto-snap_hourly-2016-08-25-1600
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-1700 into backup@zfs-auto-snap_hourly-2016-08-25-1700
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-1800 into backup@zfs-auto-snap_hourly-2016-08-25-1800
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-1900 into backup@zfs-auto-snap_hourly-2016-08-25-1900
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-2000 into backup@zfs-auto-snap_hourly-2016-08-25-2000
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-2100 into backup@zfs-auto-snap_hourly-2016-08-25-2100
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-2200 into backup@zfs-auto-snap_hourly-2016-08-25-2200
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_daily-2016-08-25-2200 into backup@zfs-auto-snap_daily-2016-08-25-2200
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-25-2300 into backup@zfs-auto-snap_hourly-2016-08-25-2300
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-26-0000 into backup@zfs-auto-snap_hourly-2016-08-26-0000
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-26-0100 into backup@zfs-auto-snap_hourly-2016-08-26-0100
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_frequent-2016-08-26-0145 into backup@zfs-auto-snap_frequent-2016-08-26-0145
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_frequent-2016-08-26-0200 into backup@zfs-auto-snap_frequent-2016-08-26-0200
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@zfs-auto-snap_hourly-2016-08-26-0200 into backup@zfs-auto-snap_hourly-2016-08-26-0200
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra@backup-2016-08-26 into backup@backup-2016-08-26
received 312B stream in 1 seconds (312B/sec)
receiving full stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-0300 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-0300
received 42.8KB stream in 1 seconds (42.8KB/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-0400 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-0400
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-0500 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-0500
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-0600 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-0600
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-0700 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-0700
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-0800 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-0800
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-0900 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-0900
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-1000 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-1000
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-1100 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-1100
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-1200 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-1200
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-1300 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-1300
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-1400 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-1400
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-1500 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-1500
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-1600 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-1600
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-1700 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-1700
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-1800 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-1800
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-1900 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-1900
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-2000 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-2000
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-2100 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-2100
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-2200 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-2200
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_daily-2016-08-25-2200 into backup/ROOT@zfs-auto-snap_daily-2016-08-25-2200
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-25-2300 into backup/ROOT@zfs-auto-snap_hourly-2016-08-25-2300
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-26-0000 into backup/ROOT@zfs-auto-snap_hourly-2016-08-26-0000
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-26-0100 into backup/ROOT@zfs-auto-snap_hourly-2016-08-26-0100
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_frequent-2016-08-26-0145 into backup/ROOT@zfs-auto-snap_frequent-2016-08-26-0145
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_frequent-2016-08-26-0200 into backup/ROOT@zfs-auto-snap_frequent-2016-08-26-0200
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@zfs-auto-snap_hourly-2016-08-26-0200 into backup/ROOT@zfs-auto-snap_hourly-2016-08-26-0200
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zarathustra/ROOT@backup-2016-08-26 into backup/ROOT@backup-2016-08-26
received 312B stream in 1 seconds (312B/sec)
receiving full stream of zarathustra/ROOT/gentoo@zfs-auto-snap_hourly-2016-08-25-0300 into backup/ROOT/gentoo@zfs-auto-snap_hourly-2016-08-25-0300
cannot receive new filesystem stream: dataset does not exist
λ zfs list -r backup
NAME                                               USED  AVAIL  REFER  MOUNTPOINT
backup                                             279M  3.51T    96K  none
backup/ROOT                                        276M  3.51T    96K  none
backup/ROOT/gentoo                                 276M  3.51T   276M  /mnt/backup

(Note: backup was mounted with altroot=/mnt/backup, but omitting that parameter when importing does not help)

I can try sending them one at a time instead of using -R:

λ zfs send -Le zarathustra@backup-2016-08-26 | zfs receive -Fuv backup
receiving full stream of zarathustra@backup-2016-08-26 into backup@backup-2016-08-26
received 44.2KB stream in 1 seconds (44.2KB/sec)

λ zfs send -Le zarathustra/ROOT@backup-2016-08-26 | zfs receive -Fuv backup/ROOT
receiving full stream of zarathustra/ROOT@backup-2016-08-26 into backup/ROOT@backup-2016-08-26
received 42.8KB stream in 1 seconds (42.8KB/sec)

λ zfs send -Le zarathustra/ROOT/gentoo@backup-2016-08-26 | zfs receive -Fuv backup/ROOT/gentoo
receiving full stream of zarathustra/ROOT/gentoo@backup-2016-08-26 into backup/ROOT/gentoo@backup-2016-08-26
cannot receive new filesystem stream: dataset does not exist

I can try sending the head filesystem directly, but this fails because zarathustra/ROOT/gentoo is in use.

I can try many other invocations (I'll omit the output, but the error is always exactly the same):

λ zfs send -Lep zarathustra@backup-2016-08-26 | zfs receive -Fsuv backup
λ zfs send -Lep zarathustra@backup-2016-08-26 | zfs receive -Fsv backup
λ zfs send -Lep zarathustra@backup-2016-08-26 | zfs receive -suv backup
λ zfs send -Lep zarathustra/ROOT@backup-2016-08-26 | zfs receive -Fsuv backup/ROOT
λ zfs send -Lep zarathustra/ROOT@backup-2016-08-26 | zfs receive -Fsv backup/ROOT
λ zfs send -Lep zarathustra/ROOT@backup-2016-08-26 | zfs receive -sv backup/ROOT
λ zfs send -Lep zarathustra/ROOT/gentoo@backup-2016-08-26 | zfs receive -Fsuv backup/ROOT/gentoo
λ zfs send -Lep zarathustra/ROOT/gentoo@backup-2016-08-26 | zfs receive -Fsv backup/ROOT/gentoo
λ zfs send -Lep zarathustra/ROOT/gentoo@backup-2016-08-26 | zfs receive -sv backup/ROOT/gentoo
λ zfs send -Le zarathustra@backup-2016-08-26 | zfs receive -Fsuv backup
λ zfs send -Le zarathustra/ROOT@backup-2016-08-26 | zfs receive -Fsuv backup/ROOT
λ zfs send -Le zarathustra/ROOT/gentoo@backup-2016-08-26 | zfs receive -Fsuv backup/ROOT/gentoo
λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -dFsv backup
λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -duv backup
λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -duv backup/test
λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -Fdsuv backup
λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -Fduv backup
λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -Fduv backup/2016-08-26
λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -Fsv backup
λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -Fuv backup
λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -Fuv backup/2016-08-26
λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -Fuv backup/test
λ zfs send -RLep zarathustra@backup-2016-08-26 | zfs receive -uv backup/test
λ zfs send -RLe zarathustra@backup-2016-08-26 | zfs receive -Fdsuv backup
λ zfs send -RLe zarathustra@backup-2016-08-26 | zfs receive -Fsuv backup
λ zfs send -RLe zarathustra@backup-2016-08-26 | zfs receive -Fsuv -n backup
λ zfs send -RLe zarathustra@backup-2016-08-26 | zfs receive -suv -n backup
λ zfs send zarathustra/ROOT/gentoo@backup-2016-08-26 -Le | zfs receive -v backup/test
λ zfs send zarathustra/ROOT/gentoo@backup-2016-08-26 | zfs receive -Fuv backup/ROOT/gentoo
λ zfs send zarathustra/ROOT/gentoo@backup-2016-08-26 | zfs receive -Fv backup/ROOT/gentoo
λ zfs send zarathustra/ROOT/gentoo@backup-2016-08-26 | zfs receive -v backup/ROOT/gentoo
λ zfs send -R zarathustra/ROOT/gentoo@backup-2016-08-26 | zfs recv -u backup/testroot

I have gone through all of the zpool features and double-checked that the zpool configuration is the same for both pools, as well as the creation-time properties on the root dataset.
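
For what it's worth, one way to do such a comparison is to strip the differing pool-name column so only real differences show (a sketch using bash process substitution):

diff <(zpool get all zarathustra | awk '{$1=""; print}') \
     <(zpool get all backup      | awk '{$1=""; print}')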

I can try sending other datasets, but every dataset that contains actual files fails, while the non-mountable datasets (like zarathustra/ROOT) work fine:

λ zfs send -R zarathustra/HOME/nand/cms@backup-2016-08-26 | zfs recv -uv backup/testcms
λ zfs send -R zarathustra/MEDIA/C@backup-2016-08-26 | zfs recv -uv backup/testC
λ zfs send -R zarathustra/ROOT/gentoo/var/db@backup-2016-08-26 | zfs recv -uv backup/testdb

(These all fail with similar errors, differing only in the name)

However, confusingly, I can make a new dataset and send it just fine:

λ zfs create zarathustra/MEDIA/test
λ cd /z/test
λ echo hello > world
λ zfs snapshot zarathustra/MEDIA/test@test
λ zfs send -R zarathustra/MEDIA/test@test | zfs recv -u backup/testbackup
λ zfs list -r backup
NAME                USED  AVAIL  REFER  MOUNTPOINT
backup             1.10M  3.51T    96K  none
backup/testbackup    96K  3.51T    96K  none
λ zfs list -rt snapshot backup
NAME                     USED  AVAIL  REFER  MOUNTPOINT
backup/testbackup@test      0      -    96K  -

I don't really understand what's going on here. I can copy/paste invocations from guides and they simply fail to work on my end.

Versions tested:

kernel 4.7.2-hardened-gnu
sys-kernel/spl-9999 from 05:19:18 2016-08-24
sys-fs/zfs-kmod-9999 from 05:25:13 2016-08-24
sys-fs/zfs-9999 from 21:56:57 2016-08-23, as well as 05:32:49 2016-08-26

I don't know how to create a minimal reproducer, since I can't even reproduce it myself except on the datasets I already have (for whatever reason). zpool status shows no errors.

haasn commented 8 years ago

Note: I don't need to involve a second zpool at all; I get the same error when transferring to the same pool.

I also found out that not all of my datasets are affected; I just happened to get unlucky when I tested a few at random. I'm still not sure what distinguishes the affected datasets from the unaffected ones, but I have a feeling it may be related to xattrs, ACLs, paxmarks, SELinux contexts or something of that nature, because it only seems to affect sufficiently nontrivial datasets. (I have feature@large_dnode enabled, acltype=posixacl and xattr=sa)

That said, I tried reproducing it by copying /usr/bin/sudo and /bin/ping into the ‘test’ dataset and paxmarking them, and it still works fine. (And I no longer have an SELinux kernel or related userland installed)

I've also tried dumping 1G of /dev/urandom into the ‘test’ dataset, and it still transmits fine. I'm really not sure what else I should try in order to reproduce it on a clean dataset. I can't find a pattern among the affected datasets - they're both small and large and contain all sorts of different files, yet some get sent correctly where others fail.

Edit: It turns out that some of the datasets which failed to be received on a different pool actually work fine on the same pool - although some fail even on the same pool. It also seems like there's a difference between using -R and not using it here: some fail with -R but work without it, which is in contrast to the behavior I saw in my first post, where it failed even without -R - although that may have been because I was testing on one of the datasets that always seems to fail in my tests (zarathustra/ROOT/gentoo).

I'm suffering from a combinatorial explosion of variables and different cases here, and I'm struggling to even document what I'm observing. I'll continue trying to reproduce the issue on a clean dataset when I get the time/energy.

haasn commented 8 years ago

Problem seems to have solved itself with an update.

haasn commented 8 years ago

Hmm, it still fails when trying to do it across zpools, exact same error. I can copy a dataset from one place on my zpool to another, but doing it from my main pool to my backup pool still fails.

haasn commented 8 years ago

That said, it again seems to work for some datasets but not others.

I have two seemingly identical (as per ‘zfs get all’) datasets in the same location, differing only by name, content and mountpoint - sending one works, and sending the other does not.

(The one that fails is the larger of the two)

haasn commented 8 years ago

@kpande I don't see how that's relevant:

  1. I turned off my zfs-auto-snapshot script before doing all of these tests
  2. The snapshot I'm using is one I created manually, rather than one of the auto-snapshots. (Although it seems to make no difference)
  3. The script would be incapable of deleting a snapshot that's in use either way, unless I misunderstand something

haasn commented 8 years ago

That only invalidates 3., not 1. or 2.

I tested it and zfs hold / zfs release made no difference. Also, I can reproduce each of these failures consistently - i.e. if a snapshot fails to transfer, it will always fail. It's just whether or not any given snapshot fails or not that seems to be arbitrary.
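
For reference, the hold/release test amounts to standard commands along these lines (the tag and target names are illustrative):

zfs hold keep zarathustra@backup-2016-08-26
zfs send -Le zarathustra@backup-2016-08-26 | zfs receive -Fvu backup/holdtest
zfs release keep zarathustra@backup-2016-08-26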

I'm currently in the process of running a script that tries sending every snapshot one by one (i.e. not recursively), so I can get a list of all that work and all that fail and compare them one-by-one. I'll report back when it finishes.

haasn commented 8 years ago

I'll report back when it finishes.

It finished. Out of my 48 datasets, only 20 of them were zfs receive-able. The rest failed with the same error as described multiple times.

My invocation for this test was something like:

for x in DATASETS; do
    zfs send -Le ${x}@test-2016-09-28 | zfs receive -Fvu backup/${x//\//-}
done

(The shell replacement was to make sure datasets ended up in a single flat level instead of using a hierarchy, because the parent datasets fail in some cases; see the example below)
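
For example, the substitution maps a nested dataset name to a flat one:

x=zarathustra/ROOT/gentoo
echo "${x//\//-}"    # prints: zarathustra-ROOT-gentoo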

I see no discernible pattern among the datasets that survived the zfs receive, except that they tend to be very small. Out of 1.7T of data, only 103G survived. The largest dataset that succeeded was 759M, but the majority were well under 500MB. The smallest dataset that failed was 391M, but the majority were well over 1G.

The number of files in the dataset seems to be uncorrelated, as is the type of file (i.e. SELinux tags, ACLs, xattrs, etc. seem to have no effect). Really, the only pattern I see is that large datasets seem to fail.

If I had to guess, I would assume this has something to do with the large_dnode feature, because that is the only feature I am using which has not made it into any stable release. (Incidentally: The enabled feature lists of both pools are identical, as are the properties on the root dataset)
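
Checking that on both pools is straightforward (standard commands, shown here as a sketch):

zpool get feature@large_dnode zarathustra backup
zfs get -r dnodesize zarathustra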

Is there some easy way I can test whether one of the affected datasets would transfer fine if it was not using large dnodes? I can set the dnodesize to something else, but that won't affect already-written data.

haasn commented 8 years ago

Is there some easy way I can test whether one of the affected datasets would transfer fine if it was not using large dnodes? I can set the dnodesize to something else, but that won't affect already-written data.

Okay, I did a test like this:

  1. Create new temporary dataset
  2. Rsync files from the smallest failing dataset (389M) into the temporary dataset
  3. Create a @test snapshot
  4. Try sending this (roughly as sketched below)
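
In shell terms, the test was roughly the following (a sketch; the temporary dataset name and source path are illustrative):

zfs create zarathustra/MISC/tmptest
rsync -a /path/to/smallest-failing-dataset/ /z/tmptest/
zfs snapshot zarathustra/MISC/tmptest@test
zfs send -Le zarathustra/MISC/tmptest@test | zfs receive -Fvu backup/tmptest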

It failed.

I then tried repeating this but setting ‘dnodesize=legacy’ when creating the temporary dataset. It still failed, so dnodesize=legacy does not improve the situation.

I'll try bisecting the dataset to see if it's a particular file that's causing the failure.

Update: I could not bisect it to any individual file, but I did reach a point where if I start deleting stuff, it will suddenly succeed. (For example, there's one file which, if deleted, will immediately make it succeed again. As soon as I cp over this exact file, it will fail to work)

Here is a find . of my directory structure: https://0x0.st/QwC.txt

Moving ./a/b/c/d to a single top-level location (e.g. mv ./a/b/c/d ./foo) makes it work. As does deleting most/some of the subfolders (0,1,2,3 ...) inside ./a/b/c/d. To provide an example of a single file: deleting ./a/b/c/d/D/FA/3366Ad01 makes it work. (This file is 3MB large and contains a regular .gif image)

Update 2: I've trimmed it down to the point where if I delete the smallest directory inside ./a/b/c/d, it will fail at first and then succeed a few seconds later. (And continue succeeding). If I rsync over the directory again, it will fail again.

Could it be the case that the problem here is that there's too much data nested inside a single directory structure, or something like that? (Since moving the directories around makes it work again...) That wouldn't explain why my 759MB dataset succeeded, though - that dataset has its size split up over multiple top-level directories (and the majority of those 700MB are a single iso image)

Update 3: I could move the ‘d’ folder directly into ‘./a/d’ and still reproduce the issue. I could also remove the (now empty) ‘./a/b/c’ folder. At this point, whether it succeeds or not seems to be somewhat down to chance. Little things like mkdir and rmdir seem to flip it, and it will sometimes revert back to succeeding after waiting a few seconds. (Also, creating/destroying the snapshot itself during the testing process may contribute to this)

It feels like I've reached some critical point where the ‘size’ is almost at the threshold where it will trigger whatever behavior breaks the zfs receive. Here are my current stats for what it's worth, and it is currently failing at the time of measuring:

zfs get all: https://0x0.st/QxH.txt
find .: https://0x0.st/QxX.txt
zpool get all: https://0x0.st/Qx8.txt and https://0x0.st/QxK.txt (for each pool, respectively)

And my test command was this:

zfs snapshot zarathustra/MISC/test@test
zfs send -Lev zarathustra/MISC/test@test | zfs receive -Fvu backup/test-bisect && echo SUCCESS && zfs destroy -r backup/test-bisect
zfs destroy zarathustra/MISC/test@test

After I delete an empty dir to make it succeed again, my zfs get all reports no change.

haasn commented 8 years ago

I tried re-running the previous test with -o xattr=off -o dnodesize=legacy -o compression=off -o acltype=off for some extra sanity. (It still failed)

Deleting some smallish (~1MB) directory still caused it to succeed again, even though the total on-disk size was now several MB larger than before (69 MB instead of 64 MB) because compression was turned off. So at any rate, that doesn't seem to have an effect, for whatever it's worth.

I'm at a loss for what else to debug here, or how to get more information about the zfs receive process and what exactly it's doing that fails. Also, just for sanity, if I record a zfs send stream into a file and zfs receive that, my results will always be identical. (In other words, a single send stream will either always work or always fail)

haasn commented 8 years ago

Interesting update: It still happens if I NULL out all of the files (while preserving the filesizes). In other words, I can send you a “sanitized” zfs send stream, and I can also compress it.
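
(A sketch of how the files can be zeroed out while preserving their sizes:)

find . -type f | while read -r f; do
    size=$(stat -c %s "$f")            # remember the original length
    head -c "$size" /dev/zero > "$f"   # overwrite the contents with zeros
done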

I've uploaded the compressed zfs send stream here: https://0x0.st/Q36.xz (decompresses to 80 MB)

Trying to zfs receive this stream fails on my end:

λ zfs receive backup/test-bisect < /mem/package-fail 
cannot open 'backup/test-bisect': dataset does not exist
cannot receive new filesystem stream: dataset does not exist

It would be great if somebody else could try and reproduce!

In case it helps debugging, here is a second zfs send stream, this time one that works (I deleted a few directories): https://0x0.st/QYh.xz

loli10K commented 8 years ago

@haasn I just tried to receive both send streams on a debian box running the current master on a newly created pool and neither of those seems to work for me; can you double-check if your system can actually receive the stream from the second link?

haasn commented 8 years ago

@loli10K It seems to work.. sometimes?

λ wget https://0x0.st/QYh.xz
λ unxz QYh.xz
λ zfs receive -vu backup/test < QYh
receiving full stream of zarathustra/MISC/test@test into backup/test@test
cannot open 'backup/test': dataset does not exist
cannot receive new filesystem stream: dataset does not exist

λ zfs receive -Fvu backup/test < QYh
receiving full stream of zarathustra/MISC/test@test into backup/test@test
received 69.3MB stream in 1 seconds (69.3MB/sec)

λ zfs receive -Fvu backup/test2 < QYh
receiving full stream of zarathustra/MISC/test@test into backup/test2@test
received 69.3MB stream in 1 seconds (69.3MB/sec)

λ zfs receive -vu backup/test2 < QYh
cannot receive new filesystem stream: destination 'backup/test2' exists
must specify -F to overwrite it

λ zfs receive -vu backup/test3 < QYh
receiving full stream of zarathustra/MISC/test@test into backup/test3@test
received 69.3MB stream in 1 seconds (69.3MB/sec)

λ zfs destroy -rv backup/test
will destroy backup/test@test
will destroy backup/test

λ zfs destroy -rv backup/test2
will destroy backup/test2@test
will destroy backup/test2

λ zfs destroy -rv backup/test3
will destroy backup/test3@test
will destroy backup/test3

λ zfs receive -vu backup/test < QYh
receiving full stream of zarathustra/MISC/test@test into backup/test@test
received 69.3MB stream in 1 seconds (69.3MB/sec)

Also,

backup  feature@async_destroy       enabled                     local
backup  feature@empty_bpobj         active                      local
backup  feature@lz4_compress        active                      local
backup  feature@spacemap_histogram  active                      local
backup  feature@enabled_txg         active                      local
backup  feature@hole_birth          disabled                    local
backup  feature@extensible_dataset  active                      local
backup  feature@embedded_data       active                      local
backup  feature@bookmarks           disabled                    local
backup  feature@filesystem_limits   enabled                     local
backup  feature@large_blocks        enabled                     local
backup  feature@large_dnode         active                      local

loli10K commented 8 years ago

@haasn sometimes sounds about right. After a little more testing, it seems this issue is more time-related than size-related (incidentally, if you increase the size of the sent zfs stream, it will increase the receive time too).

root@debian-zfs:~# zfs send -Le tank/send@issue-5027 > file
root@debian-zfs:~# echo 2 > /sys/module/zfs/parameters/zfs_txg_timeout
root@debian-zfs:~# while true; sleep 2 ; do cat file | zfs receive -Fuv tank/recv ; zfs destroy tank/recv -r ; done
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
^C
root@debian-zfs:~# while true; sleep 1 ; do cat file | zfs receive -Fuv tank/recv ; zfs destroy tank/recv -r ; done
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
cannot open 'tank/recv': dataset does not exist
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
cannot open 'tank/recv': dataset does not exist
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
cannot open 'tank/recv': dataset does not exist
^C
root@debian-zfs:~# while true; sleep 3 ; do cat file | zfs receive -Fuv tank/recv ; zfs destroy tank/recv -r ; done
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
received 38.1MB stream in 1 seconds (38.1MB/sec)
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
^C

Can you do more testing with different zfs_txg_timeout/sleep values on your system?

EDIT: it's probably both size- and time-related, meaning that the issue seems to manifest when we can't receive the zfs stream in a single txg (it either gets switched by a timeout or by too much dirty data, I guess?). For instance, the same exact snapshot dumped to a file can be consistently received as long as it is small enough, but it can still be forced into a "cannot receive new filesystem stream: dataset does not exist" condition if we limit the bandwidth at which we receive the data with mbuffer:

This is another test, with zfs_txg_timeout=10 and a 1M snapshot.
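
The exact pipeline isn't quoted here; it was presumably something along these lines, with mbuffer's -R write-rate cap raised on each successive run (the flag and the 20k starting value are assumptions inferred from the summary lines below):

zfs send -Lev tank/send@issue-5027 | mbuffer -R 20k | zfs receive -Fuv tank/recv && echo OK || echo KO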

full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME        SENT   SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 17.1 kB/s, out @ 17.1 kB/s, 12.0 kB total, buffer   0% full18:14:12   43.6K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 32.0 kB total, buffer   0% full18:14:13   43.6K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 52.0 kB total, buffer   0% full18:14:14   43.6K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 72.0 kB total, buffer   0% full18:14:15   43.6K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s, 92.0 kB total, buffer   0% full18:14:16   43.6K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  112 kB total, buffer   0% full18:14:17    172K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  132 kB total, buffer   0% full18:14:18    172K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  152 kB total, buffer   0% full18:14:19    172K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  172 kB total, buffer   0% full18:14:20    172K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  192 kB total, buffer   0% full18:14:21    172K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  212 kB total, buffer   0% full18:14:22    172K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  232 kB total, buffer   0% full18:14:23    300K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  252 kB total, buffer   0% full18:14:24    300K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  272 kB total, buffer   0% full18:14:25    300K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  292 kB total, buffer   0% full18:14:26    300K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  312 kB total, buffer   0% full18:14:27    300K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  332 kB total, buffer   0% full18:14:28    300K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  352 kB total, buffer   0% full18:14:29    300K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  372 kB total, buffer   0% full18:14:30    429K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  392 kB total, buffer   0% full18:14:31    429K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  412 kB total, buffer   0% full18:14:32    429K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  432 kB total, buffer   0% full18:14:33    429K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  452 kB total, buffer   0% full18:14:34    429K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  472 kB total, buffer   0% full18:14:35    429K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  492 kB total, buffer   0% full18:14:36    557K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  512 kB total, buffer   0% full18:14:37    557K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  532 kB total, buffer   0% full18:14:38    557K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  552 kB total, buffer   0% full18:14:39    557K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  572 kB total, buffer   0% full18:14:40    557K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  592 kB total, buffer   0% full18:14:41    557K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  612 kB total, buffer   0% full18:14:42    685K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  632 kB total, buffer   0% full18:14:43    685K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  652 kB total, buffer   0% full18:14:44    685K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  672 kB total, buffer   0% full18:14:45    685K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  692 kB total, buffer   0% full18:14:46    685K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  712 kB total, buffer   0% full18:14:47    685K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  732 kB total, buffer   0% full18:14:48    685K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  752 kB total, buffer   0% full18:14:49    813K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  772 kB total, buffer   0% full18:14:50    813K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  792 kB total, buffer   0% full18:14:51    813K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  812 kB total, buffer   0% full18:14:52    813K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  832 kB total, buffer   0% full18:14:53    813K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  852 kB total, buffer   0% full18:14:54    813K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  872 kB total, buffer   0% full18:14:55    942K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  892 kB total, buffer   0% full18:14:56    942K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  912 kB total, buffer   0% full18:14:57    942K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  932 kB total, buffer   0% full18:14:58    942K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  952 kB total, buffer   0% full18:14:59    942K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  972 kB total, buffer   0% full18:15:00    942K   tank/send@issue-5027
in @ 16.0 kB/s, out @ 16.0 kB/s,  992 kB total, buffer   0% full18:15:01    942K   tank/send@issue-5027
in @ 24.0 kB/s, out @ 24.0 kB/s, 1064 kB total, buffer   0% full
summary: 1070 kByte in 53.4 sec - average of 20.0 kB/s
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
KO
cannot open 'tank/recv': dataset does not exist
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME        SENT   SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 39.9 kB/s, out @ 39.9 kB/s, 28.0 kB total, buffer   0% full18:15:06   43.6K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 68.0 kB total, buffer   0% full18:15:07   43.6K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  108 kB total, buffer   0% full18:15:08    172K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  148 kB total, buffer   0% full18:15:09    172K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  188 kB total, buffer   0% full18:15:10    172K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  228 kB total, buffer   0% full18:15:11    300K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  268 kB total, buffer   0% full18:15:12    300K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  308 kB total, buffer   0% full18:15:13    300K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  348 kB total, buffer   0% full18:15:14    300K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  388 kB total, buffer   0% full18:15:15    429K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  428 kB total, buffer   0% full18:15:16    429K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  468 kB total, buffer   0% full18:15:17    429K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  508 kB total, buffer   0% full18:15:18    557K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  548 kB total, buffer   0% full18:15:19    557K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  588 kB total, buffer   0% full18:15:20    557K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  628 kB total, buffer   0% full18:15:21    685K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  668 kB total, buffer   0% full18:15:22    685K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  708 kB total, buffer   0% full18:15:23    685K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  748 kB total, buffer   0% full18:15:24    813K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  788 kB total, buffer   0% full18:15:25    813K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  828 kB total, buffer   0% full18:15:26    813K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  868 kB total, buffer   0% full18:15:27    942K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  908 kB total, buffer   0% full18:15:28    942K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  948 kB total, buffer   0% full18:15:29    942K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s,  988 kB total, buffer   0% full18:15:30    942K   tank/send@issue-5027
in @ 40.0 kB/s, out @ 40.0 kB/s, 1048 kB total, buffer   0% full
summary: 1070 kByte in 26.7 sec - average of 40.1 kB/s
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
KO
cannot open 'tank/recv': dataset does not exist
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME        SENT   SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 57.1 kB/s, out @ 57.1 kB/s, 40.0 kB total, buffer   0% full18:15:32   43.6K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  100 kB total, buffer   0% full18:15:33    172K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  160 kB total, buffer   0% full18:15:34    172K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  220 kB total, buffer   0% full18:15:35    300K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  280 kB total, buffer   0% full18:15:36    300K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  340 kB total, buffer   0% full18:15:37    300K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  400 kB total, buffer   0% full18:15:38    429K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  460 kB total, buffer   0% full18:15:39    429K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  520 kB total, buffer   0% full18:15:40    557K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  580 kB total, buffer   0% full18:15:41    557K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  640 kB total, buffer   0% full18:15:42    685K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  700 kB total, buffer   0% full18:15:43    685K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  760 kB total, buffer   0% full18:15:44    813K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  820 kB total, buffer   0% full18:15:45    813K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  880 kB total, buffer   0% full18:15:46    942K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s,  940 kB total, buffer   0% full18:15:47    942K   tank/send@issue-5027
in @ 55.9 kB/s, out @ 55.9 kB/s, 1060 kB total, buffer   0% full
summary: 1070 kByte in 17.8 sec - average of 60.1 kB/s
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
KO
cannot open 'tank/recv': dataset does not exist
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME        SENT   SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 56.0 kB total, buffer   0% full18:15:50   43.6K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s,  136 kB total, buffer   0% full18:15:51    172K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s,  216 kB total, buffer   0% full18:15:52    300K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s,  296 kB total, buffer   0% full18:15:53    300K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s,  376 kB total, buffer   0% full18:15:54    429K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s,  456 kB total, buffer   0% full18:15:55    429K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s,  536 kB total, buffer   0% full18:15:56    557K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s,  616 kB total, buffer   0% full18:15:57    685K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s,  696 kB total, buffer   0% full18:15:58    685K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s,  776 kB total, buffer   0% full18:15:59    813K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s,  856 kB total, buffer   0% full18:16:00    942K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s,  936 kB total, buffer   0% full18:16:01    942K   tank/send@issue-5027
in @ 79.9 kB/s, out @ 79.9 kB/s, 1056 kB total, buffer   0% full
summary: 1070 kByte in 13.4 sec - average of 80.1 kB/s
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
KO
cannot open 'tank/recv': dataset does not exist
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME        SENT   SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @ 97.0 kB/s, out @ 97.0 kB/s, 68.0 kB total, buffer   0% full18:16:04   43.6K   tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s,  168 kB total, buffer   0% full18:16:05    172K   tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s,  268 kB total, buffer   0% full18:16:06    300K   tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s,  368 kB total, buffer   0% full18:16:07    429K   tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s,  468 kB total, buffer   0% full18:16:08    557K   tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s,  568 kB total, buffer   0% full18:16:09    557K   tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s,  668 kB total, buffer   0% full18:16:10    685K   tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s,  768 kB total, buffer   0% full18:16:11    813K   tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s,  868 kB total, buffer   0% full18:16:12    942K   tank/send@issue-5027
in @ 95.9 kB/s, out @ 95.9 kB/s,  968 kB total, buffer   0% full18:16:13    942K   tank/send@issue-5027
in @  104 kB/s, out @  104 kB/s, 1020 kB total, buffer   0% full
summary: 1070 kByte in 10.7 sec - average of  100 kB/s
cannot open 'tank/recv': dataset does not exist
cannot receive new filesystem stream: dataset does not exist
KO
cannot open 'tank/recv': dataset does not exist
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME        SENT   SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @  120 kB/s, out @  120 kB/s, 84.0 kB total, buffer   0% full18:16:15    172K   tank/send@issue-5027
in @  120 kB/s, out @  120 kB/s,  204 kB total, buffer   0% full18:16:16    300K   tank/send@issue-5027
in @  120 kB/s, out @  120 kB/s,  324 kB total, buffer   0% full18:16:17    300K   tank/send@issue-5027
in @  120 kB/s, out @  120 kB/s,  444 kB total, buffer   0% full18:16:18    429K   tank/send@issue-5027
in @  120 kB/s, out @  120 kB/s,  564 kB total, buffer   0% full18:16:19    557K   tank/send@issue-5027
in @  120 kB/s, out @  120 kB/s,  684 kB total, buffer   0% full18:16:20    685K   tank/send@issue-5027
in @  120 kB/s, out @  120 kB/s,  804 kB total, buffer   0% full18:16:21    813K   tank/send@issue-5027
in @  120 kB/s, out @  120 kB/s,  924 kB total, buffer   0% full18:16:22    942K   tank/send@issue-5027
in @  120 kB/s, out @  120 kB/s, 1044 kB total, buffer   0% full
summary: 1070 kByte in  8.9 sec - average of  120 kB/s
received 1.05MB stream in 9 seconds (119KB/sec)
OK
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME        SENT   SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @  137 kB/s, out @  137 kB/s, 96.0 kB total, buffer   0% full18:16:24    172K   tank/send@issue-5027
in @  136 kB/s, out @  136 kB/s,  236 kB total, buffer   0% full18:16:25    300K   tank/send@issue-5027
in @  136 kB/s, out @  136 kB/s,  376 kB total, buffer   0% full18:16:26    429K   tank/send@issue-5027
in @  136 kB/s, out @  136 kB/s,  516 kB total, buffer   0% full18:16:27    557K   tank/send@issue-5027
in @  136 kB/s, out @  136 kB/s,  656 kB total, buffer   0% full18:16:28    685K   tank/send@issue-5027
in @  136 kB/s, out @  136 kB/s,  796 kB total, buffer   0% full18:16:29    813K   tank/send@issue-5027
in @  136 kB/s, out @  136 kB/s,  936 kB total, buffer   0% full18:16:30    942K   tank/send@issue-5027
in @  144 kB/s, out @  144 kB/s, 1008 kB total, buffer   0% full
summary: 1070 kByte in  7.6 sec - average of  140 kB/s
received 1.05MB stream in 8 seconds (134KB/sec)
OK
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME        SENT   SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @  160 kB/s, out @  160 kB/s,  112 kB total, buffer   0% full18:16:32    172K   tank/send@issue-5027
in @  160 kB/s, out @  160 kB/s,  272 kB total, buffer   0% full18:16:33    300K   tank/send@issue-5027
in @  160 kB/s, out @  160 kB/s,  432 kB total, buffer   0% full18:16:34    429K   tank/send@issue-5027
in @  160 kB/s, out @  160 kB/s,  592 kB total, buffer   0% full18:16:35    685K   tank/send@issue-5027
in @  160 kB/s, out @  160 kB/s,  752 kB total, buffer   0% full18:16:36    813K   tank/send@issue-5027
in @  160 kB/s, out @  160 kB/s,  912 kB total, buffer   0% full18:16:37    942K   tank/send@issue-5027
in @  160 kB/s, out @  160 kB/s,  992 kB total, buffer   0% full
summary: 1070 kByte in  6.7 sec - average of  160 kB/s
received 1.05MB stream in 7 seconds (153KB/sec)
OK
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME        SENT   SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @  177 kB/s, out @  177 kB/s,  124 kB total, buffer   0% full18:16:39    172K   tank/send@issue-5027
in @  176 kB/s, out @  176 kB/s,  304 kB total, buffer   0% full18:16:40    300K   tank/send@issue-5027
in @  176 kB/s, out @  176 kB/s,  484 kB total, buffer   0% full18:16:41    557K   tank/send@issue-5027
in @  176 kB/s, out @  176 kB/s,  664 kB total, buffer   0% full18:16:42    685K   tank/send@issue-5027
in @  176 kB/s, out @  176 kB/s,  844 kB total, buffer   0% full18:16:43    942K   tank/send@issue-5027
in @  176 kB/s, out @  176 kB/s, 1024 kB total, buffer   0% full
summary: 1070 kByte in  5.9 sec - average of  180 kB/s
received 1.05MB stream in 6 seconds (178KB/sec)
OK
full send of tank/send@issue-5027 estimated size is 1.01M
total estimated size is 1.01M
TIME        SENT   SNAPSHOT
receiving full stream of tank/send@issue-5027 into tank/recv@issue-5027
in @  200 kB/s, out @  200 kB/s,  140 kB total, buffer   0% full18:16:45    172K   tank/send@issue-5027
in @  200 kB/s, out @  200 kB/s,  340 kB total, buffer   0% full18:16:46    429K   tank/send@issue-5027
in @  200 kB/s, out @  200 kB/s,  540 kB total, buffer   0% full18:16:47    557K   tank/send@issue-5027
in @  200 kB/s, out @  200 kB/s,  740 kB total, buffer   0% full18:16:48    813K   tank/send@issue-5027
in @  200 kB/s, out @  200 kB/s,  940 kB total, buffer   0% full18:16:49    942K   tank/send@issue-5027
in @  200 kB/s, out @  200 kB/s, 1040 kB total, buffer   0% full
summary: 1070 kByte in  5.3 sec - average of  200 kB/s
received 1.05MB stream in 6 seconds (178KB/sec)
OK

loli10K commented 8 years ago

Versions tested:

kernel 4.7.2-hardened-gnu
sys-kernel/spl-9999 from 05:19:18 2016-08-24
sys-fs/zfs-kmod-9999 from 05:25:13 2016-08-24
sys-fs/zfs-9999 from 21:56:57 2016-08-23, as well as 05:32:49 2016-08-26

@haasn I'm not very familiar with gentoo packaging; can you verify whether your zfs version includes commit https://github.com/zfsonlinux/zfs/commit/e6d3a843d6ced970cbc74a3f809d744c30a7ec7c (OpenZFS 6393 - zfs receive a full send as a clone)?

On Illumos that change broke zfs recv when the send stream contains the flag DRR_FLAG_FREERECORDS (https://www.illumos.org/issues/6536). I'm not entirely sure this issue is caused by a conflicting flag, as they describe:

It turns out we added a flag with the same value and now we have a conflict. We will change our value to something else, in the meantime we need a way to generate streams that don't include DRR_FLAG_FREERECORDS.

but while debugging this issue on ZoL I noticed that when this problem occurs we are always in receive_freeobjects().

Actually the error message "cannot receive new filesystem stream: dataset does not exist" seems to be caused by dmu_object_next() returning ENOENT while we are iterating over freerecords in receive_freeobjects():

Breakpoint 2, dmu_object_next (os=0xfffffffffffffee0, objectp=0x4 <irq_stack_union+4>, hole=B_FALSE, txg=2) at /usr/src/zfs-e6d3a843d6ced970cbc74a3f809d744c30a7ec7c/module/zfs/dmu_object.c:251
251             return (SET_ERROR(error));
(gdb) bt
#0  dmu_object_next (os=0xfffffffffffffee0, objectp=0x4 <irq_stack_union+4>, hole=B_FALSE, txg=2) at /usr/src/zfs-e6d3a843d6ced970cbc74a3f809d744c30a7ec7c/module/zfs/dmu_object.c:251
#1  0xffffffffc07380fa in receive_freeobjects (rwa=0xfffffffffffffee0, drrfo=0x4 <irq_stack_union+4>) at /usr/src/zfs-e6d3a843d6ced970cbc74a3f809d744c30a7ec7c/module/zfs/dmu_send.c:2087
#2  0xffffffffc0738208 in receive_process_record (rrd=<optimized out>, rwa=<optimized out>) at /usr/src/zfs-e6d3a843d6ced970cbc74a3f809d744c30a7ec7c/module/zfs/dmu_send.c:2663
#3  receive_writer_thread (arg=0xffff88003a098600) at /usr/src/zfs-e6d3a843d6ced970cbc74a3f809d744c30a7ec7c/module/zfs/dmu_send.c:2726
#4  0xffffffffc062e265 in thread_generic_wrapper (arg=0xffff8800339d97c0) at /usr/src/spl-spl-0.7.0-rc1/module/spl/spl-thread.c:61
#5  0xffffffff8109ac4f in kthread (_create=0xffff88003aa36940) at kernel/kthread.c:209
#6  0xffffffff815cc3f2 in ret_from_fork () at arch/x86/entry/entry_64.S:406
#7  0x0000000000000000 in ?? ()
(gdb) p error
$1 = 2
(gdb) 

dmu_object_next() was modified in https://github.com/zfsonlinux/zfs/commit/50c957f702ea6d08a634e42f73e8a49931dd8055 (Implement large_dnode pool feature), but the issue manifests only since https://github.com/zfsonlinux/zfs/commit/e6d3a843d6ced970cbc74a3f809d744c30a7ec7c (OpenZFS 6393 - zfs receive a full send as a clone): this is because the return code ENOENT was not previously used as an error condition in the for/loop in receive_freeobjects(), whereas now it is checked via the local variable next_err.

TL;DR

https://github.com/zfsonlinux/zfs/commit/50c957f702ea6d08a634e42f73e8a49931dd8055 makes dmu_object_next() return ENOENT when receiving freeobjects. https://github.com/zfsonlinux/zfs/commit/e6d3a843d6ced970cbc74a3f809d744c30a7ec7c uses that ENOENT as an error condition and reports it to userspace when receiving freeobjects.
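
To make the interaction concrete, here is a simplified sketch of the loop being described (paraphrased from the explanation above, not the verbatim ZoL source; see the backtrace earlier for the real call sites):

static int
receive_freeobjects(struct receive_writer_arg *rwa,
    struct drr_freeobjects *drrfo)
{
	uint64_t obj;
	int next_err = 0;

	for (obj = drrfo->drr_firstobj;
	    obj < drrfo->drr_firstobj + drrfo->drr_numobjs && next_err == 0;
	    next_err = dmu_object_next(rwa->os, &obj, B_FALSE, 0)) {
		if (dmu_object_info(rwa->os, obj, NULL) != 0)
			continue;	/* already a hole, nothing to free */
		int err = dmu_free_long_object(rwa->os, obj);
		if (err != 0)
			return (err);
	}
	/*
	 * Since OpenZFS 6393 the loop's terminating error is inspected:
	 * anything other than the expected ESRCH is reported to userspace.
	 * With large_dnode active, dmu_object_next() can return ENOENT
	 * here, which surfaces as "cannot receive new filesystem stream:
	 * dataset does not exist".
	 */
	if (next_err != ESRCH)
		return (next_err);
	return (0);
}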

haasn commented 8 years ago

@haasn i'm not very familiar with gentoo packaging, can you verify if your zfs version includes commit e6d3a84 (OpenZFS 6393 - zfs receive a full send as a clone)?

The version I am currently testing with is the git tag zfs-0.7.0-rc1 (as well as spl-0.7.0-rc1). According to the webpage you linked, that tag includes the commit in question.

When I made this bug report, I was still on commit https://github.com/zfsonlinux/zfs/commit/9907cc1cc, which also includes the suspect commit.

loli10K commented 8 years ago

There are still some questions that need to be answered (why didn't the ZFS test suite catch this? why is this only happening sometimes? why is dmu_object_next() returning ENOENT?): until we get this figured out, you can probably try running a previous commit to avoid this bug and still access your large_dnode-enabled data.

haasn commented 8 years ago

@loli10K I gave e6d3a84^ a try (with spl on 0.7.0-rc1 still), but it just spammed my kmesg with null pointer dereference errors while trying to boot. Not sure what that was about, but I'm back on zfs-0.7.0-rc1 for now.

I can wait for a proper fix for this issue.

loli10K commented 8 years ago

@haasn wow, sorry about that, a slightly older commit shouldn't be doing that ...

I made more tests and I can almost certainly confirm this is a race condition: zfs recv works with large_dnode active as long as we can push all the FREEOBJECTS records from the send stream before txg_sync comes in and sets ds->ds_feature_inuse[SPA_FEATURE_LARGE_DNODE]; from then on:

  1. dmu_object_next() starts executing new code inside the if condition https://github.com/zfsonlinux/zfs/blame/zfs-0.7.0-rc1/module/zfs/dmu_object.c#L244
  2. calls dmu_object_info()
  3. gets an ENOENT and reports it back to the caller
  4. the caller exits the for/loop in receive_freeobjects()
  5. zfs recv fails with "dataset does not exist"

The ZFS test suite probably doesn't catch this because it executes the tests too quickly to fail.

Unfortunately I don't know how to fix this properly; we'll have to wait for someone with more experience in the ZFS codebase to pick this up.

haasn commented 8 years ago

What's the status of this? Is there a fix in 0.7.0-rc2 or master that I could test? (Asking since it's part of the 0.7.0 milestone)

I don't really understand if this is an upstream issue (OpenZFS) or just a zfsonlinux issue.

loli10K commented 8 years ago

@haasn I'm positive this is a ZoL-only issue, since the code hasn't been ported upstream (yet): https://www.illumos.org/issues/7432.

angstymeat commented 7 years ago

Not sure if I'm having this exact same issue or not. I'm running the December 1st commits, b0319c1faae5a77e553ae74bf899efbc6629674d, on two machines. One is a Fedora 20 with kernel 3.19.8-100.fc20.x86_64 and the other is Fedora 23 with kernel 4.8.10-100.fc23.x86_64.

All features are enabled on the sender and receivers.

I'm sending snapshots of 4 filesystems: databases/mysql/bin-log, databases/mysql/data, databases/mysql/innodb, and databases/mysql/logs. All of them complete except for "databases/mysql/data", which dies about 1.5G into a 7G transfer with "cannot receive new filesystem stream: dataset does not exist".

Unlike haasn's problem, this is consistent. Scrubs of both pools show they are clean, but I cannot receive the filesystem on any machine that I send it to.

haasn commented 7 years ago

Unlike haasn's problem, this is consistent. Scrubs of both pools show they are clean, but I cannot receive the filesystem on any machine that I send it to.

My problem is consistent for large datasets too. Since it's some sort of race condition, it depends on the dataset size. 1.5 GB is definitely above the threshold where it would always fail.