openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

cannot send raw incremental send after creation of snapshot at remote end #8758

Open prometheanfire opened 5 years ago

prometheanfire commented 5 years ago

System information

Type Version/Name
Distribution Name Gentoo
Distribution Version master
Linux Kernel 5.0
Architecture x86_64
ZFS Version 0.8-rc5
SPL Version N/A

Server code for the receive end is master as of April 13th, commit b92f5d9f8254f726298a6ab962719fc2b68350b1, I think.

Describe the problem you're observing

Encrypted incremental sends fail if the receiving end has a snapshot taken after the original send was done (IV set mismatch).

Describe how to reproduce the problem

This takes two systems:

System 1:

zfs send -Lwecp zp00@backup-201905101104 | ssh REMOTE_IP zfs recv -uvs -o canmount=off zp01/remote-backups/slaanesh-zp00

System 2:

zfs snapshot zp01/remote-backups/slaanesh-zp00@testing123

System 1:

zfs send -Lwecp -I zp00@backup-201905101104 zp00@backup-201905171039 | ssh REMOTE_IP zfs recv -uvs -o canmount=off zp01/remote-backups/slaanesh-zp00

This will result in the following error.

cannot receive incremental stream: IV set guid mismatch.
See the 'zfs receive' man page section discussing the
limitations of raw encrypted send streams.
prometheanfire commented 5 years ago

Destroying the remote snapshot enables incremental sends again, but that's not much use in a real-life scenario. Is the issue that 'basic' snapshots on the remote end don't copy the IV from the previous snapshot (if nothing has changed)?
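For reference, a minimal sketch of that workaround, run on System 2 with the dataset names from the repro above:

# Remove the snapshot that was taken on the receive side after the original send;
# the incremental raw send from System 1 can then be re-run.
zfs destroy zp01/remote-backups/slaanesh-zp00@testing123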

prometheanfire commented 5 years ago

@tcaputi ^ may interest you

DeHackEd commented 5 years ago

While the error message may be slightly odd, this would fail without encryption anyway, so I'm inclined to say Not A Bug.

prometheanfire commented 5 years ago

This has worked this way for the last few years; I'd say it's a bug to break incremental sends like this.

prometheanfire commented 5 years ago

In fact, it does work with unencrypted datasets.

prometheanfire commented 5 years ago

so, broken

DeHackEd commented 5 years ago

Interesting. If the filesystem doesn't change at all (atime=off), this does work.

Apologies

prometheanfire commented 5 years ago

atime affects snapshots? I don't think I have it enabled anywhere, though. Also, I think I set canmount=off on every regular dataset (you can't set that on volumes).

DeHackEd commented 5 years ago

No, but atime can result in dataset modification, which violates the requirement that the received dataset not be modified between incrementals. Raw sends tend to assume the receiver can't decrypt the dataset, and hence can't mount or modify it.
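As an aside, one guard against accidental modification on the receive side (my sketch; the thread itself only sets canmount=off) is to mark the received dataset read-only and disable atime there:

# Run on the receiving system; dataset name taken from the repro above.
zfs set readonly=on zp01/remote-backups/slaanesh-zp00
zfs set atime=off zp01/remote-backups/slaanesh-zp00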

prometheanfire commented 5 years ago

That's not the error I'm getting (modification). At least, I don't think the error has to do with that: cannot receive incremental stream: IV set guid mismatch. It has to do with the encryption check, not which ZFS 'commit' or bookmark it is at.

prometheanfire commented 5 years ago

/me is sad that this was not in 0.8.0 (as it seems fairly core to zfs)

behlendorf commented 5 years ago

This is something we're going to need to decide whether it should be explicitly allowed. I don't think it's unreasonable; it's just a use case we didn't consider, and it happened to accidentally work in previous versions. Given that, it wasn't something I felt should hold up the release. Could you explain your specific use case for this? Then we can look into exactly what's going to be required to properly support it.

prometheanfire commented 5 years ago

Explained in IRC, but I'll copy it here:

I encode info about the date/host that takes the snapshot into the snapshot name; not the cleanest thing to do. I also do recursive snapshots on each system, pruning the snaps that I don't want before sending. Having the host in the snapshot name helps in determining the first or the last snap to use when doing send/recv. That's more or less it. I can share the script I use; perhaps the script is just too simple: http://dpaste.com/2MXKW1P. I think if I changed it to check the last snap on the remote and send from that to the latest snap on the local, that'd work.
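As a rough sketch of that naming scheme (my illustration, not the linked script; the exact name format is an assumption based on the snapshot names above):

# Encode host and timestamp into the snapshot name, then snapshot recursively.
SNAP_NAME="backup-$(hostname -s)-$(date +%Y%m%d%H%M)"   # e.g. backup-slaanesh-201905171039
zfs snapshot -r "zp00@${SNAP_NAME}"
# Unwanted snapshots would then be pruned with 'zfs destroy' before sending.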

behlendorf commented 5 years ago

@prometheanfire after looking into this, we decided that while it's probably possible to support this for raw receives, it's more complicated than it first appears and is not something we have the time to work on right now. @tcaputi has opened #8863 to fix the error message, and I'd suggest updating your scripts if you haven't already done so.

prometheanfire commented 5 years ago

Thanks for the heads up. Yeah, I was going to update the scripts; I need to figure out a way to get snapshot ordering. I think I may have to switch to pyzfs (not sure if the command-line client orders by snapshot time or snapshot name).

Is there a quick summary of the problem (just curious, no need to tell me if it'd take too long)?

tcaputi commented 5 years ago

Basically, the issue is just that doing the obvious fix caused a cascade of new errors where the code wasn't expecting this. As far as I can tell, this was never really intended to work this way, but the code just didn't properly check for it and happened to ignore that snapshot when doing the actual work.

prometheanfire commented 5 years ago

Is there a reason it can't just ignore 'no-diff snapshots' when searching backwards, stopping when it either finds an acceptable snap or the snaps start to have changes? It seems like that was the previous behavior (from the outside).

tcaputi commented 5 years ago

That was the obvious fix and it didn't work unfortunately. It caused a bunch of issues in the code that swaps out a receive clone when the receive is finished. I didn't see an obvious way to fix this and had other encryption issues to get to. Also, this code is going to be completely refactored anyway with redacted send / receive, so it didn't seem worth it to look into this now.

prometheanfire commented 5 years ago

ah :(

prometheanfire commented 5 years ago

For completeness, here's how I worked around it, taking snapshots only locally:

# get list of datasets to back up from destination (requires initialization for new datasets)
BACKUP_LIST=$(zfs list -o name -s name -H | grep "${BACKUP_POOL_NAME}" | sed -E -e "s:^${BACKUP_POOL_NAME}(/|$)::g" -e 's:^backups(/|$)::g' -e '/^$/d')
for DATASET in ${BACKUP_LIST}; do
  FIRST_SNAP=$(zfs list -t snap -o name -H "${BACKUP_POOL_NAME}/backups/${DATASET}" | tail -n 1 | cut -d@ -f 2)
  if zfs list -t snap -o name -H | grep "${SOURCE_POOL}/${DATASET}@" | grep -q "${FIRST_SNAP}"; then
    LAST_SNAP=$(zfs list -t snap -o name -H "${SOURCE_POOL}/${DATASET}" | tail -n 1 | cut -d@ -f 2)
  else
    echo "Can not find snapshot ${FIRST_SNAP} in ${SOURCE_POOL}/${DATASET}.  quiting"
    break
  fi
  if [[ "${FIRST_SNAP}" == "${LAST_SNAP}" ]]; then
    echo "# Looks like ${DATASET} is already backed up, skipping."
    continue
  fi
  SIZE=$(zfs send -LwecpnvP -I "${SOURCE_POOL}/${DATASET}@${FIRST_SNAP}" "${SOURCE_POOL}/${DATASET}@${LAST_SNAP}" | tail -n 1 | awk '{ print $2 }' 2> /dev/null)
  if [[ "$(zfs get -H -o value encryption ${SOURCE_POOL}/${DATASET})" != 'off' ]]; then
    RECV_SOPT='-s'
  else
    RECV_SOPT=''
  fi
  if [[ "$(zfs get -H -o value type ${SOURCE_POOL}/${DATASET})" == 'volume' ]]; then
    RECV_CANMOUNT=''
  else
    RECV_CANMOUNT='-o canmount=off'
  fi
  echo "zfs send -Lwecp -I ${SOURCE_POOL}/${DATASET}@${FIRST_SNAP} ${SOURCE_POOL}/${DATASET}@${LAST_SNAP} | pv -s "${SIZE}" | zfs recv -duv ${RECV_SOPT} ${RECV_CANMOUNT} ${BACKUP_POOL_NAME}/backups"
done

It's not perfect (it doesn't check for intermediary snapshots) and assumes that the last snapshot in the list is the newest (that always seemed to be the case; is it something I can rely upon?), but it works.

Can I rely upon zfs list -t snap -o name showing the last snapshot for each dataset being the latest one?

AttilaFueloep commented 5 years ago

Can I rely upon zfs list -t snap -o name showing the last snapshot for each dataset being the latest one?

While I can't answer your question, I know a way to get the latest snap: just sort by creation time: zfs list -H -d 1 -t snap -o name -S creation <ds> | head -1. Replacing -S with -s will get the name of the oldest one. You could also sort by createtxg instead.
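Wrapping that in a small helper (a sketch building on the command above; the function name is made up):

# Hypothetical helper: print the name (after the @) of the newest snapshot of a
# dataset, sorted by creation transaction group rather than by name.
latest_snap() {
  zfs list -H -d 1 -t snap -o name -S createtxg "$1" | head -n 1 | cut -d@ -f2
}
# Usage: latest_snap zp00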

ipaqmaster commented 7 months ago

Also running into this issue, wrapping my send/recvs with syncoid in an unusual-to-me snapshot use case: I've been moving (send/recv'ing) a dataset between my desktop and laptop depending on which one I intend to work from on different days this week. After a few snaps back and forth I now face cannot receive incremental stream: IV set guid mismatch. See the 'zfs receive' man page section discussing the limitations of raw encrypted send streams.

It is only 13 GB and they're on a 1 Gbps LAN, so I just zfs destroy -r'd it on my desktop and sent it fresh from the laptop in a few minutes. But I expect this could happen again while I'm not on as fast a remote link to re-transmit it all (rough sketch below).

At the moment both hosts are running zfs-2.2.2-1, though the desktop is on kernel 6.6.9 while the laptop is on 6.6.10, which I would not expect to be relevant here.
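For the record, a rough sketch of that fresh re-send (host and dataset names are made up; the raw -w flag matches the encrypted use case):

# On the desktop: destroy the diverged copy.
zfs destroy -r tank/work
# On the laptop: raw-send the latest snapshot to the desktop; '@latest' is a placeholder.
zfs send -w tank/work@latest | ssh desktop zfs recv -u tank/work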