Closed cfellinger closed 8 years ago
Partial success: the IFS problem can be mitigated with the following:
for x in $(IFS= "${ZFS}" list -H -oname -tsnapshot -r "${rootfs}" | \
    grep "@${snapname}$")
Now snapshots will be properly cloned and boot succeeds. Alas, booting from nested bootfs datasets still doesn't work.
My time is quite limited, but I'll try to work on this this weekend. If anyone has a patch, I'll be a much happier camper :).
you want patch files? I've attached the two I need to get it working.
zfs-functions--patch.txt initramfs-tools_scripts_zfs--patch.txt
I'll be away most of Saturday during the day, but otherwise I'll be available for testing.
You mentioned that it doesn't work with nested datasets.
yes, things like:
mounted datasets:
NAME                       CANMOUNT  MOUNTPOINT       MOUNTED
rpool/ROOT/system          noauto    /target          yes
rpool/ROOT/system/var      on        /target/var      yes
rpool/ROOT/system/var/log  on        /target/var/log  yes
This will boot properly, but not from a snapshot; most likely I'm doing something unsupported. The snapshots look like:
$ zfs get mountpoint rpool -r
NAME                           PROPERTY    VALUE            SOURCE
rpool                          mountpoint  none             local
rpool/ROOT                     mountpoint  none             inherited from rpool
rpool/ROOT/system              mountpoint  /target          local
rpool/ROOT/system@ONE          mountpoint  -                -
rpool/ROOT/system@TWO          mountpoint  -                -
rpool/ROOT/system/var          mountpoint  /target/var      inherited from rpool/ROOT/system
rpool/ROOT/system/var@ONE      mountpoint  -                -
rpool/ROOT/system/var@TWO      mountpoint  -                -
rpool/ROOT/system/var/log      mountpoint  /target/var/log  inherited from rpool/ROOT/system
rpool/ROOT/system/var/log@ONE  mountpoint  -                -
rpool/ROOT/system/var/log@TWO  mountpoint  -                -
rpool/SWAP                     mountpoint  -                -
The boot stops with the following error message:
Command: mount_fs rpool/ROOT/system_ONE
Message:
Error: 1
Error: Failed to mount root filesystem 'rpool/ROOT/system_ONE'.
..BusyBox v1.22...
and zfs list shows no clones, not even for the root. I will try echo statements to see where it loses track.
As an aside: booting from an unknown snapshot or from a mere @ doesn't work either; I get an empty list to choose from even though I have two snapshots.
Then that needs to be fixed too. Do you have a patch for that too?
Ah, that one (asking which snapshot to use) proved to be easy, once I looked in the proper place that is :) Just add "-r" in ask_user_snap() to get them all:
--- /etc/zfs/zfs-functions 2016-04-06 17:52:08.000000000 +0200
+++ /target/etc/zfs/zfs-functions 2016-04-16 01:53:46.309200222 +0200
@@ -1026,7 +1026,7 @@
eval `echo SNAP_${i}="${snap}"`
i="$((i + 1))"
done <<EOT
-$("${ZFS}" list -H -oname -tsnapshot "${fs}")
+$("${ZFS}" list -H -oname -tsnapshot -r "${fs}")
EOT
echo -n " Snap nr [0-$((i-1))]? " > /dev/stderr
For your convenience, added here as a patch file: ask-snap-patch.txt
The problem with nested datasets keeps bugging me. I've tried a lot today; it seems to be related to IFS magic, so I'm at a loss here :(
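For what it's worth, the suspected IFS effect can be reproduced without ZFS at all. This is a minimal sketch (the dataset names are just examples) of how an IFS without a newline stops a for loop from splitting multi-line command output:

```shell
# Simulated `zfs list -H -oname` output: one dataset name per line.
list="rpool/ROOT/system@ONE
rpool/ROOT/system/var@ONE"

# With IFS set to a single space, newline is no longer a separator,
# so the for loop receives the whole listing as ONE word.
IFS=' '
count=0
for fs in ${list}; do count=$((count + 1)); done
echo "IFS=' ': ${count} iteration(s)"        # 1

# With the default IFS (space, tab, newline) each line is its own word.
unset IFS
count=0
for fs in ${list}; do count=$((count + 1)); done
echo "default IFS: ${count} iteration(s)"    # 2
```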
I consider this fixed in Wheezy, v0.6.5.6-4-wheezy
and Jessie, v0.6.5.6-4
Thanks for looking into it, but alas, it still doesn't work, at least not over here. What does work is mounting the sub-datasets. But the snapshot part is cut off: no cloning is performed and the base version is mounted.
Unfortunately your patches were quite different from mine, so I'll have to redo the testing.
Did you make sure to install the new version of /etc/zfs/zfs-functions?
Yes, of course, but you omitted part of the proposed patches :) the one in /usr/share/initramfs-tools/scripts/zfs, so the ZFS_BOOTFS from the command line gets bluntly overruled:
--- /usr/share/initramfs-tools/scripts/zfs 2016-04-17 16:15:31.000000000 +0200
+++ /target/usr/share/initramfs-tools/scripts/zfs 2016-04-17 18:23:35.217403997 +0200
@@ -277,6 +277,7 @@
if [ -n "${ZFS_RPOOL}" -a -z "${POOL_IMPORTED}" ]
then
import_pool "${ZFS_RPOOL}" && \
+ [ -z "${ZFS_BOOTFS}" ] && \
rfs="$(find_rootfs "${ZFS_RPOOL}")" && \
ZFS_BOOTFS="${rfs}"
At least now I get to boot from snapshot AND the sub-datasets get properly cloned AND mounted.
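The effect of that one-line guard is plain shell short-circuiting. Here is a sketch with a stubbed find_rootfs (the real one lives in zfs-functions) showing that the pool's bootfs property can no longer clobber a value passed on the kernel command line:

```shell
# Stand-in for the real helper that reads the pool's bootfs property.
find_rootfs() { echo "rpool/ROOT/system"; }

# Case 1: ZFS_BOOTFS came in on the kernel command line: it is kept.
ZFS_BOOTFS="rpool/ROOT/system@ONE"
[ -z "${ZFS_BOOTFS}" ] && ZFS_BOOTFS="$(find_rootfs rpool)"
echo "${ZFS_BOOTFS}"    # rpool/ROOT/system@ONE

# Case 2: nothing on the command line: fall back to the bootfs property.
ZFS_BOOTFS=""
[ -z "${ZFS_BOOTFS}" ] && ZFS_BOOTFS="$(find_rootfs rpool)"
echo "${ZFS_BOOTFS}"    # rpool/ROOT/system
```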
When running in zfs_debug mode and without quiet on the command line, I get failure messages, like:
Begin: CMD: '' ... [...SPL..using hostid...]
done.
Begin: CMD: '' ... Failure: 2
Begin: CMD: '' ... Failure: 1
Begin: CMD: '' ... Failure: 1
Begin: CMD: '' ... Failure: 1
Begin: CMD: '' ... done.
Begin: CMD: '' ... done.
Begin: CMD: '' ... done.
DEBUG: imported pools
> import_pool "${ZFS_RPOOL}" && \
> + [ -z "${ZFS_BOOTFS}" ] && \
> rfs="$(find_rootfs "${ZFS_RPOOL}")" && \
> ZFS_BOOTFS="${rfs}"

Ok, I can see the point of doing that. However, I need to ponder the repercussions of this.

> at least now I get to boot from snapshot AND the sub datasets get properly cloned AND mounted
So what's the problem?

> When running in zfs_debug mode and without quiet on the command line, I get failure messages, like:
> Begin: CMD: '' ... [...SPL..using hostid...] done.
> Begin: CMD: '' ... Failure: 2
> Begin: CMD: '' ... Failure: 1
> Begin: CMD: '' ... Failure: 1
> Begin: CMD: '' ... Failure: 1
> Begin: CMD: '' ... done.
> Begin: CMD: '' ... done.
> Begin: CMD: '' ... done.
> DEBUG: imported pools

I'm going to need more information, the full output.
> Begin: CMD: '' ... Failure: 2

These lines also look suspicious. I can find NO reference to the > ' < chars anywhere in the code! Nor do I know where the 'Failure: [12]' comes from.
From what you're describing, I can only conclude that you're using old code somewhere! You need to go through everything again and make absolutely sure that you're using my code! And ONLY that!
The files that should be used in this are:
* from the zfsutils package
/etc/zfs/zfs-functions
* from the zfs-initramfs package
/usr/share/initramfs-tools/conf-hooks.d/zfs
/usr/share/initramfs-tools/hooks/zfs
/usr/share/initramfs-tools/scripts/zfs
I'm quite sure I'm using your files, but I've checked with md5sums anyway:
root@rescue ~ # apt-get download zfsutils
Get:1 http://archive.zfsonlinux.org/debian/ jessie/main zfsutils amd64 0.6.5.6-4 [385 kB]
Fetched 385 kB in 0s (529 kB/s)
root@rescue ~ # apt-get download zfs-initramfs
Get:1 http://archive.zfsonlinux.org/debian/ jessie/main zfs-initramfs amd64 0.6.5.6-4 [23.1 kB]
Fetched 23.1 kB in 0s (68.2 kB/s)
Extracted the relevant files and ran md5sums; they're all equal.
On the command-line-overrule problem: it's a bit tricky for me to get a full error report, as I'm testing in a qemu-system session and I don't know how to cut and paste from there. I could upload screenshots if that's okay.
You need to trace the build of an initrd, because it sure looks like a file NOT part of the package (or an old version somehow) is being included instead!
Because I can't reproduce this and from what you describe of the output, something looks "fishy".
> You need to trace the build of an initrd.

.. and make sure that initrd is actually used!
Well, I'm testing in a freshly debootstrapped machine, so I don't see what can go wrong :( What do you want me to do? "Trace the build of an initrd" doesn't ring a bell, but if I change lines (add debug lines) in those files, I sure get the expected result when I start the machine. Maybe you overlooked the part where I state "running in zfs_debug mode and without quiet on the command line"; both are needed to get the above failure codes.
You wanted extra input on "ZFS_BOOTFS from the command line gets bluntly overruled"? Here is a screenshot:
Now I see it, thanks:
--- /etc/zfs/zfs-functions~ 2016-04-17 14:47:00.048097033 +0100
+++ /etc/zfs/zfs-functions 2016-04-17 20:35:05.638951706 +0100
@@ -162,7 +163,7 @@
check_boolean "${quiet}" && \
zfs_log_begin_msg "${MSG} "
check_zfs_debug && \
- zfs_log_begin_msg "CMD: '${cmd}'"
+ zfs_log_begin_msg "CMD: '${CMD}'"
zfs_run_cmd "${CMD}"
Btw, when booting with zfsdebug=1, then when the last question comes up (continue boot or spawn a shell) you can retrieve a full log from /var/log/zfs-boot.debug.
It occurred to me that those failure lines only show up the second time I boot from the same snapshot, so at destroy time, and lo and behold, when I applied your patch (cmd => CMD) it tells me so:
Yeah, I saw the destroy problem too. Fixed with:
--- /etc/zfs/zfs-functions~ 2016-04-17 20:35:05.638951706 +0100
+++ /etc/zfs/zfs-functions 2016-04-17 20:49:18.226373449 +0100
@@ -1084,9 +1084,10 @@
# If the destination dataset for the clone
# already exists, destroy it. Recursivly
if [ $(get_fs_value "${rootfs}_${snapname}" type) ]; then
- filesystems="$("${ZFS}" list -oname -tfilesystem -H \
- -r -Sname "${ZFS_BOOTFS}")"
- for fs in ${filesystems}; do
+ "${ZFS}" list -oname -tfilesystem -H -r -Sname \
+ "${ZFS_BOOTFS}" | \
+ while read fs
+ do
destroy_fs "${fs}"
done
fi
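Assuming the culprit really is a mangled IFS, the switch from a for loop to a pipeline into `while read` sidesteps it, since `read` consumes exactly one line per iteration no matter what IFS was set to earlier. A ZFS-free sketch (destroy_fs replaced by an echo):

```shell
IFS=' '    # simulate the problematic single-space IFS
# Simulated `zfs list -oname -tfilesystem -H -r -Sname` output,
# deepest datasets first so children are destroyed before parents.
printf 'rpool/ROOT/system/var@ONE\nrpool/ROOT/system@ONE\n' | \
while read fs
do
    # Each iteration sees exactly one dataset name.
    echo "would destroy_fs ${fs}"
done
```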
I'm building new packages as we speak, with both these fixes.
Ah, yet again an IFS problem. For some strange reason it gets set to a single space (why not " \t\n"?) early on, and that becomes the default setting, which doesn't work out too well for such for loops.
The fix for the command-line-overrule problem is still postponed? It seems an essential fix to get booting from snapshot working, at least here at my site.
No, that's coming in this version as well. It makes sense: "kernel command line options are always right".
Great, thanks a lot. That leaves me with just some nitpicking.
As I stated in the original issue report, the selection of snapshots to clone is a little brittle: it simply matches too much. Better would be to anchor the grep like:
grep "@${snapname}$"
otherwise both dataset@snap-long-name and dataset@snap will match when requesting @snap, or so it seems to me from looking at the code. If you want, I can do some testing after the latest and greatest package gets installed here.
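A quick illustration of the over-match (synthetic names, no ZFS needed): the unanchored pattern hits every snapshot whose name merely starts with the requested one, while the anchored pattern only hits the exact name:

```shell
snapname="snap"
snapshots='rpool/ROOT/system@snap
rpool/ROOT/system@snap-long-name'

# Unanchored: BOTH lines match, so the wrong snapshot may get cloned.
printf '%s\n' "${snapshots}" | grep "@${snapname}"

# Anchored to end-of-line: only the exact snapshot matches.
printf '%s\n' "${snapshots}" | grep "@${snapname}$"
```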
Seems reasonable. Please do test. Create a new issue for that though; this is getting a little too long :)
I've run my test: boot from snapshot works as advertised, really awesome. For the overlapping snapshot names problem I refer to issue #206.
Hi, first of all thanks for all the work to get ZFS on Linux working, and in particular in Debian.
Unfortunately I can't get booting from snapshot working. After inspecting the code [from zfs-linux (0.6.5.6-3) jessie] I think I spotted some errors, but still have no success.
In /usr/share/initramfs-tools/scripts/zfs, in line 281, ZFS_BOOTFS gets bluntly overwritten with the bootfs property from rpool. I thought the kernel command line was supposed to overrule this property. So probably something like the following would do:
And then in /etc/zfs/zfs-functions, in lines 1084/85, what seems like good code doesn't do what's expected. The list of snapshots sticks together, so a multi-line value is used as the snapshot name. Probably IFS got set somewhere, but here my knowledge of bash falls short.
As an aside, as coded now the selection of names matching snapname is a little brittle: it will match on the name itself, and not merely the snapshot part. More robust seems to be restricting the match to purely the snapshot suffix, like:
grep "@${snapname}$"