openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

zfs init scripts on gentoo cannot find 'find' and 'awk' #3457

Closed janlam7 closed 9 years ago

janlam7 commented 9 years ago

After updating to 2a34db1bdbcecf5019c4a59f2a44c92fe82010f2 and adding the zfs-import and zfs-mount scripts to the sysinit runlevel, my system booted correctly but gave notices that find and awk could not be found.

On my system, find is in /usr/bin, and awk is in both /bin and /usr/bin, both of which are symlinks to gawk.

behlendorf commented 9 years ago

@FransUrbo when you get a minute, could you look into this?

FransUrbo commented 9 years ago

I have, and I can't see it. It should be impossible…

The very, very first thing we do is to include the zfs-functions library. The very, very first thing in that file is to set PATH=/sbin:/bin:/usr/bin:/usr/sbin. But, ah… I might not be exporting it…

@janlam7 Could you edit the /etc/zfs/zfs-functions file and prefix the PATH=…. line with export and try again?

export PATH=/sbin:/bin:/usr/bin:/usr/sbin

janlam7 commented 9 years ago

@FransUrbo I updated zfs-functions but that did not help. After that I extracted the initramfs I'm using and found out that find and awk are not in there ;-) usr/bin is completely empty. I'll go through the genkernel options to see if I'm missing something obvious.

Having / on ext4 and /usr on my zpool probably doesn't help either.

FransUrbo commented 9 years ago

@janlam7 I'm fairly certain that the init script isn't used in the initrd. That's usually taken care of in a different way. But you mention genkernel which leads me to believe you're running Gentoo. If so, maybe @ryao might be able to shed some light on this?

janlam7 commented 9 years ago

So that sounds like a chicken and egg problem then. The zfs-init-scripts need files from /usr/bin and /usr/bin is on zfs and not available yet.

Bronek commented 9 years ago

@janlam7 it should be available inside initramfs though, or did I fail to understand when the problem occurs?

AndCycle commented 9 years ago

@Bronek genkernel is a Gentoo-specific tool to ease kernel/initrd creation. Unfortunately it doesn't include /usr/bin in the initrd, but that shouldn't be required at that stage.

The only zfs-related things inside the initrd are the zfs module, lib and mount.zfs. The initrd should be able to switch from the ramdisk to the root zfs without init.d, and then start the boot process.

I don't want to bring down my prod system, so it looks like we have to wait for someone else to get into the details.

AndCycle commented 9 years ago

My zfs is 0.6.4.1. Here is some rough info on my system for reference:

http://pastebin.com/VAxHm52g initrd file list

http://pastebin.com/D7VNcxrK linuxrc

fling- commented 9 years ago

@Bronek genkernel's initramfs has everything it needs in the busybox binary, and the error happens after switch_root, so the problem is not caused by the initramfs.

@FransUrbo The problem is an initscript trying to use a binary from /usr before /usr is mounted. For example, I get the same error in /etc/init.d/bootmisc, because openrc runs that script before /etc/init.d/zfs and bootmisc fails to read /usr/bin/find while /usr is not mounted yet.

@janlam7 Are you sure the zfs initscript should go in the sysinit runlevel? I use it in the boot runlevel on all of my boxen.

FransUrbo commented 9 years ago

I've taken great care to make sure that zfs-import runs as early as possible, but still late enough that it has everything it needs. This might not be perfect.

BUT, on the other hand, you're trying to use them for something that they weren't designed for. They were never intended to run in an initrd. That doesn't mean that they can't be made to work there, but it will require YOU to modify your system/initrd/whatnot to make that work.

This is complicated even further by you using ZFS/ZoL in a way that isn't really supported. The core devels (ryao, behlendorf etc) have said that using ZFS/ZoL on the root fs isn't supported. This includes using !zfs on root and zfs on /usr, /var etc.

And I already know the objection to all this: "the previous scripts worked" :). Yes they did, but they were broken (trying to do the import, mount AND share at the exact same time - that will almost never work in real life). They were also almost impossible to maintain, because there were FIVE scripts that did almost exactly the same thing, with minute changes to work on a specific platform. As a result, not all of the needed/wanted functionality existed in all five.

These are the primary reasons why we needed new scripts. The current initrd scripts will most likely never be (re)worked to work with a fs setup like this (too difficult to do programmatically, correctly and safely - how does the script figure out WHICH filesystem to mount from !zfs and which from zfs in a way that will work for EVERYONE, not just you?) and these init scripts might not be possible to run on a limited fs like an initrd.

So I'm sorry, but because of the extremely specific local configuration and setup, you're pretty much on your own. I have no idea how to solve this or to help you.

What you can TRY is to make sure that genkernel copies find and awk into the initrd. But I don't know enough about Gentoo to help there.

fling- commented 9 years ago

@FransUrbo Note that @janlam7 said in the first message that the scripts are used in a runlevel and not in an initramfs.

FransUrbo commented 9 years ago

Yes, but he said in the second one that he used it in the initrd.

But the problem is the same - he doesn't have find/awk because his /usr is on ZFS. This is why it won't be possible to import the pool in a runlevel. It must be done in the initrd. But the script won't work in the initrd (without some serious tweaking).

I see no possibility to support a setup like this.

ryao commented 9 years ago

@FransUrbo You could work around this by detecting that find and awk are missing and switching to busybox find and busybox awk.

That said, mounting /usr with genkernel can be done via /etc/initramfs.mounts.

ryao commented 9 years ago

Also, I suppose that @janlam7 could work around this by symlinking /bin/awk and /bin/find to /sbin/busybox. The default $PATH places /usr/bin before /bin, such that tools would switch to /usr/bin when /usr is mounted.

It would be a hack, but it would be an interesting way of dealing with this problem until the scripts have a workaround in place. This might confuse some configure scripts, so I don't recommend it for production.

janlam7 commented 9 years ago

I'm sorry I gave the impression I was using the initscripts from within an initramfs. Figuring out the Gentoo boot process is new to me. Now I understand that the initramfs mounts the root, and that the initscripts run from the root, after it has been mounted. I put the initscripts in the sysinit runlevel because the boot runlevel gave even more problems.

I tried mounting /usr (which is a zfs dataset) from initramfs.mounts but couldn't get it to work correctly, I'll have another go at it later.

FransUrbo commented 9 years ago

Using busybox sounded like a good idea, until I noticed that CentOS doesn't have that installed [by default].

But I'd like to reiterate my opinion that this is such a specialized setup that I can't support it. In addition, you either go all-in (except /boot) for ZFS on your system, or you have to customize any init/initrd scripts yourself.

ryao commented 9 years ago

@FransUrbo The way to use busybox here would be to have ${AWK} and ${FIND} variables that are set to busybox awk/busybox find if either of those commands are missing and busybox is available.
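A minimal sketch of that fallback, assuming a POSIX sh. The helper name pick_tool is hypothetical and is not taken from the actual init scripts; it only illustrates how ${AWK} and ${FIND} could be set.

```shell
#!/bin/sh
# Hypothetical helper: print the command to use for tool "$1", preferring a
# standalone binary on $PATH and falling back to the busybox applet.
pick_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1"
    elif command -v busybox >/dev/null 2>&1; then
        # Assumes the installed busybox was built with this applet.
        echo "busybox $1"
    else
        return 1
    fi
}

AWK=$(pick_tool awk) || echo "no awk available" >&2
FIND=$(pick_tool find) || echo "no find available" >&2

# Callers would expand the variables unquoted so that "busybox awk"
# splits into two words, for example:
# ${AWK} '$2 == "/" { found = 1 } END { exit !found }' /etc/mtab
```

Since the lookup goes through $PATH, the standalone binaries in /usr/bin would be picked again once /usr is mounted and the helper is called anew.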

ryao commented 9 years ago

@janlam7 You would need to set /usr to legacy and put into fstab to use /etc/initramfs.mounts.

FransUrbo commented 9 years ago

@ryao yeah I got that. But busybox doesn't necessarily exist on all systems, so it won't help (much).

ryao commented 9 years ago

@FransUrbo Dynamically falling back to busybox would fix the regression for the Gentoo users that are affected. busybox could then either be made a dependency on distributions that support this configuration, or be installed by users who want this configuration on distributions that dropped support for it.

FransUrbo commented 9 years ago

Have a look at https://github.com/FransUrbo/zfs/commit/146865f1900cb9826fb5e09a63082a42ebb01e2f#diff-2ec95e0040f8e076f444b03b9058bdf1R90. Should do it (for those that DO have busybox).

ryao commented 9 years ago

I just looked at how we use find and awk in the scripts. I think I can eliminate our use of them, but I will not have time to do it until tonight (at which point, I might be too worn out).

We could eliminate find with a POSIX shell loop and a test (e.g. [ -d "${var}" ]) for something being a directory (although this might not be necessary). We could replace awk with a combination of sed and grep. That would avoid touching anything in /usr.
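As a rough sketch of the loop-and-test idea, assuming the goal is to list the immediate subdirectories of a directory (what the scripts actually use find for is not shown here, and list_subdirs is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: print each immediate subdirectory of "$1" using only
# shell builtins. Unlike find -type d, the glob skips dot-directories and
# the -d test follows symlinks.
list_subdirs() {
    for entry in "$1"/*; do
        # An unmatched glob stays literal, so the -d test filters that too.
        [ -d "$entry" ] && echo "$entry"
    done
    return 0
}
```

Nothing in this loop needs a binary from /usr, since for, [ and echo are shell builtins in the common sh implementations.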

ryao commented 9 years ago

Something like [ -z "$(sed -e '/\(.*\) \/ \(.*\)/!d' /etc/mtab)" ] could replace awk in the script. The regex there isn't quite right because it includes whitespace and therefore we are not just matching the second column, but that is the general idea.

ryao commented 9 years ago

Another option is to do a double loop with a counter. Have the outer loop read lines and the inner loop be something like for x in ${LINE} ....
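A minimal sketch of that double loop, assuming an mtab-style file with space-separated fields. The function name has_mountpoint is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if some line of file "$1" has "$2" as its
# second field (the mount point column in /etc/mtab).
has_mountpoint() {
    while read -r LINE; do
        count=0
        # Intentionally unquoted: the shell splits the line into fields.
        # Fields containing glob characters would be expanded as well.
        for x in ${LINE}; do
            count=$((count + 1))
            if [ "$count" -eq 2 ]; then
                [ "$x" = "$2" ] && return 0
                break
            fi
        done
    done < "$1"
    return 1
}
```

It could be called as has_mountpoint /etc/mtab / to check whether a root filesystem is listed.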

FransUrbo commented 9 years ago

My sed-fu is even worse than yours. However:

centos60:/usr/src# [ -z "$(sed -e '/\(.*\) \/ \(.*\)/!d' /etc/mtab)" ]
bash: !d': event not found
centos60:/usr/src# awk '$2 = "/" { exit 1 }' /etc/mtab
centos60:/usr/src#

ryao commented 9 years ago

@FransUrbo sed -e '/\(.*\) \/ \(.*\)/!d' /etc/mtab works. The issue is that I wrapped that inside double quotes, which interpreted the characters differently. It needs some fiddling. I just meant to post the general idea.
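The "event not found" message comes from bash history expansion, which an interactive bash still performs on "!" inside double quotes; it is off by default in non-interactive scripts. One way to sidestep the question is to avoid "!" altogether. The sketch below is an assumption about what the check is meant to do (detect "/" in the second column), not the expression that ended up in the scripts, and root_in_mtab is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if file "$1" lists "/" as a mount point.
# Uses sed -n with p instead of !d, so the expression contains no "!".
# The pattern anchors on the first field, so only the second column is matched.
root_in_mtab() {
    [ -n "$(sed -n '/^[^ ]* \/ /p' "$1")" ]
}
```

Note that this tests for the presence of "/", whereas the [ -z ... ] form quoted above tests for its absence.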

FransUrbo commented 9 years ago

Ok.

FransUrbo commented 9 years ago

@ryao @janlam7 How about this - https://github.com/FransUrbo/zfs/commit/5e63da9c4177f77280ce5b4826bc7d414ad0dc8b#diff-6169c2463ff29c1dcea2e9d22152025cR33 ?

ryao commented 9 years ago

set -- $(echo "$line") is a bashism. It isn't POSIX compliant. The POSIX way to do this is to do a for loop.

FransUrbo commented 9 years ago

I've never heard that set -- shouldn't be POSIX! How would you get the second column in a string otherwise?

FransUrbo commented 9 years ago

But I got rid of find as well in the latest push.

FransUrbo commented 9 years ago

set -- $line seems to be working on sh, ksh and openrc-run, so I think that will do.
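For reference, a minimal sketch of the set -- approach wrapped in a function. The name second_field is hypothetical and the code is not copied from the linked commit.

```shell
#!/bin/sh
# Hypothetical helper: print the second whitespace-separated field of "$1".
second_field() {
    # Intentionally unquoted so the shell splits the line on $IFS.
    # Inside a function, set -- only replaces the function's own
    # positional parameters, not those of the calling script.
    set -- $1
    printf '%s\n' "$2"
}
```

Called as second_field "$line" inside a while read -r line loop over /etc/mtab, this yields the mount point column without awk.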

ryao commented 9 years ago

Usually, breaking a string into $1, $2, etcetera is supposed to be non-POSIX. If it is portable, that is probably fine then.

ryao commented 9 years ago

@FransUrbo One thing that occurs to me is that Gentoo supports multiple shell interpreters by making /bin/sh a symlink configurable by eselect (when app-eselect/eselect-sh is installed). bash will try to be POSIX compliant when execved with the name sh (as is the case for a symlink), but bash might not necessarily catch all of its extensions. We also have the option of using dash as /bin/sh on Gentoo. You should emerge dash and test with /bin/sh pointing to dash to be certain that this is portable.
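A quick way to approximate that test without changing the /bin/sh symlink is to run a snippet under each installed shell explicitly. This is only a sketch: run_with_shells is a hypothetical name, and the list of shells is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: run script "$1" under every listed shell that is
# installed and report whether it exited successfully.
run_with_shells() {
    for shell in sh dash bash ksh; do
        command -v "$shell" >/dev/null 2>&1 || continue
        if "$shell" "$1" >/dev/null 2>&1; then
            echo "$shell: ok"
        else
            echo "$shell: FAILED"
        fi
    done
}
```

This executes the script, so it is best pointed at a small test snippet rather than at an init script that imports pools.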

janlam7 commented 9 years ago

@FransUrbo : thanks, FransUrbo@146865f and FransUrbo@5e63da9 work for me, no more messages about find and awk.

There is only a 'sed: -e expression #1, char 0: no previous regular expression' left without a line number, which I apparently overlooked. It was there in the original version too, I guess in zfs-import. Below is part of the original rc.log, but I can't find a sed -e in the original commit anywhere.

Waiting for uevents to be processed ... [ ok ]
/etc/init.d/zfs-import: line 145: find: command not found
Importing ZFS pool pool2 ... [ ok ]
sed: -e expression #1, char 0: no previous regular expression
/etc/init.d/zfs-import: line 145: find: command not found
/etc/init.d/zfs-mount: line 169: awk: command not found
Mounting ZFS filesystem(s) ... [ ok ]

FransUrbo commented 9 years ago

Because you need your pool for your OS, that makes it difficult to find which sed is the problem.

The Gentoo init scripts have a -d option (as in /etc/init.d/zfs-import -d start for example) to debug the script (same as using set -x in a bash/sh script).

If you could somehow stop the boot procedure just before zfs-import is run and run it manually with the debug switch, that would help a lot. It might also be possible to set a variable or toggle somewhere to indicate that this should be done on the next invocation of the script.

@ryao Any pointers to do this?

janlam7 commented 9 years ago

The -d option gives the impression it is at line 93 in zfs-import:

npools=$(echo "$npools" | sed "s,$available_pools,,")

I can put the full log somewhere, but below is a snippet:

npools=pool2
'[' -n pool2 ']'
USE_DISK_BY_ID=yes
echo pool2
sed s,,,
sed: -e expression #1, char 0: no previous regular expression
npools=
'[' -n '' ']'
available_pools=

FransUrbo commented 9 years ago

npools=$(echo "$npools" | sed "s,$available_pools,,")

Ah, yes. If $available_pools is empty, that would lead to a sed error…

I think I have a fix that should work, but I need to do some tests to be sure. I'll let you know when I have something for you to test.
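One possible guard, shown only as a sketch: skip sed when the pattern is empty. This is an assumption about how the problem could be avoided, not necessarily what the eventual fix does, and strip_pool is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: remove the first occurrence of "$2" from "$1",
# skipping sed entirely when "$2" is empty, because "s,,," has no
# previous regular expression to reuse.
strip_pool() {
    if [ -n "$2" ]; then
        printf '%s\n' "$1" | sed "s,$2,,"
    else
        printf '%s\n' "$1"
    fi
}
```

In the script this would read npools=$(strip_pool "$npools" "$available_pools"), which leaves npools unchanged when no pools are available yet.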

FransUrbo commented 9 years ago

@janlam7 Ok, have a look at the https://github.com/FransUrbo/zfs/commit/c31e7ffbe6eff6d0644d2874b15cc39f7c6d40e7#diff-af0fd9d5059ecf07e6f2983dd83a418cL88 diff part (or take the whole file :). That should do it.

janlam7 commented 9 years ago

@FransUrbo thanks, https://github.com/FransUrbo/zfs/commit/c31e7ffbe6eff6d0644d2874b15cc39f7c6d40e7#diff-af0fd9d5059ecf07e6f2983dd83a418cL88 works for me.

FransUrbo commented 9 years ago

Good to hear, thanks for helping test the solutions.

@behlendorf I think the fix is ready to be merged at your discretion.