openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

0.6.2 Centos 6.2 Failed to load ZFS module stack #1766

Closed byteharmony closed 10 years ago

byteharmony commented 10 years ago

I've tried removing all but the current kernel, and a complete removal and reinstall via RPM with yum. The build reports no errors at all. SPL does load (lsmod shows it), but ZFS gives this error:

[root@nas700 ~]# zfs list
Failed to load ZFS module stack.
Load the module manually by running 'insmod /zfs.ko' as root.
Failed to load ZFS module stack.
Load the module manually by running 'insmod /zfs.ko' as root.
[root@nas700 ~]#

The machine had been running ZFS rc14 before, but I built that on a devel box and moved the RPMs over, with no DKMS.

Tried:

# yum remove -y dkms zfs spl
# yum install zfs

No good: I still get spl, but zfs is missing.

I'm trying a second physical server, set up the same as the first, and expect the same results. The process did work on my devel server, so I believe a package dependency that is present on a devel machine must be missing.

Any ideas?

Thanks, BK

byteharmony commented 10 years ago

Second machine, crash and burn:

[root@nasxxx ~]# dkms status
spl, 0.6.2, 2.6.32-358.18.1.el6.x86_64, x86_64: installed
zfs, 0.6.2, 2.6.32-358.18.1.el6.x86_64, x86_64: installed
zfs, 0.6.2, 2.6.32-279.14.1.el6.x86_64, x86_64: installed-weak from 2.6.32-358.18.1.el6.x86_64
spl, 0.6.2, 2.6.32-279.22.1.el6.x86_64, x86_64: installed-weak from 2.6.32-358.18.1.el6.x86_64
zfs, 0.6.2, 2.6.32-279.22.1.el6.x86_64, x86_64: installed-weak from 2.6.32-358.18.1.el6.x86_64
zfs, 0.6.2, 2.6.32-279.5.1.el6.x86_64, x86_64: installed-weak from 2.6.32-358.18.1.el6.x86_64
zfs, 0.6.2, 2.6.32-279.5.2.el6.x86_64, x86_64: installed-weak from 2.6.32-358.18.1.el6.x86_64
[root@nasxxx ~]# zfs list
Failed to load ZFS module stack.
Load the module manually by running 'insmod /zfs.ko' as root.
Failed to load ZFS module stack.
Load the module manually by running 'insmod /zfs.ko' as root.
[root@nasxxx ~]#

byteharmony commented 10 years ago

Interesting that the problem does not present itself on the KVM-based devel VM, but it does on the Lenovo TS130 server and the Dell PowerEdge 2950 physical server. Both were production servers before this and are now dead systems.

I wonder, is there any way to clone a working DKMS kernel/module setup to avoid playing Russian roulette with a production system on upgrade?

BK

ryao commented 10 years ago

You could try literally copying the files from one system to another. The kernel modules live in /lib/modules/<kernel version>/extra.
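A minimal sketch of that copy step, assuming the modules were built for exactly the kernel version the target runs. The helper name stage_modules and the staging path in the usage comments are hypothetical; only the /lib/modules/<kernel version>/extra layout comes from the comment above.

```shell
# Sketch only: stage_modules is a hypothetical helper, not part of ZFS or DKMS.
# Usage: stage_modules SRC_DIR DEST_ROOT KERNEL_VERSION
#   SRC_DIR   copy of the "extra" directory taken from the build machine
#   DEST_ROOT filesystem root to install into ("/" on the live target)
stage_modules() {
    src=$1
    dest_root=$2
    kver=$3
    dest="$dest_root/lib/modules/$kver/extra"

    mkdir -p "$dest" || return 1
    # Copy the directory contents recursively, keeping modes and timestamps.
    cp -Rp "$src/." "$dest/" || return 1
    # Print the module files that are now in place.
    find "$dest" -type f -name '*.ko' | sort
}

# On the target, as root (the /tmp path is an assumed staging location):
#   stage_modules /tmp/extra-from-devel / "$(uname -r)"
#   depmod -a        # rebuild the module dependency map
#   modprobe zfs
```

Before loading, `modinfo -F vermagic <file>.ko` on a copied module shows which kernel it was built against, which is worth comparing with `uname -r` on the target.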

byteharmony commented 10 years ago

Well some interesting results:

  1. Copy from devel box to beta box did nothing.
  2. Checking dmesg after the module should have loaded turned up the cause: ... zfs: Unknown parameter `zfs_scrub_limit' zfs: Unknown parameter `zfs_scrub_limit' ...
  3. Removed that option from /etc/modprobe.d/zfs.conf and, poof, zfs list works.
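The check in steps 2 and 3 can be done before an upgrade rather than after a failed load. This is a rough sketch, assuming the option lines use the usual `options <module> name=value` form and that the supported parameters were saved with `modinfo -F parm zfs`. The helper name unknown_module_options and the /tmp path are made up for illustration.

```shell
# Sketch only: unknown_module_options is a hypothetical helper name.
# Usage: unknown_module_options MODULE CONF_FILE PARM_FILE
#   PARM_FILE holds saved output of: modinfo -F parm MODULE
# Prints each option name set in CONF_FILE that PARM_FILE does not list.
unknown_module_options() {
    awk -v module="$1" '
        # First file: one "name:description" line per supported parameter.
        FILENAME == ARGV[1] { split($0, f, ":"); known[f[1]] = 1; next }
        # Second file: "options <module> name=value ..." lines.
        $1 == "options" && $2 == module {
            for (i = 3; i <= NF; i++) {
                split($i, kv, "=")
                if (!(kv[1] in known)) print kv[1]
            }
        }
    ' "$3" "$2"
}

# Example, run after the new modules are installed:
#   modinfo -F parm zfs > /tmp/zfs.parms
#   unknown_module_options zfs /etc/modprobe.d/zfs.conf /tmp/zfs.parms
```

Any name it prints is an option the installed module no longer accepts, which is the condition that produced the "Unknown parameter" message here.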

Now confirmed on 3 machines, so far so good.

  4. Tried to just copy the kernel modules to a working machine with rc14 binaries on it. That failed. This is a bit concerning, because on my production machines I'd prefer not to have a compiler and libs.

Just updating the kernel and installing ZFS with the compile step did work (box 4 is working). But I don't have a process that lets me use the ZFS kernel modules without installing a compiler :(

Any ideas on getting around that?

BK

sknolin commented 10 years ago

I believe I had the same issue taking a machine with 0.61 and a Lustre-patched kernel to 0.61 and a patchless-kernel version of Lustre, so no upgrade of ZFS, just trying to remove everything and reinstall. Ours is a test system, so after a day of spinning our wheels we simply re-kickstarted and installed fresh, and we didn't get much information on the problem.

I suspect the issue is that some module files are left behind on removal. I suggest once again trying to remove everything, then running find on likely filenames for the modules to see what turns up. My coworker did that and found something, but we did not document it.
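Something along these lines would do that search. It is only a sketch: the helper name find_leftover_modules is made up, and the list of module names is an assumption about what the 0.6.x SPL and ZFS packages install, so adjust it to match your build.

```shell
# Sketch only: find_leftover_modules is a hypothetical helper name.
# Usage: find_leftover_modules [ROOT]    (ROOT defaults to /lib/modules)
# The module names below are an assumed list for SPL/ZFS 0.6.x.
find_leftover_modules() {
    root=${1:-/lib/modules}
    for name in spl splat zavl znvpair zunicode zcommon zfs zpios; do
        # The trailing * also matches compressed modules such as zfs.ko.gz.
        find "$root" -type f -name "$name.ko*"
    done | sort
}

# After removing the packages, any path printed here is a leftover:
#   find_leftover_modules
```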

byteharmony commented 10 years ago

I don't know that that would apply to the first test case, where I simply take the working modules from a devel machine and copy them to the correct directory on a machine that has zfs installed but no kernel modules built by DKMS.

It would be nice to simply take the few (2 or 3) kernel modules and copy them rather than installing a compiler, headers, libraries, etc. to build on every single production server.

I think DKMS is great for devel and I'm excited to continue using it to upgrade my devel machine, but for production just copying binary files is nice and clean.

BK

behlendorf commented 10 years ago

@byteharmony The zfs_scrub_limit module option was removed, which is what was causing the failure. Since you got to the root cause on this, can we close the issue? In the future we should make sure the release notes mention module parameters that get removed.

byteharmony commented 10 years ago

I have a working solution, so there is no need to keep it open. If anyone finds a way to run the modules without the DKMS compile process, please let me know :).

BK