zfsonlinux / pkg-zfs

Native ZFS packaging for Debian and Ubuntu
https://launchpad.net/~zfs-native/+archive/daily

0.6.5.3 DKMS fails ({spl,zfs}-dkms version mismatch) #184

Closed roeme closed 8 years ago

roeme commented 8 years ago

Hi,

after upgrading my system's kernel from 4.2.3 to 4.2.5, the ZFS module was no longer loaded (mismatching symbols). Trying to rebuild the modules yielded strange errors:

# /usr/lib/dkms/dkms_autoinstaller start
[....] dkms: running auto installation service for kernel 4.2.0-1-amd64:
< output truncated ✂ >
checking kernel file name for module symbols... Module.symvers
checking spl source directory... /usr/src/spl-
configure: error: 
    *** Please make sure the kmod spl devel ...

(Note the missing version number for the spl source directory). The spl-dkms module was successfully installed, however.

Further investigation reveals a version mismatch of the interesting sort:

# apt-cache policy zfs-dkms
zfs-dkms:
  Installed: 0.6.5.2-2
  Candidate: 0.6.5.2-2
  Version table:
 *** 0.6.5.2-2 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status

# apt-cache policy spl-dkms
spl-dkms:
  Installed: 0.6.5.3-1
  Candidate: 0.6.5.3-1
  Version table:
 *** 0.6.5.3-1 0
        750 http://mirror.switch.ch/ftp/mirror/debian/ testing/main amd64 Packages
        750 http://ftp.ch.debian.org/debian/ testing/main amd64 Packages
        100 /var/lib/dpkg/status
     0.6.5-1 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages

So spl-dkms has hit the official servers with 0.6.5.3 (at least in debian's testing), but remains at 0.6.5-1 on zfsonlinux’ own repo.

Downgrading spl-dkms to 0.6.5-1 fixes the issue, and allows one to build/load version 0.6.5.2 of zfs-dkms.

aktau commented 8 years ago

I noticed the same thing happening. The logic for choosing a "compatible" version of spl-dkms can be found in /var/lib/dkms/zfs/0.6.5.2/build/dkms.conf (gist of my local version).

The relevant part is this:

  --with-spl=${source_tree}/spl-$(subver=`echo ${PACKAGE_VERSION} | sed "s@\([0-9]\.[0-9]\.[0-9]\)\(\.[0-9]\)@\1@"`
        if [ -e ${source_tree}/spl-${PACKAGE_VERSION} ]; then
           echo ${PACKAGE_VERSION}
        elif [ -e ${source_tree}/spl-${subver} ]; then
           echo ${subver}
        fi)
  --with-spl-obj=${dkms_tree}/spl/$(subver=`echo ${PACKAGE_VERSION} | sed "s@\([0-9]\.[0-9]\.[0-9]\)\(\.[0-9]\)@\1@"`
        if [ -e ${source_tree}/spl-${PACKAGE_VERSION} ]; then
           echo ${PACKAGE_VERSION}
        elif [ -e ${source_tree}/spl-${subver} ]; then
           echo ${subver}
        fi)/${kernelver}/${arch}

Naturally, this code on my system yields the following value:

BAD spl:     /usr/src/spl-
BAD spl-obj: /var/lib/dkms/spl//4.2.0-1-amd64/x86_64

On which the build process chokes.

As can be seen, it first looks for an spl source tree with the same PACKAGE_VERSION as the zfs-dkms it is trying to build, in this case 0.6.5.2. Failing that, it looks for spl-0.6.5 (that's the subver), which is also not installed on my system. With neither present, the substitution prints nothing and both paths come out with an empty version.
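A minimal sketch of that lookup, pulled out of dkms.conf so it can be run against a scratch directory. The helper name `pick_spl_version` and the temporary tree are assumptions for illustration; only the sed expression and the two `-e` tests come from the snippet above.

```shell
#!/bin/sh
# pick_spl_version is a hypothetical helper, not part of the package.
pick_spl_version() {
    source_tree=$1      # e.g. /usr/src
    package_version=$2  # e.g. 0.6.5.2
    # Drop a fourth "hotfix" component: 0.6.5.2 -> 0.6.5
    subver=$(echo "$package_version" | sed 's@\([0-9]\.[0-9]\.[0-9]\)\(\.[0-9]\)@\1@')
    if [ -e "$source_tree/spl-$package_version" ]; then
        echo "$package_version"
    elif [ -e "$source_tree/spl-$subver" ]; then
        echo "$subver"
    fi
    # There is no else branch: when neither directory exists, nothing is printed.
}

# Reproduce the failure: a scratch tree that holds only spl-0.6.5.3.
tree=$(mktemp -d)
mkdir "$tree/spl-0.6.5.3"
echo "spl: $tree/spl-$(pick_spl_version "$tree" 0.6.5.2)"
```

The printed path ends in a bare `spl-`, which is the same shape as the `/usr/src/spl-` seen in the configure output.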

It seems to me like spl and zol are using semantic versioning, which means that the latest in the same minor version bracket 0.6.5.* should work for any zfs 0.6.5.*. So the situation for Debian Stretch and possibly ubuntu could be improved with something like this:

spl_new=${source_tree}/spl-$(
        subver=$(echo ${PACKAGE_VERSION} | sed "s@\([0-9]\.[0-9]\.[0-9]\)\(\.[0-9]\)@\1@")
        subver_max=$(cd ${source_tree}; for max in spl-${subver}.?; do :; done; echo "${max#spl-}")
        if [ -e ${source_tree}/spl-${PACKAGE_VERSION} ]; then
           echo ${PACKAGE_VERSION}
        elif [ -e ${source_tree}/spl-${subver} ]; then
           echo ${subver}
        elif [ -e ${source_tree}/spl-${subver_max} ]; then
           echo ${subver_max}
        fi)
spl_obj_new=${dkms_tree}/spl/$(
        subver=$(echo ${PACKAGE_VERSION} | sed "s@\([0-9]\.[0-9]\.[0-9]\)\(\.[0-9]\)@\1@")
        subver_max=$(cd ${source_tree}; for max in spl-${subver}.?; do :; done; echo "${max#spl-}")
        if [ -e ${source_tree}/spl-${PACKAGE_VERSION} ]; then
           echo ${PACKAGE_VERSION}
        elif [ -e ${source_tree}/spl-${subver} ]; then
           echo ${subver}
        elif [ -e ${source_tree}/spl-${subver_max} ]; then
           echo ${subver_max}
        fi)/${kernelver}/${arch}

Which yields

NEW spl:     /usr/src/spl-0.6.5.3
NEW spl-obj: /var/lib/dkms/spl/0.6.5.3/4.2.0-1-amd64/x86_64

Which is better, IMHO.

The salient part is the subver_max, which finds the maximum version dkms-spl inside of the same semantic minor version.

@FransUrbo what do you think?

aktau commented 8 years ago

Sadly though, looking for this decision code so I could alter it (or send a PR), I can't find it in this repository. The closest I get is https://github.com/zfsonlinux/pkg-zfs/blob/snapshot/debian/jessie/debian/zfs-dkms.dkms.

Strangely, although it looks a whole lot like the dkms.conf I found on my system, it's missing the spl version selection parts:

  --with-spl=${source_tree}/spl-${PACKAGE_VERSION}
  --with-spl-obj=${dkms_tree}/spl/${PACKAGE_VERSION}/${kernelver}/${arch}

Is this some dkms magic? I'm confused.

aktau commented 8 years ago

I just said that I couldn't find the subver string in the snapshot/debian/jessie branch. So I used git grep to try my luck:

$ git grep 'subver=.echo ${PACKAGE_VERSION}' $(git rev-list --all)
61d75b3a64f0b6add6f64096bdd4d7e4fb66e61a:debian/zfs-dkms.dkms:  --with-spl=${source_tree}/spl-$(subver=`echo ${PACKAGE_VERSION} | sed "s@\([0-9]\.[0-9]\.[0-9]\)\(\.[0-9]\)@\1@"`
61d75b3a64f0b6add6f64096bdd4d7e4fb66e61a:debian/zfs-dkms.dkms:  --with-spl-obj=${dkms_tree}/spl/$(subver=`echo ${PACKAGE_VERSION} | sed "s@\([0-9]\.[0-9]\.[0-9]\)\(\.[0-9]\)@\1@"`
7c033dab0f33eff18825f09cf6ad1e9a153eb7e1:debian/zfs-dkms.dkms:  --with-spl=${source_tree}/spl-$(subver=`echo ${PACKAGE_VERSION} | sed "s@\([0-9]\.[0-9]\.[0-9]\)\(\.[0-9]\)@\1@"`
7c033dab0f33eff18825f09cf6ad1e9a153eb7e1:debian/zfs-dkms.dkms:  --with-spl-obj=${dkms_tree}/spl/$(subver=`echo ${PACKAGE_VERSION} | sed "s@\([0-9]\.[0-9]\.[0-9]\)\(\.[0-9]\)@\1@"`
b64b808b2226a148bfb3882af87b7c36def50008:debian/zfs-dkms.dkms:  --with-spl=${source_tree}/spl-$(subver=`echo ${PACKAGE_VERSION} | sed "s@\([0-9]\.[0-9]\.[0-9]\)\(\.[0-9]\)@\1@"`
...

So it would seem that the current zfs-dkms package was not built from the snapshot/debian/jessie branch as I originally thought, but from something else. However, these raw hashes don't give me a good idea of what I should make the PR to, if any.

Also, this seems related to #181.

aktau commented 8 years ago

Possibly last update to my last comment, trying to find where this subver thing is coming from:

$ git log -S 'subver=' --source --all
commit 8edf755b2e19177aa664e331760e02277c4227e3 refs/tags/master/debian/wheezy/0.6.5.2-1-wheezy
Author: Turbo Fredriksson <turbo@bayour.com>
Date:   Wed Sep 30 23:07:35 2015 +0200

    Debian dir from 0.6.5.1-5-wheezy.

commit 3d271154eb46b36f5ec72be07baef9c03d784937 refs/tags/master/debian/jessie/0.6.5.2-1
Author: Turbo Fredriksson <turbo@bayour.com>
Date:   Wed Sep 30 21:26:02 2015 +0200

    Debian dir from 0.6.5.1-5

commit 7f06351c87a9add3fd1362416b87b7e1d770a7aa
Author: Turbo Fredriksson <turbo@bayour.com>
Date:   Sun Sep 20 12:47:10 2015 +0200

    Fix dkms build issue.

    * Check first for .../spl-[pkg-ver], THEN .../spl-[sub-ver]
      where pkg-ver=0.6.5.1 and sub-ver=0.6.5.

    Closes: zfsonlinux/zfs#3807

commit 3ef747d4de7d8344eef2e5426df96705fd237885
Author: Turbo Fredriksson <turbo@bayour.com>
Date:   Sun Sep 20 12:50:46 2015 +0200

    Fix dkms build issue.

    * Check first for .../spl-[pkg-ver], THEN .../spl-[sub-ver]
      where pkg-ver=0.6.5.1 and sub-ver=0.6.5.

    Closes: zfsonlinux/zfs#3807

So, a fix for zfsonlinux/zfs#3807 introduced it.

roeme commented 8 years ago

@aktau excellent investigation. Re: your proposed improved code, I've got one question:

subver_max=$(cd ${source_tree}; for max in spl-${subver}.?; do :; done; echo "${max#spl-}")

Is the glob spl-${subver}.? guaranteed to be ordered alphanumerically? In bash it should be (unless there is some outlandish $LC_COLLATE, but that would break all sorts of things), however, is this guaranteed to be executed under bash?
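One way to sidestep the ordering question is to sort the candidates explicitly instead of relying on glob order. In a POSIX shell, glob results are sorted by the locale's collating sequence, so the order is lexical rather than numeric, and `?` matches exactly one character, so a two-digit hotfix such as 0.6.5.10 would never be seen by the loop above. The sketch below is an assumption-laden alternative: `newest_spl` is a hypothetical helper name, and `sort -V` is a GNU coreutils extension that may not exist on every system.

```shell
#!/bin/sh
# newest_spl is a hypothetical helper; sort -V (version sort) is GNU-specific.
newest_spl() {
    source_tree=$1  # e.g. /usr/src
    subver=$2       # e.g. 0.6.5
    for dir in "$source_tree"/spl-"$subver".*; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$dir" ] || continue
        # Strip everything up to and including the final "/spl-".
        echo "${dir##*/spl-}"
    done | sort -V | tail -n 1
}

# Scratch tree with a two-digit hotfix to show numeric rather than lexical order.
tree=$(mktemp -d)
mkdir "$tree/spl-0.6.5.2" "$tree/spl-0.6.5.3" "$tree/spl-0.6.5.10"
newest_spl "$tree" 0.6.5   # prints 0.6.5.10
```

When no directory matches, the function prints nothing, so a caller still has to handle the empty case.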

Re: PR; I think the branch master/debian/jessie is missing here on github.

mailinglists35 commented 8 years ago

does anyone know a quick fix, please? ubuntu precise updated cleanly to 0.6.5.3 while debian wheezy and jessie are broken/outdated

root@homerouter:/usr/local/src/zfs# apt-cache policy spl-dkms zfs-dkms
spl-dkms:
  Installed: (none)
  Candidate: 0.6.5-1
  Version table:
     0.6.5-1 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
zfs-dkms:
  Installed: (none)
  Candidate: 0.6.5.2-2
  Version table:
     0.6.5.2-2 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages

roeme commented 8 years ago

@mailinglists35 Your problem is not relevant to this issue, but to #181. If you want to install 0.6.5.3 on debian, you'll simply have to wait.

mailinglists35 commented 8 years ago

@Roeme I thought this issue was about mismatching spl/zfs versions on zol repo (see the apt-cache policy output) on both debian jessie and debian wheezy! Did I understand wrong?

mailinglists35 commented 8 years ago

@roeme and what I was pointing out is that zol repo for ubuntu has spl and zfs versions in sync (which happens to be 0.6.5.3)

roeme commented 8 years ago

@mailinglists35 Yes, you did understand wrong, or rather, didn't read thoroughly. As outlined in the initial report, Debian stable (a.k.a jessie) is not broken, as zfs 0.6.5.2 is compatible with spl 0.6.5.

Baughn commented 8 years ago

I just spent an hour trying to fix a {headless,remote} server that fell victim to this, however.

In light of the fact that spl+zfs installs are not atomic, but single package installs are... perhaps it would make sense to package both the modules in a single package?

mailinglists35 commented 8 years ago

@Baughn I don't know if there is a rule/practice but from what I've observed so far in the linux ecosystem, kernel modules are distributed in their own packages. I don't know the reason/logic for this, but as an end-user I feel the same that there should be a single package to install.

FransUrbo commented 8 years ago

This is one of those won't fix, because it will be absolutely impossible to keep track of what version of zfs is compatible with which version of spl (and possibly vice versa). They usually go hand-in-hand, and I always upload both of them simultaneously.

However, this is getting even more complicated now that Debian GNU/Linux provides spl, but not zfs. There is very little I can do to rectify this. And I've been saying all along that one needs to pay close attention to what's happening with spl/zfs.

aktau commented 8 years ago

Yea, it's annoying. But:

This is one of those won't fix, because it will be absolutely impossible to keep track of what version of zfs is compatible with which version of spl (and possibly vice versa). They usually go hand-in-hand, and I always upload both of them simultaneously.

If I understood the current build script correctly, it chooses either an exact match, as you said, or it cuts off the last version number. So 0.6.5.3 will compile with 0.6.5 if present.

Which is why I thought my hack would be OK. It would allow 0.6.5.3 to compile with 0.6.5.2, for example.

Perhaps I'm misunderstanding the versioning semantics of ZFS.

FransUrbo commented 8 years ago

We can't guarantee that that will always work. It might now, but in a future release, it might not. Hence the problem...

FransUrbo commented 8 years ago

@behlendorf I haven't been keeping up-to-date with the releases the last couple of months, but CAN we say that major.minor.revision.hotfix will always compile with major.minor.revision?

roeme commented 8 years ago

However, this is getting even more complicated now that Debian GNU/Linux provides spl, but not zfs. There is very little I can do to rectify this.

@FransUrbo Dunno if it's already done this way, IMO we might then apt-pin both spl and zfs to ZoL's repository, instead of upstream...?
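For readers unfamiliar with apt pinning, a stanza along these lines is roughly what that suggestion amounts to. This is only a sketch: the helper name `zol_pin_stanza` and the priority 1001 are assumptions (priorities above 1000 allow apt to downgrade an installed package), the hostname is the ZoL archive seen in the apt-cache output earlier, and a real setup would probably need to list further spl/zfs packages as well.

```shell
#!/bin/sh
# zol_pin_stanza is a hypothetical helper that prints an apt_preferences(5)
# stanza preferring the ZoL archive for both DKMS packages.
zol_pin_stanza() {
    cat <<'EOF'
Package: spl-dkms zfs-dkms
Pin: origin archive.zfsonlinux.org
Pin-Priority: 1001
EOF
}

zol_pin_stanza
```

The output could be redirected into a file under /etc/apt/preferences.d/ (the file name is up to the administrator), after which `apt-cache policy spl-dkms` shows whether the pin took effect.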

Since the system where I encountered this is running a mix between testing and stable, and I'm away from it currently, I'm not too sure whether the breakage described in OP is and will be just my own (in which case I'm totally fine with WONTFIX), or if users of the next stable, stretch, could/will be affected as well. In the latter case, a more stable (or more gracefully failing) approach might be needed.

FransUrbo commented 8 years ago

@FransUrbo Dunno if it's already done this way, IMO we might then apt-pin both spl and zfs to ZoL's repository, instead of upstream...?

We used to do that, but I removed it long ago. Can't remember the exact details, but I remember that it proved to be more trouble than it's worth...

Since the system where I encountered this is running a mix between testing and stable

And that's kinda another issue - I'm not really interested in supporting this! Anyone running 'testing' really, really, REALLY (!!) need to know what they're doing and if anything breaks, it is THEIR fault!

It's kinda the definition (or used to be) of "Debian GNU/Linux Testing"!

Also, too few people are doing this, so it doesn't make sense to provide workarounds for a few people, when most everyone else will suffer from it.

A third thought regarding this is that we're very likely to see an upstream package "shortly"! And this will be moot.

roeme commented 8 years ago

And that's kinda another issue - I'm not really interested in supporting this! Anyone running 'testing' really, really, REALLY (!!) need to know what they're doing and if anything breaks, it is THEIR fault!

It's kinda the definition (or used to be) of "Debian GNU/Linux Testing"!

Also, too few people are doing this, so it doesn't make sense to provide workarounds for a few people, when most everyone else will suffer from it.

As I wrote; and tried to tell people multiple times in this issue, I'm not advocating to support this, at all, nor am I seeking support for it (the next guy saying anything in this direction will be hit by an electronic trout¹).

I opened this issue solely because I suspected that the underlying fault might surface at some point in stable, (see also the not-so-nice degradation modes found by @aktau ).

But yeah, it's all moot if upstream provides both packages. I had (still have) major doubts about the licensing issues; but I'm sure you have more insight into this, so I'm in full agreement to close this issue.

Thanks for all the hard work so far!

¹) If you get this reference, you're old.

FransUrbo commented 8 years ago

As I wrote; and tried to tell people multiple times in this issue, I'm not advocating to support this, at all, nor am I seeking support for it (the next guy saying anything in this direction will be hit by an electronic trout¹).

I got that (I wasn't suggesting you did), although I didn't get the reference, so maybe there's hope yet :D. "I'm just saying"...

But yeah, it's all moot if upstream provides both packages. I had (still have) major doubts about the licensing issues; but I'm sure you have more insight into this, so I'm in full agreement to close this issue.

We're "pretty sure" it's ok, as long as we don't provide binary modules. That might be ok too, we'll just see what happens with the Ubuntu issue eventually.

Baughn commented 8 years ago

Wouldn't it be easier to provide both the spl and zfs modules as a single package?

I get that it's been normal to split them up, but what does anyone really gain by doing so?

FransUrbo commented 8 years ago

That is (or was) on the TODO for the long run. It's just not something anyone is really interested in doing, the current system "works".

mailinglists35 commented 8 years ago

Anyone running 'testing' really, really, REALLY (!!) need to know what they're doing and if anything breaks, it is THEIR fault!

It's kinda the definition (or used to be) of "Debian GNU/Linux Testing"!

sorry for offtopic, but I can't find any reference on debian website about testing being as dangerous as you say; that description seems to me to better describe the unstable, not testing. from https://www.debian.org/releases/ :

"testing The testing distribution contains packages that haven't been accepted into a stable release yet, but they are in the queue for that. The main advantage of using this distribution is that it has more recent versions of software."

"Packages are installed into the `testing' directory after they have undergone some degree of testing in unstable.

They must be in sync on all architectures where they have been built and mustn't have dependencies that make them uninstallable; they also have to have fewer release-critical bugs than the versions currently in testing. This way, we hope that `testing' is always close to being a release candidate."

anyway thanks for the long time providing packages and let's hope zfs will finally be accepted by ftp masters.