johnramsden / zedenv-grub

zedenv plugin for GRUB
https://zedenv.readthedocs.io/en/latest/plugins.html#grub
BSD 3-Clause "New" or "Revised" License
11 stars 4 forks source link

05_zfs_linux.py fails with TypeError #26

Closed tigoesnumb3rs closed 5 years ago

tigoesnumb3rs commented 5 years ago

Hey,

when running update-grub, I get this error:

Sourcing file `/etc/default/grub'
Generating grub configuration file ...
Traceback (most recent call last):
  File "/etc/grub.d/05_zfs_linux.py", line 849, in <module>
    for en in Generator().generate_grub_entries():
  File "/etc/grub.d/05_zfs_linux.py", line 690, in generate_grub_entries
    self.grub_boot_on_zfs, self.grub_boot_device)
  File "/etc/grub.d/05_zfs_linux.py", line 112, in __init__
    self.be_root, self.boot_environment)
  File "/usr/lib/python3.7/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib/python3.7/genericpath.py", line 149, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'NoneType'

I'm using zedenv version 0.3.7, zedenv-grub version 0.1.7 and pyzfscmds version 0.1.5 on a Ubuntu 18.04.

mkessler001 commented 5 years ago

Hey,

can you please provide more information about your setup.

Best regards Markus

tigoesnumb3rs commented 5 years ago

Hey,

Yes, I'm using two pools. I'm actually using a slightly modified version of your zedenv-setup. I did use zedenv and zedenv-grub from johnramsden, not from your repo, other than that it should be pretty similiar to your orginal script.

~$ zfs list
NAME                                   USED  AVAIL  REFER  MOUNTPOINT
bpool                                  212M  3.64G    96K  /
bpool/BOOT                             211M  3.64G    96K  none
bpool/BOOT/env                         203M  3.64G    96K  none
bpool/BOOT/env/zedenv-001-2019-10-28    56K  3.64G   141M  legacy
bpool/BOOT/env/zedenv-002-2019-10-29     8K  3.64G   141M  legacy
bpool/BOOT/env/zedenv-default          203M  3.64G   141M  legacy
bpool/BOOT/grub                       7.22M  3.64G  7.22M  legacy
rpool                                  119G   273G    96K  /
rpool/ROOT                            3.58G   273G    96K  none
rpool/ROOT/001-2019-10-28                8K   273G  3.11G  legacy
rpool/ROOT/002-2019-10-29                8K   273G  3.50G  legacy
rpool/ROOT/default                    3.58G   273G  3.51G  legacy
rpool/home                             115G   273G    96K  /home
rpool/home/root                        328K   273G   328K  /root
rpool/home/user                        115G   273G    96K  none
rpool/home/user/.cache                22.7M   273G  22.7M  /home/user/.cache
rpool/home/user/downloads               96K   273G    96K  /home/user/downloads
rpool/home/user/opt                     96K   273G    96K  /home/user/opt
rpool/home/user/src                     96K   273G    96K  /home/user/src
rpool/home/user/tmp                    100K   273G   100K  /home/user/tmp
rpool/home/user/uni                   1.25M   273G  1.25M  /home/user/uni
rpool/home/user/usr                    115G   273G   115G  /home/user/usr
rpool/opt                             2.57M   273G  2.57M  /opt
rpool/var                              201M   273G    96K  none
rpool/var/cache                        188M   273G   188M  /var/cache
rpool/var/lib                          576K   273G    96K  none
rpool/var/lib/docker                    96K   273G    96K  /var/lib/docker
rpool/var/lib/flatpak                   96K   273G    96K  none
rpool/var/lib/lxc                       96K   273G    96K  none
rpool/var/lib/lxcfs                     96K   273G    96K  none
rpool/var/lib/lxd                       96K   273G    96K  none
rpool/var/log                         12.2M   273G  12.2M  legacy
rpool/var/snap                          96K   273G    96K  none
rpool/var/spool                        128K   273G   128K  legacy
rpool/var/tmp                          168K   273G   168K  legacy

The output of zedenv get is:

~$ zedenv get
PROPERTY               VALUE  
org.zedenv:bootloader  grub   
org.zedenv.grub:boot   /boot
johnramsden commented 5 years ago

Could you be able to try adding the following before line 690:

                print(f"be_root=${self.be_root}\nboot_environment=${grub_entry.boot_environment}")

https://github.com/johnramsden/zedenv-grub/blob/d4bf9cecdf83c7c345d57d6b9b8c44edcd789a40/grub.d/05_zfs_linux.py#L688-L690

Then run grub-mkconfig and post the output.

Has this worked before for you when using zedenv activate?

tigoesnumb3rs commented 5 years ago

zedenv activate runs through without any errors, but I think it didn't activate the boot environment.

The output of grub-mkconfig is:

~$ sudo grub-mkconfig
Sourcing file `/etc/default/grub'
Generating grub configuration file ...
#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by grub-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

### BEGIN /etc/grub.d/00_header ###
if [ -s $prefix/grubenv ]; then
  set have_grubenv=true
  load_env
fi
if [ "${next_entry}" ] ; then
   set default="${next_entry}"
   set next_entry=
   save_env next_entry
   set boot_once=true
else
   set default="0"
fi

if [ x"${feature_menuentry_id}" = xy ]; then
  menuentry_id_option="--id"
else
  menuentry_id_option=""
fi

export menuentry_id_option

if [ "${prev_saved_entry}" ]; then
  set saved_entry="${prev_saved_entry}"
  save_env saved_entry
  set prev_saved_entry=
  save_env prev_saved_entry
  set boot_once=true
fi

function savedefault {
  if [ -z "${boot_once}" ]; then
    saved_entry="${chosen}"
    save_env saved_entry
  fi
}
function recordfail {
  set recordfail=1
  # GRUB lacks write support for zfs, so recordfail support is disabled.
}
function load_video {
  if [ x$feature_all_video_module = xy ]; then
    insmod all_video
  else
    insmod efi_gop
    insmod efi_uga
    insmod ieee1275_fb
    insmod vbe
    insmod vga
    insmod video_bochs
    insmod video_cirrus
  fi
}

terminal_input console
terminal_output console
if [ "${recordfail}" = 1 ] ; then
  set timeout=30
else
  if [ x$feature_timeout_style = xy ] ; then
    set timeout_style=menu
    set timeout=2
  # Fallback normal timeout code in case the timeout_style feature is
  # unavailable.
  else
    set timeout=2
  fi
fi
if [ $grub_platform = efi ]; then
  set timeout=30
  if [ x$feature_timeout_style = xy ] ; then
    set timeout_style=menu
  fi
fi
### END /etc/grub.d/00_header ###

### BEGIN /etc/grub.d/05_debian_theme ###
set menu_color_normal=white/black
set menu_color_highlight=black/light-gray
#set_background_image "images/tile.png";

set menu_color_normal=white/black
set menu_color_highlight=black/light-gray
if background_color 0,0,0; then
  clear
fi
### END /etc/grub.d/05_debian_theme ###

### BEGIN /etc/grub.d/05_zfs_linux.py ###
/boot/zfsenv/zedenv-default
/boot/zfsenv/zedenv-001-2019-10-28
/boot/zfsenv/zedenv-002-2019-10-29
Traceback (most recent call last):
  File "/etc/grub.d/05_zfs_linux.py", line 852, in <module>
    for en in Generator().generate_grub_entries():
  File "/etc/grub.d/05_zfs_linux.py", line 688, in generate_grub_entries
    self.grub_boot_on_zfs, self.grub_boot_device)
  File "/etc/grub.d/05_zfs_linux.py", line 110, in __init__
    self.be_root, self.boot_environment)
  File "/usr/lib/python3.6/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib/python3.6/genericpath.py", line 149, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'NoneType'
johnramsden commented 5 years ago

Is that with the print statement?

tigoesnumb3rs commented 5 years ago

Yes. It does however not seem to make a difference, if the print is there or not.

When I run grub-mkconfig for the first time after booting, I get:

### BEGIN /etc/grub.d/05_zfs_linux.py ###
/boot/zfsenv/zedenv-default
/boot/zfsenv/zedenv-001-2019-10-28
/boot/zfsenv/zedenv-002-2019-10-29
Traceback (most recent call last):
  File "/etc/grub.d/05_zfs_linux.py", line 852, in <module>
    for en in Generator().generate_grub_entries():
  File "/etc/grub.d/05_zfs_linux.py", line 688, in generate_grub_entries
    self.grub_boot_on_zfs, self.grub_boot_device)
  File "/etc/grub.d/05_zfs_linux.py", line 110, in __init__
    self.be_root, self.boot_environment)
  File "/usr/lib/python3.6/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib/python3.6/genericpath.py", line 149, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'NoneType'

After that I only get:

### BEGIN /etc/grub.d/05_zfs_linux.py ###
Traceback (most recent call last):
  File "/etc/grub.d/05_zfs_linux.py", line 852, in <module>
    for en in Generator().generate_grub_entries():
  File "/etc/grub.d/05_zfs_linux.py", line 688, in generate_grub_entries
    self.grub_boot_on_zfs, self.grub_boot_device)
  File "/etc/grub.d/05_zfs_linux.py", line 110, in __init__
    self.be_root, self.boot_environment)
  File "/usr/lib/python3.6/posixpath.py", line 94, in join
    genericpath._check_arg_types('join', a, *p)
  File "/usr/lib/python3.6/genericpath.py", line 149, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'NoneType'

regardless of whether or not the print is there or not.

johnramsden commented 5 years ago

Sorry I think I got the wrong location, can you try this at L684?

                print(f"i=${i}\nj=${j}")

diff:

diff --git a/grub.d/05_zfs_linux.py b/grub.d/05_zfs_linux.py
index ac4816d..95c685d 100755
--- a/grub.d/05_zfs_linux.py
+++ b/grub.d/05_zfs_linux.py
@@ -681,6 +681,8 @@ class Generator:
                                     key=functools.cmp_to_key(Generator.kernel_comparator))

             for j in kernels_sorted:
+                print(f"i=${i}\nj=${j}")
+
                 grub_entry = GrubLinuxEntry(
                     os.path.join(i['directory'], j), self.grub_os, self.be_root, self.rpool,
                     self.genkernel_arch, i, self.grub_cmdline_linux,
tigoesnumb3rs commented 5 years ago

Sure, the output of the print is this (added some \ns for readability):

i=${'directory': '/boot/zfsenv/zedenv-002-2019-10-29', 
      'files': ['initrd.img-5.0.0-1015-oem-osp1', 
                'memtest86+.elf', 
                'System.map-5.0.0-32-generic', 
                'config-5.0.0-32-generic', 
                'memtest86+_multiboot.bin', 
                'vmlinuz-5.0.0-32-generic', 
                'grub',
                'config-5.0.0-1015-oem-osp1',
                'efi',
                'initrd.img-5.0.0-32-generic', 
                'System.map-5.0.0-1015-oem-osp1', 
                'zfsenv',
                'memtest86+.bin', 
                'vmlinuz-5.0.0-1015-oem-osp1'], 
        'kernels': ['vmlinuz-5.0.0-32-generic', 
                    'vmlinuz-5.0.0-1015-oem-osp1']}
j=$vmlinuz-5.0.0-32-generic
Traceback (most recent call last):
...

I'm not sure if the following information helps, but zedenv list looks like this for me:

Name            Active   Mountpoint   Creation              
default         NR       /            Mon-Oct-28-22:17-2019 
001-2019-10-28           -            Mon-Oct-28-22:56-2019 
002-2019-10-29           -            Tue-Oct-29-0:34-2019  

002-2019-10-29 is the most recently created Boot Environment.

When I create a new Boot Environment (zedenv create 003-2019-10-31) and then run the grub-mkconfig command again, it seems to crash at the same point, still in 002-2019-10-29 and not in 003-2019-10-31 even though the newly created Boot Environment seems to be found:

/boot/zfsenv/zedenv-default
/boot/zfsenv/zedenv-001-2019-10-28
/boot/zfsenv/zedenv-002-2019-10-29
/boot/zfsenv/zedenv-003-2019-10-31
i=${'directory': '/boot/zfsenv/zedenv-002-2019-10-29', 
      'files': ['initrd.img-5.0.0-1015-oem-osp1', 
                 'memtest86+.elf',  
                 'System.map-5.0.0-32-generic', 
                 'config-5.0.0-32-generic', 
                 'memtest86+_multiboot.bin', 
                 'vmlinuz-5.0.0-32-generic', 
                 'grub', 
                 'config-5.0.0-1015-oem-osp1',
                  'efi',
                  'initrd.img-5.0.0-32-generic', 
                 'System.map-5.0.0-1015-oem-osp1', 
                 'zfsenv', 
                 'memtest86+.bin', 
                 'vmlinuz-5.0.0-1015-oem-osp1'], 
       'kernels': ['vmlinuz-5.0.0-32-generic', 
                  'vmlinuz-5.0.0-1015-oem-osp1']}
j=$vmlinuz-5.0.0-32-generic
Traceback (most recent call last):
...
johnramsden commented 5 years ago

Strange, the trace makes it seems like None is being passed to join, but everything looks OK so far.

Can you try this?

diff --git a/grub.d/05_zfs_linux.py b/grub.d/05_zfs_linux.py
index ac4816d..666a184 100755
--- a/grub.d/05_zfs_linux.py
+++ b/grub.d/05_zfs_linux.py
@@ -681,6 +681,19 @@ class Generator:
                                     key=functools.cmp_to_key(Generator.kernel_comparator))

             for j in kernels_sorted:
+                print(f"i={i}\n")
+                print(f"j={j}\n")
+                print(f"self.grub_os={self.grub_os}\n")
+                print(f"self.be_root={self.be_root}\n")
+                print(f"self.rpool={self.rpool}\n")
+                print(f"self.genkernel_arch={self.genkernel_arch}\n")
+                print(f"self.grub_cmdline_linux={self.grub_cmdline_linux}\n")
+                print(f"self.grub_cmdline_linux_default={self.grub_cmdline_linux_default}\n")
+                print(f"self.grub_devices={self.grub_devices}\n")
+                print(f"self.default={self.default}\n")
+                print(f"self.grub_boot_on_zfs={self.grub_boot_on_zfs}\n")
+                print(f"self.grub_boot_device={self.grub_boot_device}\n")
+
                 grub_entry = GrubLinuxEntry(
                     os.path.join(i['directory'], j), self.grub_os, self.be_root, self.rpool,
                     self.genkernel_arch, i, self.grub_cmdline_linux,
tigoesnumb3rs commented 5 years ago
/boot/zfsenv/zedenv-default
/boot/zfsenv/zedenv-001-2019-10-28
/boot/zfsenv/zedenv-002-2019-10-29
/boot/zfsenv/zedenv-003-2019-10-31
i={'directory': '/boot/zfsenv/zedenv-002-2019-10-29', 'files': ['initrd.img-5.0.0-1015-oem-osp1', 'memtest86+.elf', 'System.map-5.0.0-32-generic', 'config-5.0.0-32-generic', 'memtest86+_multiboot.bin', 'vmlinuz-5.0.0-32-generic', 'grub', 'config-5.0.0-1015-oem-osp1', 'efi', 'initrd.img-5.0.0-32-generic', 'System.map-5.0.0-1015-oem-osp1', 'zfsenv', 'memtest86+.bin', 'vmlinuz-5.0.0-1015-oem-osp1'], 'kernels': ['vmlinuz-5.0.0-32-generic', 'vmlinuz-5.0.0-1015-oem-osp1']}

j=vmlinuz-5.0.0-32-generic

self.grub_os=Ubuntu GNU/Linux

self.be_root=rpool/ROOT

self.rpool=rpool

self.genkernel_arch=x86_64

self.grub_cmdline_linux=

self.grub_cmdline_linux_default=

self.grub_devices=['/dev/mapper/root']

self.default=

self.grub_boot_on_zfs=True

self.grub_boot_device=['/dev/nvme0n1p2']

Traceback (most recent call last):
...
johnramsden commented 5 years ago

Very odd, looks OK. I'll do some more investigation and get back to you. Thanks for your patience.

mkessler001 commented 5 years ago

I can confirm the bug on my system. It seems that the name of a boot environment should not contain a dash.

@tigoesnumb3rs: Can you please test creating a new boot environment without a dash in the name?

johnramsden commented 5 years ago

@mkessler001 With a dash do you get this error?

mkessler001 commented 5 years ago

Yes, the error occurs when I a new boot environment contains a dash in the name (e.g. 003-2019-10-31 or test-test). Is this reproducible on your system?

If this error does not appear on your system, I would assume that it has something to do with parsing the name for the separate boot pool (bpool/BOOT/env/zedenv-001-2019-10-28).

johnramsden commented 5 years ago

It doesn't occur on my system, no. I have dashes in most of my envs.

johnramsden commented 5 years ago

@mkessler001 Where in the sequence does it crash for you? If it's in a different location, could you open a new issue either in zedenv or zedenv-grub depending on where it occurs. Running with --verbose may give you an idea.

mkessler001 commented 5 years ago

The pull request should fix this. Please report back if this solved your issue.

Best regards Markus

johnramsden commented 5 years ago

@tigoesnumb3rs could you confirm if this fixes the issue?

If you don't want to pull the new code, you could just change L440 to the following:

https://github.com/johnramsden/zedenv-grub/blob/03b1b8c43a6b767c2c400d43977d4ff7625e230b/grub.d/05_zfs_linux.py#L440

tigoesnumb3rs commented 5 years ago

Yes, I can confirm, the error is gone.

Thanks a lot for your help!

tigoesnumb3rs commented 5 years ago

Hey,

I'm afraid I might have been a bit premature. A few hours ago I pulled the latest version and ran grub-mkconfig and then grub-update.

The error was indeed gone, but I then I wasn't able to reboot successfully. Since I'm currently not at home, I only have some mobile networking and a live stick, the following is roughly what happened (as good as I can remember):

Before installing I had 4 Boot Environments, the one I actively use (default) as well as others, which I used as snapshots and which were named by the date I created them. When booting, grub first presented me with a entry that contained boot environments, then with the snapshot boot environments. There I could select which environment to boot from. Then after entering the LUKS password, I was asked to choose from a list of boot environments (here in one case the default boot environment was gone in all other cases (meaning in different Boot Environments previously selected in grub) this list was empty.

I don't recall specifically which one worked, but with some compination of the above I got into a systemd rescue shell, something went wrong with the zfs systemd services.

My ZFS layout currently looks weird as well:

NAME                                      USED  AVAIL  REFER  MOUNTPOINT
bpool                                     212M  3.64G    96K  /mnt
bpool/BOOT                                211M  3.64G    96K  none
bpool/BOOT/env                            204M  3.64G    96K  none
bpool/BOOT/env/zedenv-001-2019-10-28       72K  3.64G   141M  legacy
bpool/BOOT/env/zedenv-002-2019-10-29       72K  3.64G   141M  legacy
bpool/BOOT/env/zedenv-003-2019-10-31       72K  3.64G   141M  legacy
bpool/BOOT/env/zedenv-default             203M  3.64G   141M  legacy
bpool/BOOT/grub                          7.21M  3.64G  7.21M  legacy
rpool                                     117G   274G    96K  /mnt
rpool/ROOT                               3.71G   274G    96K  none
rpool/ROOT/001-2019-10-28                   8K   274G  3.11G  /mnt
rpool/ROOT/002-2019-10-29                   8K   274G  3.50G  legacy
rpool/ROOT/003-2019-10-31                   8K   274G  3.54G  legacy
rpool/ROOT/default                       3.70G   274G  3.54G  /mnt/mnt
rpool/ROOT/default_2019-10-28-22-241202  4.99M   274G  3.11G  none
rpool/ROOT/default_2019-10-31-21-679644  5.28M   274G  3.53G  none

I suddenly have rpool/ROOT/default_2019-10-28-22-241202 and rpool/ROOT/default_2019-10-31-21-679644, which I never created. I think I was able to boot into one of those and somehow the corresponding files in bpool were missing.

I'll update this and add more information, when I'm home in a few days. My setup is pretty much taken from here, maybe it is reproducible in a virtual machine.

mkessler001 commented 5 years ago

Please give me a moment. I've found the bug and I'm going to commit a fix soon.

mkessler001 commented 5 years ago

Please integrate the fix comited in 0a4f806 and remove your ZFS datasets, which are accidentally created by booting a snapshot.

❯ zfs destroy rpool/ROOT/default_2019-10-28-22-241202
❯ zfs destroy rpool/ROOT/default_2019-10-31-21-679644

This happened because the python regex command included the trailing "@" for the command:

❯ grub-mkrelpath /boot
/boot/env/zedenv-test-BE@

The output of the regex parsing is used to specify the root mount in /etc/grub/grub.cfg:

WRONG: linux /boot/env/zedenv-test-BE@/vmlinuz-4.15.0-66-generic root=ZFS=rpool/root/test-BE@ rw
OK:    linux /boot/env/zedenv-test-BE@/vmlinuz-4.15.0-66-generic root=ZFS=rpool/root/test-BE rw

Setting root=ZFS=...@ leads to the behavior that you are asked to select a snapshot to boot.

tigoesnumb3rs commented 5 years ago

I've just updated zedenv-grub and removed the zfs datasets. It seems to work fine again! Thank you so much for your help!

mkessler001 commented 5 years ago

No problem, I'm glad to help. Enjoy your ZFS-based system.