openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

zfs vg and lv info inactive after node reboot #11968

Closed httaotao closed 3 years ago

httaotao commented 3 years ago

System information

Type Version/Name
Distribution Name Ubuntu 18.04.5 LTS
Linux Kernel 4.15.0-136-generic

Describe the problem you're observing

When I format the block device created by ZFS (a zvol) as LVM2, the PV/VG/LV information is no longer displayed after the node reboots; we have to run extra commands to bring it back.

Describe how to reproduce the problem

This is the shell script we use to create the LVM2 layout on the zvol:

# create a $size zvol with compression enabled to back the brick
sudo zfs create -V $size -o compression=on gpool/DATA/glusterfs/$vg_name

# initialise the zvol as an LVM2 physical volume
sudo pvcreate -qq --metadatasize=512M --dataalignment=256K  $absPath/$vg_name

# create a volume group on that PV
sudo vgcreate -qq --physicalextentsize=4M --autobackup=y $vg_name $absPath/$vg_name

# report the PV/VG pairing as JSON
sudo pvs -o pv_name,pv_uuid,vg_name --reportformat=json $absPath/$vg_name

# list the udev symlinks for the zvol device
sudo udevadm info --query=symlink --name=$absPath/$vg_name

# show the VG in colon-separated (machine-readable) form
sudo vgdisplay -c $vg_name

# create a thin pool, then a thin LV inside it
lvcreate -L $vsize --thinpool thinpool_$vg_name  $vg_name -Zn
lvcreate -V $vsize --thin -n lvuser_$vg_name $vg_name/thinpool_$vg_name

# format the thin LV with XFS
sudo mkfs.xfs -i size=512 -n size=4096 -s size=4096 -K  /dev/mapper/$vg_name-lvuser_$vg_name
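
For reference, a quick sanity check that the zvol device node exists before running pvcreate might look like this (the /dev/zvol path mirrors the dataset name; whether $absPath points at the same node is an assumption):

# udev creates the zvol node under /dev/zvol/<pool>/<dataset> (it also appears as /dev/zdN)
ls -l /dev/zvol/gpool/DATA/glusterfs/$vg_name
# confirm the kernel exposes it as a block device
lsblk /dev/zvol/gpool/DATA/glusterfs/$vg_name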

This is the shell script we run after a reboot to activate the VG and LV information again:

sudo pvs                # list the PVs that LVM can currently see
sudo vgscan --cache     # rescan all devices and refresh the LVM metadata cache
sudo lvscan             # scan for logical volumes

sudo lvchange -ay ...   # activate the logical volume(s)
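
As a rough illustration of that last step, activating one of the VGs created by the script above would look something like this (vg_example is a placeholder name, not a real VG on our nodes):

# activate every LV in the volume group
sudo vgchange -ay vg_example
# or activate a single thin LV explicitly
sudo lvchange -ay vg_example/lvuser_vg_example
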
rincebrain commented 3 years ago

Hi!

I've reread this twice, and it's not clear to me what you're trying to report here.

Could you please explain what the behavior you're seeing is, and what the behavior would ideally be instead?

Thanks!

httaotao commented 3 years ago

> Hi!
>
> I've reread this twice, and it's not clear to me what you're trying to report here.
>
> Could you please explain what the behavior you're seeing is, and what the behavior would ideally be instead?
>
> Thanks!

I am sorry that I didn't describe it clearly. Firstly, each node has 12 x 12T disks configured as ZFS raidz2; the pool layout looks like this:

$ sudo zpool  status 
  pool: bpool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0h0m with 0 errors on Sun Apr 11 00:24:05 2021
config:

    NAME        STATE     READ WRITE CKSUM
    bpool       ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        sda2    ONLINE       0     0     0
        sdb2    ONLINE       0     0     0
        sdc2    ONLINE       0     0     0
        sdd2    ONLINE       0     0     0
        sde2    ONLINE       0     0     0
        sdf2    ONLINE       0     0     0
        sdg2    ONLINE       0     0     0
        sdh2    ONLINE       0     0     0
        sdi2    ONLINE       0     0     0
        sdj2    ONLINE       0     0     0
        sdk2    ONLINE       0     0     0
    spares
      sdl2      AVAIL   

errors: No known data errors

  pool: gpool
 state: ONLINE
  scan: scrub repaired 0B in 5h12m with 0 errors on Sun Apr 11 05:36:49 2021
config:

    NAME        STATE     READ WRITE CKSUM
    gpool       ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        sda4    ONLINE       0     0     0
        sdb4    ONLINE       0     0     0
        sdc4    ONLINE       0     0     0
        sdd4    ONLINE       0     0     0
        sde4    ONLINE       0     0     0
        sdf4    ONLINE       0     0     0
        sdg4    ONLINE       0     0     0
        sdh4    ONLINE       0     0     0
        sdi4    ONLINE       0     0     0
        sdj4    ONLINE       0     0     0
        sdk4    ONLINE       0     0     0
    spares
      sdl4      AVAIL   

errors: No known data errors

  pool: hpool
 state: ONLINE
  scan: scrub repaired 0B in 4h15m with 0 errors on Sun Apr 11 04:39:14 2021
config:

    NAME        STATE     READ WRITE CKSUM
    hpool       ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        sda5    ONLINE       0     0     0
        sdb5    ONLINE       0     0     0
        sdc5    ONLINE       0     0     0
        sdd5    ONLINE       0     0     0
        sde5    ONLINE       0     0     0
        sdf5    ONLINE       0     0     0
        sdg5    ONLINE       0     0     0
        sdh5    ONLINE       0     0     0
        sdi5    ONLINE       0     0     0
        sdj5    ONLINE       0     0     0
        sdk5    ONLINE       0     0     0
    spares
      sdl5      AVAIL   

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0h19m with 0 errors on Sun Apr 11 00:43:17 2021
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        sda3    ONLINE       0     0     0
        sdb3    ONLINE       0     0     0
        sdc3    ONLINE       0     0     0
        sdd3    ONLINE       0     0     0
        sde3    ONLINE       0     0     0
        sdf3    ONLINE       0     0     0
        sdg3    ONLINE       0     0     0
        sdh3    ONLINE       0     0     0
        sdi3    ONLINE       0     0     0
        sdj3    ONLINE       0     0     0
        sdk3    ONLINE       0     0     0
    spares
      sdl3      AVAIL   

errors: No known data errors

Secondly, we use ZFS to create a block device with this command:

$sudo zfs create -V 100G gpool/DATA/glusterfs/heketi-xxx

The device is then used by heketi, which formats it as LVM2. Heketi runs pvcreate, vgcreate, and lvcreate to build the LVM layout, like this:

$ sudo vgs 
  VG                                  #PV #LV #SN Attr   VSize     VFree   
  vg_0e9cbaaa7a60309535f5d47222f779c2   1   4   0 wz--n-    99.87g   <1.48g
  vg_22c9ded77e19f96688ae123403ac3693   1   2   0 wz--n-    <5.00t    4.99t
  vg_30bbbdc69dcc4bcca646121e14990fb9   1   0   0 wz--n-     1.17t    1.17t

$ sudo pvs 
  PV         VG                                  Fmt  Attr PSize     PFree   
  /dev/zd128 brick_1231_02                       lvm2 a--    <17.50g    1.25g
  /dev/zd144 vg_b9af7f120ce2de3d997663cc283ff087 lvm2 a--   1023.87g  505.80g
  /dev/zd160 brick_1231_01                       lvm2 a--    <17.50g    1.25g
  /dev/zd192 vg_0e9cbaaa7a60309535f5d47222f779c2 lvm2 a--     99.87g   <1.48g
  /dev/zd208 vg_7954a7ee1b02c54bd88abf2acf650074 lvm2 a--     69.87g   34.60g

$ sudo lvs
  LV                                     VG                                  Attr       LSize    Pool                                Origin Data%  Meta%  Move Log Cpy%Sync Convert
  lvuser_arbiter_test_8g                 arbiter_test_8g                     Vwi-aot---    7.20g thinpool_arbiter_test_8g                   0.15                                   
  thinpool_arbiter_test_8g               arbiter_test_8g                     twi-aot---    7.20g                                            0.15   10.94                           
  lvuser_brick_1231_01                   brick_1231_01                       Vwi---t---   16.20g thinpool_brick_1231_01                                                            
  thinpool_brick_1231_01                 brick_1231_01                       twi---t---   16.20g                                                                                   
  lvuser_brick_1231_02                   brick_1231_02                       Vwi---t---   16.20g thinpool_brick_1231_02                                                            

...

Unfortunately, after the node reboots, none of this information is shown; we have to run the second script above to activate it again, and we don't know why. When we run the same steps on a local (physical) disk instead of a zvol, everything is fine after a reboot.

We suspect that when the block device created by ZFS is formatted as LVM2, the metadata is not recorded in a sector the operating system scans at boot, whereas with a local hard disk it is.

Thanks.

rincebrain commented 3 years ago

> We suspect that when the block device created by ZFS is formatted as LVM2, the metadata is not recorded in a sector the operating system scans at boot, whereas with a local hard disk it is.

That's not how things work.

If the VG/LV you created aren't automatically activated on reboot but activate fine if you manually run the commands once the system is booted, then it's probably the case that the service for setting up LVM devices on boot is running and finishing before the ZFS pools are imported.

You could likely work around this by having LVM run the relevant scans after the service that imports your ZFS pools. For me, the latter is handled by zfs-import-cache.service, and I don't know how the former is handled in the systemd world.
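
A sketch of what that ordering could look like, assuming the volume groups are activated by lvm2-activation.service (that unit name is a guess; check which unit actually performs LVM activation on your system):

# /etc/systemd/system/lvm2-activation.service.d/after-zfs.conf
# Drop-in that delays LVM activation until the ZFS pools (and their zvols) have been imported.
[Unit]
After=zfs-import-cache.service
Wants=zfs-import-cache.service

After adding it, run systemctl daemon-reload so the drop-in is picked up.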

httaotao commented 3 years ago

> > We suspect that when the block device created by ZFS is formatted as LVM2, the metadata is not recorded in a sector the operating system scans at boot, whereas with a local hard disk it is.
>
> That's not how things work.
>
> If the VG/LV you created aren't automatically activated on reboot but activate fine if you manually run the commands once the system is booted, then it's probably the case that the service for setting up LVM devices on boot is running and finishing before the ZFS pools are imported.
>
> You could likely work around this by having LVM run the relevant scans after the service that imports your ZFS pools. For me, the latter is handled by zfs-import-cache.service, and I don't know how the former is handled in the systemd world.

I am sorry, but I don't know how to do that. I tried changing zfs-import-cache.service like this:

# cat /lib/systemd/system/zfs-import-cache.service
[Unit]
Description=Import ZFS pools by cache file
Documentation=man:zpool(8)
DefaultDependencies=no
Requires=systemd-udev-settle.service
Requires=zfs-load-module.service
After=systemd-udev-settle.service
After=zfs-load-module.service
After=cryptsetup.target
After=systemd-remount-fs.service
Before=dracut-mount.service
Before=zfs-import.target
#lvm2
Before=lvm2-lvmpolld.service
Before=lvm2-lvmetad.service
ConditionPathExists=/etc/zfs/zpool.cache

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN
ExecStart=/bin/bash /root/lvscanRepairOnly.sh > /root/lvscanRepairOnly.txt

[Install]
WantedBy=zfs-import.target

But it makes no difference.

rincebrain commented 3 years ago

This is not an LVM debugging forum, but from my understanding, neither lvmpolld nor lvmetad are what's responsible for enumerating your devices.

This suggests several services you could attempt Before= with.
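
For what it's worth, a drop-in on the ZFS side (rather than editing the shipped unit file) would be the usual way to experiment with that; the LVM unit names below are only examples and vary by distribution and lvm.conf settings:

# /etc/systemd/system/zfs-import-cache.service.d/before-lvm.conf
# Ask systemd to finish the pool import before LVM boot-time activation runs.
[Unit]
Before=lvm2-activation-early.service
Before=lvm2-activation.service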

httaotao commented 3 years ago

> This is not an LVM debugging forum, but from my understanding, neither lvmpolld nor lvmetad are what's responsible for enumerating your devices.
>
> This suggests several services you could attempt Before= with.

Thanks for the help. We found that changing use_lvmetad to 0 in /etc/lvm/lvm.conf makes it work, but we still need to check what this parameter does. Thanks.
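
For anyone who finds this later, the change is a one-line edit, sketched here (the option lives in the global section of /etc/lvm/lvm.conf on Ubuntu 18.04):

# /etc/lvm/lvm.conf, inside the global { ... } section
# With lvmetad disabled, LVM commands scan block devices directly instead of relying on
# the udev-populated daemon cache, so zvols that only appear once the ZFS pools have been
# imported can still be found and activated.
use_lvmetad = 0

You may also need to stop and disable lvm2-lvmetad.service and lvm2-lvmetad.socket for the change to take full effect.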