@geaaru could you try rolling back the kernel version to see if the problem is specific to 0.7.6?
@rincebrain Hi, I confirm that after downgrading to 4.9.22 everything works fine.
I will try the latest release of the 4.9 tree (4.9.81).
@geaaru so it works fine with 4.9.22 and 0.7.6?
@rincebrain Yes, it seems so: 4.9.22 + 0.7.6 works fine. I'll keep testing over the next few days to see whether any errors show up, but for now a quick test suggests everything is OK. I can also mount the raidz pool (described in #6981).
There is still an issue with 4.9.81:
[ 202.566573] CPU: 0 PID: 887 Comm: vdev_open Tainted: P O 4.9.81-geaaru #1
[ 202.566576] Hardware name: Allwinner sun7i (A20) Family
[ 202.566610] [<c010d410>] (unwind_backtrace) from [<c010a598>] (show_stack+0x10/0x14)
[ 202.566627] [<c010a598>] (show_stack) from [<c0538208>] (dump_stack+0x7c/0x9c)
[ 202.566688] [<c0538208>] (dump_stack) from [<bf138e88>] (spl_kmem_zalloc+0x8c/0x158 [spl])
[ 202.567279] [<bf138e88>] (spl_kmem_zalloc [spl]) from [<bf27f618>] (vdev_disk_io_start+0x1c8/0x5a0 [zfs])
[ 202.567942] [<bf27f618>] (vdev_disk_io_start [zfs]) from [<bf2c32d0>] (zio_vdev_io_start+0x2f8/0x32c [zfs])
[ 202.568537] [<bf2c32d0>] (zio_vdev_io_start [zfs]) from [<bf2c1cac>] (zio_nowait+0x134/0x158 [zfs])
[ 202.569120] [<bf2c1cac>] (zio_nowait [zfs]) from [<bf2794f8>] (vdev_probe+0x1ec/0x21c [zfs])
[ 202.569701] [<bf2794f8>] (vdev_probe [zfs]) from [<bf27ce28>] (vdev_open+0x4e0/0x5d4 [zfs])
[ 202.570281] [<bf27ce28>] (vdev_open [zfs]) from [<bf27d058>] (vdev_open_child+0x20/0x30 [zfs])
[ 202.570608] [<bf27d058>] (vdev_open_child [zfs]) from [<bf13be5c>] (taskq_thread+0x2a8/0x3f8 [spl])
[ 202.570648] [<bf13be5c>] (taskq_thread [spl]) from [<c0138550>] (kthread+0xf8/0x10c)
[ 202.570662] [<c0138550>] (kthread) from [<c0106d90>] (ret_from_fork+0x14/0x24)
[ 202.959285] CPU: 1 PID: 886 Comm: vdev_open Tainted: P O 4.9.81-geaaru #1
[ 202.967196] Hardware name: Allwinner sun7i (A20) Family
[ 202.972455] [<c010d410>] (unwind_backtrace) from [<c010a598>] (show_stack+0x10/0x14)
[ 202.980205] [<c010a598>] (show_stack) from [<c0538208>] (dump_stack+0x7c/0x9c)
[ 202.987485] [<c0538208>] (dump_stack) from [<bf138e88>] (spl_kmem_zalloc+0x8c/0x158 [spl])
[ 202.996382] [<bf138e88>] (spl_kmem_zalloc [spl]) from [<bf27f618>] (vdev_disk_io_start+0x1c8/0x5a0 [zfs])
[ 203.006873] [<bf27f618>] (vdev_disk_io_start [zfs]) from [<bf2c32d0>] (zio_vdev_io_start+0x2f8/0x32c [zfs])
[ 203.017499] [<bf2c32d0>] (zio_vdev_io_start [zfs]) from [<bf2c1cac>] (zio_nowait+0x134/0x158 [zfs])
[ 203.027434] [<bf2c1cac>] (zio_nowait [zfs]) from [<bf2794f8>] (vdev_probe+0x1ec/0x21c [zfs])
[ 203.036758] [<bf2794f8>] (vdev_probe [zfs]) from [<bf27ce28>] (vdev_open+0x4e0/0x5d4 [zfs])
[ 203.046004] [<bf27ce28>] (vdev_open [zfs]) from [<bf27d058>] (vdev_open_child+0x20/0x30 [zfs])
[ 203.055168] [<bf27d058>] (vdev_open_child [zfs]) from [<bf13be5c>] (taskq_thread+0x2a8/0x3f8 [spl])
[ 203.064254] [<bf13be5c>] (taskq_thread [spl]) from [<c0138550>] (kthread+0xf8/0x10c)
[ 203.072006] [<c0138550>] (kthread) from [<c0106d90>] (ret_from_fork+0x14/0x24)
I will try to investigate and find out in which kernel version this regression begins.
@geaaru are you saying that 4.9.80 works with 0.7.6, but 4.9.81 does not?
If so, can you bisect the commits between those two and see where it starts?
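For example, roughly like this (the tags below are only placeholders for the last known-good and first known-bad kernels you find):
git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable
git bisect start
git bisect bad v4.9.81     # first kernel observed to break
git bisect good v4.9.22    # last kernel observed to work
# build and boot the commit git checks out, test the pool import, then mark it:
git bisect good            # or: git bisect bad
git bisect reset           # when finished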
@rincebrain Let me try to summarize my tests:
I am trying to find out which version, starting from 4.9.23, introduces this regression. For now it seems that kernels >= 4.9.77 are broken with zfs-0.7.6.
@geaaru I run zfs-0.7.6/linux-4.9.80 on my RaspberryPi 3 without issues; I should be able to get a spare BananaPi soon, would you mind sharing your Kconfig so I can try reproducing this on my hardware? Thanks!
@loli10K hi, do you mean the config used to compile the kernel? config-bpi-4.9.81.gz
@geaaru yes, thank you.
zfs-0.7.6/linux-4.9.81 (with your Kconfig) seems to work fine on my BPi M1 with a newly created pool:
root@bananapi:~# cat /sys/module/zfs/version
0.7.6-1
root@bananapi:~# zpool list
NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
fish 14.4G 178K 14.4G - 0% 0% 1.00x ONLINE -
root@bananapi:~# uname -r
4.9.81-geaaru
root@bananapi:~# zpool export -a
root@bananapi:~# zpool import -d /dev/ -a
root@bananapi:~# zpool status
pool: fish
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
fish ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sda ONLINE 0 0 0
errors: No known data errors
root@bananapi:~#
@loli10K hi, thank you for your time.
I think something is there, because if I use kernel 4.9.22 everything works fine. I will keep investigating.
As visible in issue #6981, newly created pools often don't show any issues; you need to use an existing pool from a previous version to catch the errors. Can you try these tests:
Could the problem be connected to the size of the pools?
zfs list
NAME USED AVAIL REFER MOUNTPOINT
data 1.17T 437G 37.3M /zpool
data2 1.54T 380G 525G /data2
I use that vdev only inside a single pool, but could this generate errors in the import process for the other pools as well?
I will report some more information soon, I hope. Thanks again.
create a zpool on a USB disk from a previous ZFS version, on the same arch or on amd64, store some data on it and then try to import it
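(A minimal sketch of that test, assuming a pool named fish on two USB disks; device names and the data path are placeholders:)
# on a machine running the older release (e.g. 0.6.5.x)
zpool create fish mirror /dev/sdX /dev/sdY
cp -r /some/data /fish
zpool export fish
# then on the BananaPi with zfs-0.7.6 / linux-4.9.81
zpool import fish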
Done, pool from 0.6.5.x with a mix of filesystems and ZVOLs imports without issues (mirror on USB flash drives):
root@bananapi:~# zpool get all | awk '/feature@/'
fish feature@async_destroy enabled local
fish feature@empty_bpobj active local
fish feature@lz4_compress active local
fish feature@multi_vdev_crash_dump disabled local
fish feature@spacemap_histogram active local
fish feature@enabled_txg active local
fish feature@hole_birth active local
fish feature@extensible_dataset enabled local
fish feature@embedded_data active local
fish feature@bookmarks enabled local
fish feature@filesystem_limits enabled local
fish feature@large_blocks enabled local
fish feature@large_dnode disabled local
fish feature@sha512 disabled local
fish feature@skein disabled local
fish feature@edonr disabled local
fish feature@userobj_accounting disabled local
root@bananapi:~# zpool status
pool: fish
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(5) for details.
scan: resilvered 4.50K in 0h0m with 0 errors on Sun Feb 18 12:13:08 2018
config:
NAME STATE READ WRITE CKSUM
fish ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sda ONLINE 0 0 0
errors: No known data errors
root@bananapi:~# cat /sys/module/zfs/version
0.7.6-1
root@bananapi:~#
try to execute the script that I wrote for issue #6981
Done, on a pool with a mix of raidz and mirror VDEVs; again, no issues here:
root@bananapi:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
data 1.55G 13.8G 32.9K /zpool
data/test 1.55G 13.8G 32.9K /zpool/test
data/test/test 1.55G 13.8G 1.55G /zpool/test/test
root@bananapi:~# zpool status
pool: data
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
data ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
sda1 ONLINE 0 0 0
sdb1 ONLINE 0 0 0
sda2 ONLINE 0 0 0
sdb2 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
sda3 ONLINE 0 0 0
sdb3 ONLINE 0 0 0
errors: No known data errors
root@bananapi:~#
FWIW, the problem seems related to the vdev... can you try to create a vdev with a fixed size?
I did not understand this question, sorry.
@geaaru are you able to reproduce this issue on a newly created pool? Does this reproduce on file VDEVs? If it does, please create a sample pool with test files, verify it does indeed produce this issue and then upload it somewhere so we can debug on the same data. Thanks!
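For example, a rough sketch of a file-backed test pool that could be shared (paths, sizes and the amount of test data are placeholders):
mkdir -p /tmp/disk/test-zfs
for i in 1 2 3; do truncate -s 1G /tmp/disk/test-zfs/$i.img; done
zpool create test raidz1 /tmp/disk/test-zfs/1.img /tmp/disk/test-zfs/2.img /tmp/disk/test-zfs/3.img
dd if=/dev/urandom of=/test/bigfile0 bs=8192 count=100000
zpool export test
# compress and upload the .img files so others can import the same pool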
A question... dmesg shows this message:
Large kmem_alloc(65552, 0x1000), please file an issue at:
Could this be related to some option that should be passed when loading the ZFS kernel module, to limit memory use on 32-bit systems with little RAM?
I confirm that if I create a new pool everything works fine. The issue is related to using an existing pool.
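For example, something like the following could cap the ARC via a modprobe config (the value is arbitrary, and whether it affects the Large kmem_alloc warning at all is only a guess):
cat /sys/module/zfs/parameters/zfs_arc_max    # current limit, 0 = built-in default
echo "options zfs zfs_arc_max=268435456" > /etc/modprobe.d/zfs.conf    # cap ARC at 256 MB; applies at next module load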
# sh /test-zfs2.sh
cannot import 'test': no such pool available
cannot open 'test': no such pool
Creating file bigfile0...
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 968.241 s, 8.5 MB/s
Creating file bigfile1...
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 990.878 s, 8.3 MB/s
Creating file bigfile2...
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 1009.61 s, 8.1 MB/s
# zpool status
pool: test
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
test ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
/tmp/disk/test-zfs/1.img ONLINE 0 0 0
/tmp/disk/test-zfs/2.img ONLINE 0 0 0
/tmp/disk/test-zfs/3.img ONLINE 0 0 0
raidz1-1 ONLINE 0 0 0
/tmp/disk/test-zfs/7.img ONLINE 0 0 0
/tmp/disk/test-zfs/8.img ONLINE 0 0 0
/tmp/disk/test-zfs/9.img ONLINE 0 0 0
raidz1-2 ONLINE 0 0 0
/tmp/disk/test-zfs/4.img ONLINE 0 0 0
/tmp/disk/test-zfs/5.img ONLINE 0 0 0
/tmp/disk/test-zfs/6.img ONLINE 0 0 0
errors: No known data errors
I see this after a reboot and another attempt to import the test pool:
[12327.728391] Unable to handle kernel NULL pointer dereference at virtual address 00000005
[12327.736637] pgd = c0004000
[12327.739434] [00000005] *pgd=00000000
[12327.743034] Internal error: Oops: 5 [#1] SMP ARM
[12327.747647] Modules linked in: ehci_platform ehci_hcd zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) usb_storage evdev axp20x_pek sun4i_ts nvmem_sunxi_sid sun4i_ss phy_sun4i_usb sch_fq_codel nfsd openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack axp20x_usb_power axp20x_i2c axp20x axp20x_regulator [last unloaded: ehci_hcd]
[12327.784823] CPU: 0 PID: 385 Comm: zvol Tainted: P O 4.9.81-geaaru #1
[12327.792298] Hardware name: Allwinner sun7i (A20) Family
[12327.797520] task: dc254200 task.stack: dba86000
[12327.802078] PC is at uiomove+0x1b4/0x27c [zcommon]
[12327.807460] LR is at dmu_read_uio_dnode+0xbc/0x100 [zfs]
[12327.812772] pc : [<bf1556a4>] lr : [<bf2be954>] psr: 200f0113
sp : dba87e10 ip : da6a8000 fp : dd659f7c
[12327.824233] r10: 00000001 r9 : 00000000 r8 : 00001000
[12327.829452] r7 : da6a8000 r6 : 00000000 r5 : dba87e90 r4 : 00001000
[12327.835971] r3 : 00001000 r2 : 00000000 r1 : 00001000 r0 : da6a8000
[12327.842492] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[12327.849618] Control: 10c5387d Table: 59a0806a DAC: 00000051
[12327.855357] Process zvol (pid: 385, stack limit = 0xdba86210)
[12327.861097] Stack: (0xdba87e10 to 0xdba88000)
[12327.865451] 7e00: da6a8000 00001000 00000000 00001000
[12327.873620] 7e20: 00000000 dba87e90 00000000 00001000 00000000 00000000 dd659f7c bf2be954
[12327.881790] 7e40: 00001000 00000000 00000001 bf36d8ad dba87e6c dba87e68 00000000 dc253600
[12327.889959] 7e60: 00000000 00000000 da284400 00000001 da40bb40 da36c200 40000000 00000001
[12327.898129] 7e80: da284dc0 001259f9 bf135830 bf36668c 00000001 00000000 00000000 00000000
[12327.906299] 7ea0: 00000003 dba86000 ffffffff 7fffffff 00001000 00000000 dd659f00 db80bb40
[12327.914468] 7ec0: dbd59400 00000000 dba86000 600f0113 bf135830 bf12ce5c 00000002 ffffffff
[12327.922638] 7ee0: ffffffff 00000001 dc254200 c013fd98 00000100 00000200 00000000 00000000
[12327.930808] 7f00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[12327.938977] 7f20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[12327.947148] 7f40: bf12cbb4 db80bb80 00000000 db80bb40 bf12cbb4 00000000 00000000 00000000
[12327.955317] 7f60: 00000000 c0138550 00000000 00000000 00000000 db80bb40 00000000 00000000
[12327.963488] 7f80: dba87f80 dba87f80 00000000 00000000 dba87f90 dba87f90 dba87fac db80bb80
[12327.971657] 7fa0: c0138458 00000000 00000000 c0106d90 00000000 00000000 00000000 00000000
[12327.979825] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[12327.987993] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[12327.996610] [<bf1556a4>] (uiomove [zcommon]) from [<bf2be954>] (dmu_read_uio_dnode+0xbc/0x100 [zfs])
[12328.006417] [<bf2be954>] (dmu_read_uio_dnode [zfs]) from [<bf36668c>] (zvol_read+0xec/0x194 [zfs])
[12328.015747] [<bf36668c>] (zvol_read [zfs]) from [<bf12ce5c>] (taskq_thread+0x2a8/0x3f8 [spl])
[12328.024307] [<bf12ce5c>] (taskq_thread [spl]) from [<c0138550>] (kthread+0xf8/0x10c)
[12328.032051] [<c0138550>] (kthread) from [<c0106d90>] (ret_from_fork+0x14/0x24)
[12328.039272] Code: eaffffa2 e5953020 e3530000 0affffa1 (e59a4004)
[12328.045698] ---[ end trace ca699a3584b2dd60 ]---
And when I try to export the pool while the disk is in power-save mode, the command remains stuck (the same happens if I run fdisk -l). It seems there are cases where a semaphore stays locked forever.
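(One possible way to see where the stuck commands are blocked, assuming SysRq and stack tracing are enabled in this kernel config:)
echo w > /proc/sysrq-trigger          # dump all blocked (D-state) tasks to dmesg
dmesg | tail -n 200
cat /proc/$(pidof zpool)/stack        # kernel stack of the hung zpool process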
@geaaru are you still having this problem, and if so, can you confirm it reproduces on 0.7.8?
@rincebrain Sorry for the long delay, I have had little time for my ARM device. I will let you know if I have the same problem with zfs 0.7.8 and 0.7.9.
@rincebrain I confirm that with kernel 4.9.109 and zfs 0.7.9 everything works fine.
Thanks a lot to everyone for the support.
Hi,
I'm not sure if this problem is related to #6981, but now with 0.7.6 some of the pools described in that previous issue are not mountable at all.
Along with the upgrade to 0.7.6 I made only one other change: upgrading to kernel 4.9.77.
System information
Describe the problem you're observing
After upgrading to 0.7.6 I can't mount any of my ZFS pools:
Describe how to reproduce the problem
Include any warning/errors/backtraces from the system logs