Bah, I'm seeing this a lot.
The cache vdevs are found at manual import, but no L2ARC data is written to them even under load.
$ zpool import -N ssdpool
$ zpool iostat -v
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ssdpool      171G  86.9G     35     52  94.1K   180K
  mirror     171G  86.9G     35     52  94.1K   180K
    disk16      -      -     26     32   194K   340K
    disk9       -      -     23     33   156K   340K
logs            -      -      -      -      -      -
  mirror     168K   936M      0      0      0      0
    disk17      -      -      7      1  81.0K  54.3K
    disk6       -      -      7      1  81.0K  54.3K
cache           -      -      -      -      -      -
  disk32s5      0  16.2G      0      0  8.11K  1.43K
  disk33s5      0  16.2G      0      0  8.12K  1.43K
----------  -----  -----  -----  -----  -----  -----
$ zpool status -v
  pool: ssdpool
 state: ONLINE
  scan: scrub canceled on Sun Jun 15 12:02:41 2014
config:

        NAME          STATE     READ WRITE CKSUM
        ssdpool       ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            disk16    ONLINE       0     0     0
            disk9     ONLINE       0     0     0
        logs
          mirror-1    ONLINE       0     0     0
            disk17    ONLINE       0     0     0
            disk6     ONLINE       0     0     0
        cache
          disk32s5    ONLINE       0     0     0
          disk33s5    ONLINE       0     0     0

errors: No known data errors
Cache vdevs that are ONLINE but never see any use seem to occur when the pool was not cleanly exported and an /etc/zfs/zpool.cache exists at all. This does not seem to happen when there is no zpool.cache, and it does not happen every time, especially after clean exports. Normally one would see a few hundred KiB or more allocated in at least one of the cache vdevs, even after a bare zpool import -N pool.
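A quick way to confirm the L2ARC really is idle, rather than zpool iostat under-reporting, is to sample the arcstats; a minimal sketch, assuming this O3X build exposes the usual upstream kstat names via sysctl under kstat.zfs.misc.arcstats.*:
$ # Sample the L2ARC counters twice, 30 s apart; if the cache vdevs were
$ # being fed, l2_size and l2_write_bytes should grow between samples.
$ for k in l2_size l2_hits l2_misses l2_write_bytes; do sysctl kstat.zfs.misc.arcstats.$k; done
$ sleep 30
$ for k in l2_size l2_hits l2_misses l2_write_bytes; do sysctl kstat.zfs.misc.arcstats.$k; done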
[master, tip of tree today]
Panic on export:
Anonymous UUID: EA3E4DC2-8F4D-9BF6-7D16-4BB6CA19A914
Tue Jun 17 11:19:05 2014
panic(cpu 2 caller 0xffffff8021cdbf5e): Kernel trap at 0xffffff7fa2338f92, type 13=general protection, registers:
CR0: 0x000000008001003b, CR2: 0x000000010b12b000, CR3: 0x00000000241d0000, CR4: 0x00000000001606e0
RAX: 0x0000000000000000, RBX: 0xffffff8021c36e50, RCX: 0x65686e69246e6f69, RDX: 0xffffff8213ce2448
RSP: 0xffffff821119bb60, RBP: 0xffffff821119bbc0, RSI: 0x0000000000000001, RDI: 0xffffff7fa23c0d88
R8: 0xffffff822cba0035, R9: 0x0000000000000035, R10: 0x0000000000000037, R11: 0xffffff821119bb00
R12: 0xffffff804e549280, R13: 0xffffff804cf62140, R14: 0x0000000000000000, R15: 0xffffff804cf62140
RFL: 0x0000000000010246, RIP: 0xffffff7fa2338f92, CS: 0x0000000000000008, SS: 0x0000000000000010
Fault CR2: 0x000000010b12b000, Error code: 0x0000000000000000, Fault CPU: 0x2
Backtrace (CPU 2), Frame : Return Address
0xffffff8202075df0 : 0xffffff8021c22fa9 mach_kernel : _panic + 0xc9
0xffffff8202075e70 : 0xffffff8021cdbf5e mach_kernel : _kernel_trap + 0x7fe
0xffffff8202076040 : 0xffffff8021cf3456 mach_kernel : _return_from_trap + 0xe6
0xffffff8202076060 : 0xffffff7fa2338f92 net.lundman.zfs : _avl_remove + 0xa2
0xffffff821119bbc0 : 0xffffff7fa22b6285 net.lundman.zfs : _unique_remove + 0x65
0xffffff821119bc00 : 0xffffff7fa2266a65 net.lundman.zfs : _dsl_dataset_evict + 0x25
0xffffff821119bc30 : 0xffffff7fa223da10 net.lundman.zfs : _dbuf_evict_user + 0x80
0xffffff821119bc50 : 0xffffff7fa223f0a6 net.lundman.zfs : _dbuf_rele_and_unlock + 0xf6
0xffffff821119bc90 : 0xffffff7fa223c250 net.lundman.zfs : _dbuf_rele + 0x30
0xffffff821119bcb0 : 0xffffff7fa224a5f8 net.lundman.zfs : _dmu_buf_rele + 0x28
0xffffff821119bce0 : 0xffffff7fa2266b91 net.lundman.zfs : _dsl_dataset_rele + 0x21
0xffffff821119bd00 : 0xffffff7fa2280b2a net.lundman.zfs : _dsl_scan_visitds + 0x49a
0xffffff821119bd70 : 0xffffff7fa2280031 net.lundman.zfs : _dsl_scan_visit + 0x461
0xffffff821119bdd0 : 0xffffff7fa227f1f2 net.lundman.zfs : _dsl_scan_sync + 0x5d2
0xffffff821119be30 : 0xffffff7fa22a4192 net.lundman.zfs : _spa_sync + 0x4b2
0xffffff821119bee0 : 0xffffff7fa22b4786 net.lundman.zfs : _txg_sync_thread + 0x3e6
0xffffff821119bfb0 : 0xffffff8021cd7127 mach_kernel : _call_continuation + 0x17
Kernel Extensions in backtrace:
net.lundman.zfs(1.0)[10E9E953-EA4B-3364-9581-F5E75E2DF966]@0xffffff7fa222b000->0xffffff7fa2437fff
dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7fa21ec000
dependency: net.lundman.spl(1.0.0)[63543165-E11D-384D-8456-F485EA93FC7E]@0xffffff7fa221a000
I'd be curious if any of these issues you've been reporting resolve if you get rid of the cache devices.
Perhaps, but I would also be giving up pool performance; the L2ARC soaks up a large share of the pool's IOPS.
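To put a number on that share, per-vdev I/O can be sampled over an interval rather than relying on the since-import averages shown earlier; the first report is cumulative, and each later one covers just that interval:
$ zpool iostat -v ssdpool 5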
Well, yes, production and debugging aren't the same.
OK, that was your point, fair enough; I'll run for a bit without the cache vdevs.
Snag 0: zpool remove of the cache devs has to wait for a resilver to finish; one of the ungraceful reboots was apparently unusually ungraceful. I can just pull the physical devices themselves easily enough, of course.
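For the record, the removal itself should just be a couple of commands along these lines (device names taken from the iostat output above), once the resilver has completed:
$ zpool status ssdpool    # wait until the resilver reports done
$ zpool remove ssdpool disk32s5 disk33s5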
Snag 1: disk32s2 is in Donkey (imported), and "zpool import" says Trinity hates that.
[master, zpool.cache existed at zfs subsystem start]
$ zpool status -v Donkey
  pool: Donkey
 state: ONLINE
  scan: resilvered 4.33M in 0h0m with 0 errors on Tue Jun 17 09:43:50 2014
config:

        NAME            STATE     READ WRITE CKSUM
        Donkey          ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            disk32s2    ONLINE       0     0     0
            disk33s2    ONLINE       0     0     0
            disk31s2    ONLINE       0     0     0
        logs
          mirror-1      ONLINE       0     0     0
            disk9s1     ONLINE       0     0     0
            disk15s1    ONLINE       0     0     0

errors: No known data errors
   pool: Trinity
     id: 5933279091430968458
  state: DEGRADED
 status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
         fault tolerance of the pool may be compromised if imported.
    see: http://zfsonlinux.org/msg/ZFS-8000-2Q
 config:

        Trinity         DEGRADED
          mirror-0      ONLINE
            disk23      ONLINE
            disk25s2    ONLINE
          mirror-1      ONLINE
            disk20s2    ONLINE
            disk24s2    ONLINE
          mirror-2      DEGRADED
            disk22s2    ONLINE
            disk32s2    UNAVAIL  corrupted data
          mirror-3      ONLINE
            disk21s2    ONLINE
            disk19s2    ONLINE
        cache
          disk28s4
          disk27s4
        logs
          mirror-4      ONLINE
            disk14s1    ONLINE
            disk6s1     ONLINE
("zpool import" when there is no zpool.cache at startup shows no conflicts, and importing the pools DTRT.)
@rottegift So no panic now since removing the cache vdevs?
Nope. I have done a fair number of exports (which triggered them before) while dealing with the other matter, but have not seen the panics above.
vdev-iokit seems to be immune to both these problems, fyi.
closing to focus on the cache vdev problem
I noticed that the cache vdevs were ONLINE but nothing was being stored in them, so I exported two pools (one of which had only the root dataset mounted, the other of which had been imported with -N and had nothing mounted). Result: