openzfsonosx / zfs

OpenZFS on OS X
https://openzfsonosx.org/

master panics #201

Closed: rottegift closed this issue 10 years ago

rottegift commented 10 years ago

zfs mount -a did this:

Anonymous UUID:       EA3E4DC2-8F4D-9BF6-7D16-4BB6CA19A914

Mon Jun 23 12:48:21 2014
panic(cpu 1 caller 0xffffff80272dbf5e): Kernel trap at 0xffffff7fa8f0d9d1, type 14=page fault, registers:
CR0: 0x0000000080010033, CR2: 0x00000000008a0000, CR3: 0x0000000014587149, CR4: 0x00000000001606e0
RAX: 0x00000000008a0000, RBX: 0xffffff8218ba3f10, RCX: 0x0000000000000000, RDX: 0xffffff82186b98b8
RSP: 0xffffff8218ba3840, RBP: 0xffffff8218ba3840, RSI: 0xffffff82186b98b8, RDI: 0xffffff82186b98b8
R8:  0xffffff8218ba38c8, R9:  0xffffff8218ba3890, R10: 0xffffff82151a51d0, R11: 0xffffffffffffff00
R12: 0x0000000000100003, R13: 0xffffff8060f9e4b0, R14: 0xffffff8218ba3e70, R15: 0x0000000000000048
RFL: 0x0000000000010286, RIP: 0xffffff7fa8f0d9d1, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x00000000008a0000, Error code: 0x0000000000000002, Fault CPU: 0x1

Backtrace (CPU 1), Frame : Return Address
0xffffff8218ba34d0 : 0xffffff8027222fa9 mach_kernel : _panic + 0xc9
0xffffff8218ba3550 : 0xffffff80272dbf5e mach_kernel : _kernel_trap + 0x7fe
0xffffff8218ba3720 : 0xffffff80272f3456 mach_kernel : _return_from_trap + 0xe6
0xffffff8218ba3740 : 0xffffff7fa8f0d9d1 net.lundman.spl : _list_insert_tail + 0x21
0xffffff8218ba3840 : 0xffffff7fa8f66e8a net.lundman.zfs : _dsl_dir_tempreserve_impl + 0x38a
0xffffff8218ba3920 : 0xffffff7fa8f66abd net.lundman.zfs : _dsl_dir_tempreserve_space + 0x17d
0xffffff8218ba39a0 : 0xffffff7fa8f4d027 net.lundman.zfs : _dmu_tx_try_assign + 0x447
0xffffff8218ba3a50 : 0xffffff7fa8f4cb6b net.lundman.zfs : _dmu_tx_assign + 0x3b
0xffffff8218ba3a80 : 0xffffff7fa8ff5d8b net.lundman.zfs : _zfs_write + 0x9eb
0xffffff8218ba3d00 : 0xffffff7fa9004ffe net.lundman.zfs : _zfs_vnop_write + 0xae
0xffffff8218ba3d60 : 0xffffff80273fde51 mach_kernel : _VNOP_WRITE + 0xe1
0xffffff8218ba3de0 : 0xffffff80273f3be3 mach_kernel : _utf8_normalizestr + 0x703
0xffffff8218ba3e50 : 0xffffff80275f22f1 mach_kernel : _write_nocancel + 0x1b1
0xffffff8218ba3ef0 : 0xffffff80275f21dd mach_kernel : _write_nocancel + 0x9d
0xffffff8218ba3f50 : 0xffffff8027640653 mach_kernel : _unix_syscall64 + 0x1f3
0xffffff8218ba3fb0 : 0xffffff80272f3c56 mach_kernel : _hndl_unix_scall64 + 0x16
      Kernel Extensions in backtrace:
         net.lundman.spl(1.0)[626E72AC-DE97-3631-ADC2-0849D468CF64]@0xffffff7fa8f0b000->0xffffff7fa8f1bfff
         net.lundman.zfs(1.0)[AEAAAB93-CD2F-3028-8BFD-D0F399FAA6BE]@0xffffff7fa8f1c000->0xffffff7fa9128fff
            dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7fa77f3000
            dependency: net.lundman.spl(1.0.0)[626E72AC-DE97-3631-ADC2-0849D468CF64]@0xffffff7fa8f0b000

BSD process name corresponding to current thread: squid
Boot args: -v keepsyms=y darkwake=0 -s
rottegift commented 10 years ago

And previously exporting did this:

Anonymous UUID:       EA3E4DC2-8F4D-9BF6-7D16-4BB6CA19A914

Mon Jun 23 12:46:18 2014
panic(cpu 4 caller 0xffffff80086dbf5e): Kernel trap at 0xffffff7f8a42f624, type 13=general protection, registers:
CR0: 0x0000000080010033, CR2: 0x00007fff5ca8c828, CR3: 0x00000001070e805b, CR4: 0x00000000001606e0
RAX: 0x74697265686e6924, RBX: 0xffffff82134bbd58, RCX: 0xffffff7f8a4b48c0, RDX: 0x0000000000000001
RSP: 0xffffff82134bb6b0, RBP: 0xffffff82134bb6e0, RSI: 0xffffff82134bb878, RDI: 0xffffff826bfd7b28
R8:  0x0000000000000074, R9:  0x0000000000000220, R10: 0xffffff82134bb878, R11: 0xffffff82134bb500
R12: 0x0000000012000530, R13: 0xffffff803aa7a000, R14: 0x0000000012000530, R15: 0xffffff802d9a8a80
RFL: 0x0000000000010206, RIP: 0xffffff7f8a42f624, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x00007fff5ca8c828, Error code: 0x0000000000000000, Fault CPU: 0x4

Backtrace (CPU 4), Frame : Return Address
0xffffff81e8875df0 : 0xffffff8008622fa9 mach_kernel : _panic + 0xc9
0xffffff81e8875e70 : 0xffffff80086dbf5e mach_kernel : _kernel_trap + 0x7fe
0xffffff81e8876040 : 0xffffff80086f3456 mach_kernel : _return_from_trap + 0xe6
0xffffff81e8876060 : 0xffffff7f8a42f624 net.lundman.zfs : _nvlist_exists + 0x54
0xffffff82134bb6e0 : 0xffffff7f8a36e20b net.lundman.zfs : _dsl_prop_get_all_impl + 0x36b
0xffffff82134bb9e0 : 0xffffff7f8a36dacc net.lundman.zfs : _dsl_prop_get_all_ds + 0x19c
0xffffff82134bbb40 : 0xffffff7f8a36d925 net.lundman.zfs : _dsl_prop_get_all + 0x25
0xffffff82134bbb60 : 0xffffff7f8a3e4e4d net.lundman.zfs : _zfs_ioc_objset_stats_impl + 0x4d
0xffffff82134bbba0 : 0xffffff7f8a3e1c7b net.lundman.zfs : _zfs_ioc_snapshot_list_next + 0x1ab
0xffffff82134bbc20 : 0xffffff7f8a3dd7a4 net.lundman.zfs : _zfsdev_ioctl + 0x664
0xffffff82134bbcf0 : 0xffffff800880d63f mach_kernel : _spec_ioctl + 0x11f
0xffffff82134bbd40 : 0xffffff80087fe000 mach_kernel : _VNOP_IOCTL + 0x150
0xffffff82134bbdc0 : 0xffffff80087f3e51 mach_kernel : _utf8_normalizestr + 0x971
0xffffff82134bbe10 : 0xffffff80089c1303 mach_kernel : _fo_ioctl + 0x43
0xffffff82134bbe40 : 0xffffff80089f2c66 mach_kernel : _ioctl + 0x466
0xffffff82134bbf50 : 0xffffff8008a40653 mach_kernel : _unix_syscall64 + 0x1f3
0xffffff82134bbfb0 : 0xffffff80086f3c56 mach_kernel : _hndl_unix_scall64 + 0x16
      Kernel Extensions in backtrace:
         net.lundman.zfs(1.0)[AEAAAB93-CD2F-3028-8BFD-D0F399FAA6BE]@0xffffff7f8a31c000->0xffffff7f8a528fff
            dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7f88bf3000
            dependency: net.lundman.spl(1.0.0)[626E72AC-DE97-3631-ADC2-0849D468CF64]@0xffffff7f8a30b000

BSD process name corresponding to current thread: zpool
Boot args: -v keepsyms=y darkwake=0
lundman commented 10 years ago

For the first panic, we have:

    tr_list = kmem_alloc(sizeof (list_t), KM_PUSHPAGE);
    list_create(tr_list, sizeof (struct tempreserve),
        offsetof(struct tempreserve, tr_node));

which allocates with PUSHPAGE; even if kmem_alloc did return NULL here, we would have panicked already in list_create.

Then we call:

dsl_dir_tempreserve_impl(dsl_dir_t *dd, uint64_t asize, boolean_t netfree,
    boolean_t ignorequota, boolean_t checkrefquota, list_t *tr_list,
    dmu_tx_t *tx, boolean_t first)

which does:

    tr = kmem_zalloc(sizeof (struct tempreserve), KM_PUSHPAGE);
    tr->tr_ds = dd;
    tr->tr_size = asize;
    list_insert_tail(tr_list, tr);

Again PUSHPAGE, and even if it returned NULL, the assignment to tr->tr_ds would panic, not the call to list_insert_tail. This would then indicate memory corruption, since both tr and tr_list here are most likely good. There should be no threading issues either, since tr_list is allocated for this thread and only used here once.
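For reference, this is the kind of check that would rule the NULL case in or out. A minimal sketch only; the VERIFY placement here is hypothetical and not what master currently does:

    tr = kmem_zalloc(sizeof (struct tempreserve), KM_PUSHPAGE);
    VERIFY3P(tr, !=, NULL);      /* would fire before the tr->tr_ds store */
    VERIFY3P(tr_list, !=, NULL); /* rules out a NULL list head from the caller */
    tr->tr_ds = dd;
    tr->tr_size = asize;
    list_insert_tail(tr_list, tr);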

We have seen a few panics in the vnop_write path though; could there be an issue there?

lundman commented 10 years ago

For the second panic:

/* In zfs_ioc_objset_stats_impl() (the caller, per the backtrace): */
    nvlist_t *nv;
...
        (error = dsl_prop_get_all(os, &nv)) == 0) {

/* dsl_prop_get_all() ends up in: */
dsl_prop_get_all_ds(dsl_dataset_t *ds, nvlist_t **nvp,
    dsl_prop_getflags_t flags)
{
...
    VERIFY(nvlist_alloc(nvp, NV_UNIQUE_NAME, KM_SLEEP) == 0);
...
        err = dsl_prop_get_all_impl(mos, ds->ds_phys->ds_props_obj,
            setpoint, flags, *nvp);

/* and then: */
dsl_prop_get_all_impl(objset_t *mos, uint64_t propobj,
    const char *setpoint, dsl_prop_getflags_t flags, nvlist_t *nv)
{
...
        /* Skip properties already defined. */
        if (nvlist_exists(nv, propname))
            continue;

So nvp is definitely initialised (or the VERIFY would have panicked). But we have no way to know the status of propname; it is not used earlier in a way that would definitely have panicked.
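For what it's worth, the faulting RAX value 0x74697265686e6924 is plain ASCII when read as little-endian bytes. A quick throwaway sketch of the decode, nothing more:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    /* RAX from the nvlist_exists panic above. */
    uint64_t rax = 0x74697265686e6924ULL;
    char buf[9];
    int i;

    /* x86-64 is little-endian, so the low-order byte is the first character. */
    for (i = 0; i < 8; i++)
        buf[i] = (char)(rax >> (8 * i));
    buf[8] = '\0';

    printf("%s\n", buf);    /* prints "$inherit" */
    return (0);
}

"$inherit" happens to be the ZPROP_INHERIT_SUFFIX string that dsl_prop_get_all_impl works with, so it would fit the idea of string data overwriting a pointer-sized value somewhere, but that is only speculation.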

lundman commented 10 years ago

Actually, do you think I could convince you to run the kexts from today's DMG?

http://lundman.net/ftp/osx.zfs/osx.zfs-signed-20140627.dmg

Nothing special in it, but rules out the compile environment and related bits.

rottegift commented 10 years ago

Well the most recent master just took out half of two mirrored vdevs in two pools and resilvering will take a lonnnnnnnnnnnnng time, so I'm a bit keen to avoid having another vdev taken out. Mere panics have never done that.

rottegift commented 10 years ago

(In fact the bigger shock was a pool that never imports automatically (it's deliberately never in zpool.cache, ever) managed to get imported, and that is one of the pools resilvering.)

lundman commented 10 years ago

Yeah, I think https://github.com/openzfsonosx/zfs/commit/95ff8054f02b464c2d077af760aee0ffbd28ab5c changed the kext so we don't load the cache file, but it also changed zfs.util to "list pools" + "import pools". You don't want the new zfs.util to run until we can make it better. Just the two kexts from the dmg is what I mean: no zed, no zfs.util.

But I can understand the hesitation. We will have to think about what to do about pools that users don't want imported.

rottegift commented 10 years ago

I can do that in the alternate boot environment I have on the same machine, but I'll wait to see how long the resilvering takes.

"We will have to think about what to do about pools that users dont want imported."

Please also think about not astonishing people who work with other zfs implementations, none of which import pools automatically unless they're in zpool.cache (e.g., think of the cachefile pool property).

lundman commented 10 years ago

Yes, that is the discussion. On the one hand, the cache file controls what imports at boot.

On the other hand, on OS X, if you plug in a device with ZFS on it after boot, OS X users expect it to be imported and mounted. We have to follow that as well, even if it is not done on other platforms (by default).

rottegift commented 10 years ago

Ok, I now have two ways I think I can provoke two different panics (one a double panic that leaves nothing sensible in the crash dump, one that seems to), so I'm now in an alternate boot environment that has never had zfs installed on it.

Downloaded and mounted today's dmg, ran load_zfs.sh, did zpool import: looks ok. I'll carry on with tests and update here later.

rottegift commented 10 years ago

PS: If you're going to depart wholeheartedly from having a zpool.cache, that's fine, but the documentation job will be important. If you're going to do a small departure, say, automatically import ONLY (!!!) single-device pools, that's fine too. However, I personally would prefer to be as close to openzfs as possible.

Another option would be an o3x-specific pool feature that would allow automatic import if active. If it's important that a pool with the feature active be imported on another zfs implementation, sell the idea to openzfs developers.

rottegift commented 10 years ago

Hmm... while not conclusive, I haven't gotten it to crash yet.

rottegift commented 10 years ago

Ah, but the pools' cache vdevs are ONLINE but are not actually being written to.

lundman commented 10 years ago

Having the kext read the cachefile seems to be problematic on OS X, since device names/ordering change between boots. Neither master nor vdev-iokit handles that renaming yet.

One thought is to take the cachefile reading out of the kext (no auto-import there) and let zfs.util handle it: just add some code to zfs.util to read the cachefile and import only the pools listed in it.
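Roughly what I have in mind for zfs.util, as a sketch only; it assumes libzfs's zpool_find_import_cached() and zpool_import() behave here the way they do on other platforms, and the helper name is made up:

#include <libzfs.h>

/* Sketch: import only the pools recorded in the given cachefile. */
static int
import_from_cachefile(const char *cachefile)
{
    libzfs_handle_t *hdl = libzfs_init();
    nvlist_t *pools, *config;
    nvpair_t *elem = NULL;

    if (hdl == NULL)
        return (-1);

    /* Parse the cachefile into an nvlist of pool configs. */
    pools = zpool_find_import_cached(hdl, cachefile, NULL, 0);

    /* Import each config found; pools not in the cachefile are left alone. */
    while (pools != NULL &&
        (elem = nvlist_next_nvpair(pools, elem)) != NULL) {
        if (nvpair_value_nvlist(elem, &config) == 0)
            (void) zpool_import(hdl, config, NULL, NULL);
    }

    if (pools != NULL)
        nvlist_free(pools);
    libzfs_fini(hdl);
    return (0);
}

zfs.util would call something like that with the default cachefile path at startup, so anything not in the cachefile (or with cachefile=none) stays untouched until a manual zpool import.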

rottegift commented 10 years ago

YAY! YAY! A panic!

Anonymous UUID:       2A7EBDDF-80E2-8419-C1B5-5A9410E11B67

Fri Jun 27 02:04:33 2014
panic(cpu 2 caller 0xffffff802f6dbf5e): Kernel trap at 0xffffff7fb15fe300, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x0000000000000170, CR3: 0x0000000031bb4000, CR4: 0x00000000001606e0
RAX: 0xffffff823651bc40, RBX: 0xffffff823039a6d8, RCX: 0x0000000000000108, RDX: 0x0000000000000108
RSP: 0xffffff82300b39c0, RBP: 0xffffff82300b39f0, RSI: 0x0000000000000001, RDI: 0x0000000000000000
R8:  0x00000000ffff0000, R9:  0x00000000ffff0000, R10: 0xffffff82300b3cc0, R11: 0xffffffffffffffff
R12: 0xffffff823039a718, R13: 0x000000000000a7ba, R14: 0xffffff823039a6f8, R15: 0xffffff8239e30538
RFL: 0x0000000000010202, RIP: 0xffffff7fb15fe300, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x0000000000000170, Error code: 0x0000000000000000, Fault CPU: 0x2

Backtrace (CPU 2), Frame : Return Address
0xffffff82300b3650 : 0xffffff802f622fa9 mach_kernel : _panic + 0xc9
0xffffff82300b36d0 : 0xffffff802f6dbf5e mach_kernel : _kernel_trap + 0x7fe
0xffffff82300b38a0 : 0xffffff802f6f3456 mach_kernel : _return_from_trap + 0xe6
0xffffff82300b38c0 : 0xffffff7fb15fe300 net.lundman.zfs : _vdev_dtl_contains + 0x40
0xffffff82300b39f0 : 0xffffff7fb1606fb9 net.lundman.zfs : _vdev_mirror_io_done + 0x329
0xffffff82300b3a90 : 0xffffff7fb16715a9 net.lundman.zfs : _zio_vdev_io_done + 0x1a9
0xffffff82300b3ad0 : 0xffffff7fb166df4a net.lundman.zfs : ___zio_execute + 0x12a
0xffffff82300b3b10 : 0xffffff7fb1672b92 net.lundman.zfs : _zio_done + 0x11d2
0xffffff82300b3c30 : 0xffffff7fb166df4a net.lundman.zfs : ___zio_execute + 0x12a
0xffffff82300b3c70 : 0xffffff7fb1672b92 net.lundman.zfs : _zio_done + 0x11d2
0xffffff82300b3d90 : 0xffffff7fb166df4a net.lundman.zfs : ___zio_execute + 0x12a
0xffffff82300b3dd0 : 0xffffff7fb1672b92 net.lundman.zfs : _zio_done + 0x11d2
0xffffff82300b3ef0 : 0xffffff7fb166df4a net.lundman.zfs : ___zio_execute + 0x12a
0xffffff82300b3f30 : 0xffffff7fb166de15 net.lundman.zfs : _zio_execute + 0x15
0xffffff82300b3f50 : 0xffffff7fb1562217 net.lundman.spl : _taskq_thread + 0xc7
0xffffff82300b3fb0 : 0xffffff802f6d7127 mach_kernel : _call_continuation + 0x17
      Kernel Extensions in backtrace:
         net.lundman.spl(1.0)[43937DBE-7CC1-36E2-BFF0-AA5430ABBC60]@0xffffff7fb155e000->0xffffff7fb156efff
         net.lundman.zfs(1.0)[170185FD-89A4-3CB0-85E1-E3807CCEDBEC]@0xffffff7fb156f000->0xffffff7fb177bfff
            dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7fafe9b000
            dependency: net.lundman.spl(1.0.0)[43937DBE-7CC1-36E2-BFF0-AA5430ABBC60]@0xffffff7fb155e000

BSD process name corresponding to current thread: kernel_task
Boot args: -v keepsyms=y
rottegift commented 10 years ago

vdev-iokit has a big operational advantage at the moment in that shortly after boot it reliably imports pools with many vdevs and leaves them all (including the cache vdevs) working correctly, every time. master practically always has issues shortly after boot: FAULTED pools, cache vdevs not working, etc.

rottegift commented 10 years ago

At the time of the crash above, two pools were resilvering; one of those was the source of some zfs send traffic. A third pool was being beaten up by diskutil verifyvolume running concurrently on two zvols, and a fourth was doing minor I/O. IIRC I was doing about 3k IOPS.

lundman commented 10 years ago

So you are saying that vdev_iokit handles the device renaming and autoimport on boot just fine?

rottegift commented 10 years ago

Yes, that is what I am saying (about vdev-iokit). Given the number of times I've restarted lately, that is a huge advantage for vdev-iokit.

Oh, I'm back and have loaded the kexts again (no cachefile exists, to be clear). At import time the cache vdevs are again ONLINE but are not used. Ordinarily (and always with vdev-iokit) there would be a couple of megabytes written to them as fast as you could run zpool iostat -v after import.

rottegift commented 10 years ago

Yes! Another panic.

This was exporting CLATM after its resilver had finished. Trinity was naturally still resilvering and was also the source of a zfs send.

Anonymous UUID:       2A7EBDDF-80E2-8419-C1B5-5A9410E11B67

Fri Jun 27 02:36:37 2014
panic(cpu 4 caller 0xffffff800a2dbf5e): Kernel trap at 0xffffff7f8c282614, type 13=general protection, registers:
CR0: 0x000000008001003b, CR2: 0x00007fbd79d3a000, CR3: 0x00000000339eb02c, CR4: 0x00000000001606e0
RAX: 0x74697265686e6924, RBX: 0xffffff8241d2bd58, RCX: 0xffffff7f8c3078c0, RDX: 0x0000000000000001
RSP: 0xffffff8241d2b6b0, RBP: 0xffffff8241d2b6e0, RSI: 0xffffff8241d2b878, RDI: 0xffffff81fd81cf88
R8:  0x0000000000000074, R9:  0x0000000000000220, R10: 0xffffff8241d2b878, R11: 0xffffff8241d2b500
R12: 0x000000002200011d, R13: 0xffffff803aba44b0, R14: 0x000000002200011d, R15: 0xffffff802beb3540
RFL: 0x0000000000010206, RIP: 0xffffff7f8c282614, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x00007fbd79d3a000, Error code: 0x0000000000000000, Fault CPU: 0x4

Backtrace (CPU 4), Frame : Return Address
0xffffff81ea80ddf0 : 0xffffff800a222fa9 mach_kernel : _panic + 0xc9
0xffffff81ea80de70 : 0xffffff800a2dbf5e mach_kernel : _kernel_trap + 0x7fe
0xffffff81ea80e040 : 0xffffff800a2f3456 mach_kernel : _return_from_trap + 0xe6
0xffffff81ea80e060 : 0xffffff7f8c282614 net.lundman.zfs : _nvlist_exists + 0x54
0xffffff8241d2b6e0 : 0xffffff7f8c1c110b net.lundman.zfs : _dsl_prop_get_all_impl + 0x36b
0xffffff8241d2b9e0 : 0xffffff7f8c1c09cc net.lundman.zfs : _dsl_prop_get_all_ds + 0x19c
0xffffff8241d2bb40 : 0xffffff7f8c1c0825 net.lundman.zfs : _dsl_prop_get_all + 0x25
0xffffff8241d2bb60 : 0xffffff7f8c237e2d net.lundman.zfs : _zfs_ioc_objset_stats_impl + 0x4d
0xffffff8241d2bba0 : 0xffffff7f8c234c5b net.lundman.zfs : _zfs_ioc_snapshot_list_next + 0x1ab
0xffffff8241d2bc20 : 0xffffff7f8c230784 net.lundman.zfs : _zfsdev_ioctl + 0x664
0xffffff8241d2bcf0 : 0xffffff800a40d63f mach_kernel : _spec_ioctl + 0x11f
0xffffff8241d2bd40 : 0xffffff800a3fe000 mach_kernel : _VNOP_IOCTL + 0x150
0xffffff8241d2bdc0 : 0xffffff800a3f3e51 mach_kernel : _utf8_normalizestr + 0x971
0xffffff8241d2be10 : 0xffffff800a5c1303 mach_kernel : _fo_ioctl + 0x43
0xffffff8241d2be40 : 0xffffff800a5f2c66 mach_kernel : _ioctl + 0x466
0xffffff8241d2bf50 : 0xffffff800a640653 mach_kernel : _unix_syscall64 + 0x1f3
0xffffff8241d2bfb0 : 0xffffff800a2f3c56 mach_kernel : _hndl_unix_scall64 + 0x16
      Kernel Extensions in backtrace:
         net.lundman.zfs(1.0)[170185FD-89A4-3CB0-85E1-E3807CCEDBEC]@0xffffff7f8c16f000->0xffffff7f8c37bfff
            dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7f8aa9b000
            dependency: net.lundman.spl(1.0.0)[43937DBE-7CC1-36E2-BFF0-AA5430ABBC60]@0xffffff7f8c15e000

BSD process name corresponding to current thread: real_zfs
Boot args: -v keepsyms=y
rottegift commented 10 years ago

This time, "zpool import -N Trinity" (after loading the ftp kexts as above) results in the cache vdevs not only being ONLINE but also seeing traffic:

cache                          -      -      -      -      -      -
  disk32s4                  942K  17.0G      0      0  1.19K  8.48K
  disk31s4                 2.87M  17.0G      0      0  1.19K  26.0K
lundman commented 10 years ago

I'm certainly seeing zfs_ioc_snapshot_list_next frequently enough that there might just be an issue there.

Do you have a second machine/VM that you can run lldb from on a panic? Or should we see if Darwin can still send cores to tftp...

rottegift commented 10 years ago

If 10.8 lldb can talk to a 10.9 kernel, then yes; otherwise I could do the tftp trick relatively easily (is it a boot argument?), or cough up another 10.9 box, but probably none of the above until some hours from now.

lundman commented 10 years ago

It can be picky about the lldb version, I'm not sure. As long as the kernel debug dmg is exactly the same it might work. Or I can connect from here, but that is awkward with timezones and all that.

rottegift commented 10 years ago

Tricky. We'll have to come back to it when I am more functional (timezones...)

The job that sends snapshots from 52 datasets to a mother machine just terminated correctly. The main difference is that it was running while the system had very little load (just a resilver, really). Doing the same job while the machine is under heavy load (multiple resilvers/scrubs, heavy activity on zvols) tends to coincide with a panic.

Although it cannot be a factor in this boot environment, I do run zfs snapshots on several datasets about once a minute, and the last syslog entry before several BOOT_TIME entries has often been from zfs-auto-snapshot.

There are a couple of other tasks (that are independent of any zfs send/receive or snapshot activity) that trigger a panic as noted in the long vdev-iokit thread with @evansus -- I will try to get around to doing that with the ftp kexts tomorrow. He's also been trying to reproduce them, I think.

I don't know whether the panics have a single common source or whether heavy load exposes multiple problems. I'm (now) pretty confident it's not hardware or build environment related, though.

lundman commented 10 years ago

Indeed, if you happen to get a panic while we are both on (and the pool can be down for ~10 mins), log on to irc and maybe we can sort something out. If you are on a LAN, we can create an account for you on o3x.org for an ssh tunnel.

Meanwhile, I am wondering if there is some way that zio structs can be corrupted, used after free, or run out of, but they use straight kmem_cache_alloc, which sits on top of bmem.

It is true the taskq code in SPL is almost a straight paste job from MacZFS, and should probably be inspected for heavy-load cases. But since we used to spawn ~270 threads for each pool, perhaps it is not related.

lundman commented 10 years ago

Do you remember if you still had panics when NOT doing one of those tasks? Like no zfs send, or no zvol use? Hmm, I guess I would only suspect zvols in that list. Do panics happen with no zvols in use?

rottegift commented 10 years ago

Well I seem to be able to provoke a panic in a relatively short timeframe by deliberately firing up real traffic-generating jobs concurrently -- not exactly on demand and not an easy-to-reduce test case, but maybe easier than deciding on what IRC client to use and where I'm leaking my public IP addresses. :-)

In answer to your question above, yes, I have had panics when not doing zfs send, and when there have been no pools with zvols imported at all. Beating up zvols makes panics happen more quickly, but it's not necessary. The last panic ( https://github.com/openzfsonosx/zfs/issues/201#issuecomment-47299856 ) was zvol-free.

rottegift commented 10 years ago

Another trigger, for example, is high inbound P2P load (the relevant thing being many writing threads rather than one fast source) onto a slow-to-write pool. No zvols or zfs send involved.

Opening hundreds of Safari and Chrome windows and tabs has also been an apparent trigger, mostly because of the cache activity of the browsers vs a squid cache that is on the same pool. Again that's lots of threads/processes writing a lot.

rottegift commented 10 years ago

Hm I just leaped out of bed thinking that maybe the one commonality in all the tasks is the release of a large amount of ARC memory. This may be a totally incorrect or irrelevant eureka, though...

lundman commented 10 years ago

By release, do you mean something you do, or just as part of the natural zfs landscape?

rottegift commented 10 years ago

No, I mean roughly as in this discussion between Bryan Cantrill and Adam Leventhal

http://thread.gmane.org/gmane.os.illumos.devel/14783/focus=3406

wrt the new zfs/vdev_queue.c and the way the zfs_dirty_data* tunables (declared in zfs/dsl_pool.c) get set in zfs/arc.c.

My thought was that essentially the "triggers" all build up huge amounts of dirty data, leading to a TXG sync that takes a long time (especially when synchronous reads are needed to fulfil it, for instance when dedup is involved), and that the panics happen when the dirty data is automatically released by zfs. Is it possible that memory corruption happens at that point?

On an omnios r151010 box that mostly only receives backup data (zfs send, Time Machine, rsync) and does heavy deduplication with too little RAM but a huge amount of fast cache vdev, I have in /etc/system:

set zfs:zfs_vdev_async_write_active_min_dirty_percent = 5
set zfs:zfs_dirty_data_max = 0x32000000

I've had zfs_dirty_data_max much lower in situations where I know I will be removing heavily deduped snapshots (for example), so as not to have to wait hours for zfs and zpool commands to work while the pool's freeing property is >> 0. I've also of course made use of mdb -k.

On this 16 GiB Mac, the default zfs_dirty_data_max (from zfs_dirty_data_max_percent) would be 1.6 GiB and zfs_dirty_data_max_max would be 4 GiB; both are pretty huge given that there is lots of competition for memory from apps (browsers with many windows in particular).

I'll experiment (in the vdev-iokit branch for the moment) with adjusting those downwards, since I don't know how to adjust them dynamically in Mac OS X. (Do you? There must be a way...)

rottegift commented 10 years ago

( and as gmane is not working right this exact second, it's also at http://www.listbox.com/member/archive/182179/2013/20131217235623:BC44B00C-67A0-11E3-B6F6-A9249FC3ED65/ )

lundman commented 10 years ago

The only dynamic tunables we have are those that I "ported" over, so if you want them to be dynamic, I can make that happen.
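For example, here is a rough sketch of how one of them could be exposed through XNU's sysctl so it can be poked at runtime; the OID names and the registration function below are made up for illustration, not necessarily what the port does:

#include <sys/sysctl.h>

extern uint64_t zfs_dirty_data_max;    /* declared in dsl_pool.c */

/* Hypothetical OID: zfs.dirty_data_max, writable at runtime. */
SYSCTL_NODE(, OID_AUTO, zfs, CTLFLAG_RW | CTLFLAG_LOCKED, 0, "ZFS tunables");
SYSCTL_QUAD(_zfs, OID_AUTO, dirty_data_max, CTLFLAG_RW | CTLFLAG_LOCKED,
    &zfs_dirty_data_max, "dirty data limit in bytes");

void
zfs_sysctl_register(void)
{
    /* Register the node first, then its children. */
    sysctl_register_oid(&sysctl__zfs);
    sysctl_register_oid(&sysctl__zfs_dirty_data_max);
}

With something like that in place, sudo sysctl -w zfs.dirty_data_max=419430400 would change it on the fly (again, assuming those OID names).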

rottegift commented 10 years ago

I built and tested master just now. zfs.util does better, but it reliably fails to find half the log devices.

It also imports everything including pools with cachefile=none set, and sets cachefile=-.

zpool status -v
  pool: CLATM
 state: ONLINE
  scan: resilvered 171G in 2h46m with 0 errors on Fri Jun 27 02:31:04 2014
config:

    NAME          STATE     READ WRITE CKSUM
    CLATM         ONLINE       0     0     0
      mirror-0    ONLINE       0     0     0
        disk21s1  ONLINE       0     0     0
        disk25    ONLINE       0     0     0
    logs
      mirror-1    ONLINE       0     0     0
        disk7s1   ONLINE       0     0     0
        disk16s1  ONLINE       0     0     0
    cache
      disk33s2    ONLINE       0     0     0
      disk34s2    ONLINE       0     0     0

errors: No known data errors

  pool: Donkey
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub in progress since Mon Jun 30 03:42:42 2014
    8.32G scanned out of 1.63T at 111K/s, (scan is slow, no estimated time)
    0 repaired, 0.50% done
config:

    NAME                      STATE     READ WRITE CKSUM
    Donkey                    DEGRADED     0     0     0
      mirror-0                ONLINE       0     0     0
        disk19s2              ONLINE       0     0     0
        disk20s2              ONLINE       0     0     0
        disk27s2              ONLINE       0     0     0
    logs
      mirror-1                DEGRADED     0     0     0
        disk13s1              ONLINE       0     0     0
        17684261269340990048  UNAVAIL      0     0     0  was /dev/disk15s1
    cache
      disk32                  ONLINE       0     0     0

errors: No known data errors

  pool: Trinity
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://zfsonlinux.org/msg/ZFS-8000-2Q
  scan: resilvered 58.1M in 0h23m with 0 errors on Mon Jun 30 03:59:09 2014
config:

    NAME                      STATE     READ WRITE CKSUM
    Trinity                   DEGRADED     0     0     0
      mirror-0                ONLINE       0     0     0
        disk29                ONLINE       0     0     0
        disk24s2              ONLINE       0     0     0
      mirror-1                ONLINE       0     0     0
        disk22s2              ONLINE       0     0     0
        disk23s2              ONLINE       0     0     0
      mirror-2                ONLINE       0     0     0
        disk28s2              ONLINE       0     0     0
        disk31s2              ONLINE       0     0     0
      mirror-3                ONLINE       0     0     0
        disk30s2              ONLINE       0     0     0
        disk26s2              ONLINE       0     0     0
    logs
      mirror-4                DEGRADED     0     0     0
        14764870903303591912  FAULTED      0     0     0  was /dev/disk16s1
        disk12s1              ONLINE       0     0     0
    cache
      disk31s4                UNAVAIL      0     0     0  cannot open
      disk30s4                UNAVAIL      0     0     0  cannot open

errors: No known data errors

  pool: ssdpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://zfsonlinux.org/msg/ZFS-8000-2Q
  scan: resilvered 1.35M in 0h0m with 0 errors on Mon Jun 30 11:32:06 2014
config:

    NAME                     STATE     READ WRITE CKSUM
    ssdpool                  DEGRADED     0     0     0
      mirror-0               ONLINE       0     0     0
        disk8                ONLINE       0     0     0
        disk15               ONLINE       0     0     0
    logs
      mirror-1               DEGRADED     0     0     0
        disk6                ONLINE       0     0     0
        1871386058628843819  UNAVAIL      0     0     0  was /dev/disk5s1
    cache
      disk33s3               ONLINE       0     0     0
      disk34s3               ONLINE       0     0     0

errors: No known data errors

(edit: exporting and reimporting finds the UNAVAIL log devices. The problem seems to be permissions: they are slices of SSDs where the boot partition is a SoftRAID mirror and the extra space is carved up for log vdevs, among other things:

30/06/2014 12:25:55.000 kernel[0]: #### SoftRAID Driver: access violation; client = ffffff802592ee00 has no access to volume
30/06/2014 12:25:55.000 kernel[0]: #### SoftRAID Driver: access violation; client = ffffff802592ee00 has no access to volume
30/06/2014 12:25:55.000 kernel[0]: #### SoftRAID Driver: access violation; client = ffffff802592ee00 has no access to volume
30/06/2014 12:25:55.000 kernel[0]: #### SoftRAID Driver: access violation; client = ffffff802592ee00 has no access to volume )

rottegift commented 10 years ago
Anonymous UUID:       EA3E4DC2-8F4D-9BF6-7D16-4BB6CA19A914

Mon Jun 30 11:37:18 2014
panic(cpu 0 caller 0xffffff802a6dbf5e): Kernel trap at 0xffffff7faad3b7c4, type 13=general protection, registers:
CR0: 0x0000000080010033, CR2: 0x0000000156f49000, CR3: 0x000000001b508017, CR4: 0x00000000001606e0
RAX: 0x0000000000000010, RBX: 0xffffff8248e23d58, RCX: 0x0000000000000038, RDX: 0xffffff825425b278
RSP: 0xffffff8248e23550, RBP: 0xffffff8248e23580, RSI: 0x7974696e6972542f, RDI: 0x7974696e6972542f
R8:  0xffffff825425b278, R9:  0x00000000fffffff9, R10: 0x0000000000000000, R11: 0xffffff8248e23500
R12: 0x000000001200038a, R13: 0xffffff8061c02b40, R14: 0x000000001200038a, R15: 0xffffff804d002950
RFL: 0x0000000000010286, RIP: 0xffffff7faad3b7c4, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x0000000156f49000, Error code: 0x0000000000000000, Fault CPU: 0x0

Backtrace (CPU 0), Frame : Return Address
0xffffff81eaa4bc50 : 0xffffff802a622fa9 mach_kernel : _panic + 0xc9
0xffffff81eaa4bcd0 : 0xffffff802a6dbf5e mach_kernel : _kernel_trap + 0x7fe
0xffffff81eaa4bea0 : 0xffffff802a6f3456 mach_kernel : _return_from_trap + 0xe6
0xffffff81eaa4bec0 : 0xffffff7faad3b7c4 net.lundman.zfs : _nv_mem_zalloc + 0x14
0xffffff8248e23580 : 0xffffff7faad3fc94 net.lundman.zfs : _nvp_buf_alloc + 0x44
0xffffff8248e235d0 : 0xffffff7faad3c282 net.lundman.zfs : _nvlist_add_common + 0x202
0xffffff8248e236b0 : 0xffffff7faad3cc49 net.lundman.zfs : _nvlist_add_nvlist + 0x39
0xffffff8248e236e0 : 0xffffff7faac7d2dc net.lundman.zfs : _dsl_prop_get_all_impl + 0x51c
0xffffff8248e239e0 : 0xffffff7faac7c9ec net.lundman.zfs : _dsl_prop_get_all_ds + 0x19c
0xffffff8248e23b40 : 0xffffff7faac7c845 net.lundman.zfs : _dsl_prop_get_all + 0x25
0xffffff8248e23b60 : 0xffffff7faacf3e4d net.lundman.zfs : _zfs_ioc_objset_stats_impl + 0x4d
0xffffff8248e23ba0 : 0xffffff7faacf0c7b net.lundman.zfs : _zfs_ioc_snapshot_list_next + 0x1ab
0xffffff8248e23c20 : 0xffffff7faacec7a4 net.lundman.zfs : _zfsdev_ioctl + 0x664
0xffffff8248e23cf0 : 0xffffff802a80d63f mach_kernel : _spec_ioctl + 0x11f
0xffffff8248e23d40 : 0xffffff802a7fe000 mach_kernel : _VNOP_IOCTL + 0x150
0xffffff8248e23dc0 : 0xffffff802a7f3e51 mach_kernel : _utf8_normalizestr + 0x971
0xffffff8248e23e10 : 0xffffff802a9c1303 mach_kernel : _fo_ioctl + 0x43
0xffffff8248e23e40 : 0xffffff802a9f2c66 mach_kernel : _ioctl + 0x466
0xffffff8248e23f50 : 0xffffff802aa40653 mach_kernel : _unix_syscall64 + 0x1f3
0xffffff8248e23fb0 : 0xffffff802a6f3c56 mach_kernel : _hndl_unix_scall64 + 0x16
      Kernel Extensions in backtrace:
         net.lundman.zfs(1.0)[798890EC-39BE-3E0B-8C50-F76283916246]@0xffffff7faac2b000->0xffffff7faae38fff
            dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7faabec000
            dependency: net.lundman.spl(1.0.0)[205406D0-4396-3572-B257-19B5A81B1084]@0xffffff7faac1a000

BSD process name corresponding to current thread: zpool
Boot args: -v keepsyms=y darkwake=0
rottegift commented 10 years ago
Anonymous UUID:       EA3E4DC2-8F4D-9BF6-7D16-4BB6CA19A914

Mon Jun 30 11:31:53 2014
panic(cpu 4 caller 0xffffff800a8dbf5e): Kernel trap at 0xffffff7f8ae1c9d1, type 14=page fault, registers:
CR0: 0x0000000080010033, CR2: 0x00000000008a0000, CR3: 0x0000000103961019, CR4: 0x00000000001606e0
RAX: 0x00000000008a0000, RBX: 0xffffff8222483f10, RCX: 0x0000000000000000, RDX: 0xffffff81f988dbf8
RSP: 0xffffff8222483840, RBP: 0xffffff8222483840, RSI: 0xffffff81f988dbf8, RDI: 0xffffff81f988dbf8
R8:  0xffffff82224838c8, R9:  0xffffff8222483890, R10: 0xffffff81f9794440, R11: 0xffffffffffffff00
R12: 0x0000000000100003, R13: 0xffffff803c6f22d0, R14: 0xffffff8222483e70, R15: 0x0000000000000048
RFL: 0x0000000000010286, RIP: 0xffffff7f8ae1c9d1, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x00000000008a0000, Error code: 0x0000000000000002, Fault CPU: 0x4

Backtrace (CPU 4), Frame : Return Address
0xffffff82224834d0 : 0xffffff800a822fa9 mach_kernel : _panic + 0xc9
0xffffff8222483550 : 0xffffff800a8dbf5e mach_kernel : _kernel_trap + 0x7fe
0xffffff8222483720 : 0xffffff800a8f3456 mach_kernel : _return_from_trap + 0xe6
0xffffff8222483740 : 0xffffff7f8ae1c9d1 net.lundman.spl : _list_insert_tail + 0x21
0xffffff8222483840 : 0xffffff7f8ae75d8a net.lundman.zfs : _dsl_dir_tempreserve_impl + 0x38a
0xffffff8222483920 : 0xffffff7f8ae759bd net.lundman.zfs : _dsl_dir_tempreserve_space + 0x17d
0xffffff82224839a0 : 0xffffff7f8ae5bea7 net.lundman.zfs : _dmu_tx_try_assign + 0x447
0xffffff8222483a50 : 0xffffff7f8ae5b9eb net.lundman.zfs : _dmu_tx_assign + 0x3b
0xffffff8222483a80 : 0xffffff7f8af04d9b net.lundman.zfs : _zfs_write + 0x9eb
0xffffff8222483d00 : 0xffffff7f8af1400e net.lundman.zfs : _zfs_vnop_write + 0xae
0xffffff8222483d60 : 0xffffff800a9fde51 mach_kernel : _VNOP_WRITE + 0xe1
0xffffff8222483de0 : 0xffffff800a9f3be3 mach_kernel : _utf8_normalizestr + 0x703
0xffffff8222483e50 : 0xffffff800abf22f1 mach_kernel : _write_nocancel + 0x1b1
0xffffff8222483ef0 : 0xffffff800abf21dd mach_kernel : _write_nocancel + 0x9d
0xffffff8222483f50 : 0xffffff800ac40653 mach_kernel : _unix_syscall64 + 0x1f3
0xffffff8222483fb0 : 0xffffff800a8f3c56 mach_kernel : _hndl_unix_scall64 + 0x16
      Kernel Extensions in backtrace:
         net.lundman.spl(1.0)[205406D0-4396-3572-B257-19B5A81B1084]@0xffffff7f8ae1a000->0xffffff7f8ae2afff
         net.lundman.zfs(1.0)[798890EC-39BE-3E0B-8C50-F76283916246]@0xffffff7f8ae2b000->0xffffff7f8b038fff
            dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7f8adec000
            dependency: net.lundman.spl(1.0.0)[205406D0-4396-3572-B257-19B5A81B1084]@0xffffff7f8ae1a000

BSD process name corresponding to current thread: squid
Boot args: -v keepsyms=y darkwake=0
rottegift commented 10 years ago

This is an odd message:

$ sudo zpool set cachefile=none Donkey
Password:
pool_set_props
$ zpool get cachefile Donkey 
NAME    PROPERTY   VALUE      SOURCE
Donkey  cachefile  none       local

But when zfs.util gets its mitts on things at boot up, it imports Donkey and:

$ zpool get cachefile Donkey
NAME    PROPERTY   VALUE      SOURCE
Donkey  cachefile  -          default

edit: illumos also loses the cachefile property on reimport with -N. ( https://github.com/openzfsonosx/zfs/issues/203 )

rottegift commented 10 years ago
Anonymous UUID:       EA3E4DC2-8F4D-9BF6-7D16-4BB6CA19A914

Mon Jun 30 13:16:27 2014
panic(cpu 0 caller 0xffffff800e6dbf5e): Kernel trap at 0xffffff7f8ec1c9f1, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x0000000000000000, CR3: 0x0000000010bd8000, CR4: 0x00000000001606e0
RAX: 0x0000000000000000, RBX: 0x000000000000009e, RCX: 0x0000000000000000, RDX: 0x0000000000000000
RSP: 0xffffff822555bb90, RBP: 0xffffff822555bb90, RSI: 0xffffff8257877ff8, RDI: 0xffffff8257881118
R8:  0x0000000000000000, R9:  0xffffff800ec01910, R10: 0x00000000000003ff, R11: 0xffffffffffffffff
R12: 0xffffff80417f49e8, R13: 0xffffff8034ff7000, R14: 0x0000000000000100, R15: 0xffffff8035ff4500
RFL: 0x0000000000010246, RIP: 0xffffff7f8ec1c9f1, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x0000000000000000, Error code: 0x0000000000000002, Fault CPU: 0x0

Backtrace (CPU 0), Frame : Return Address
0xffffff822555b820 : 0xffffff800e622fa9 mach_kernel : _panic + 0xc9
0xffffff822555b8a0 : 0xffffff800e6dbf5e mach_kernel : _kernel_trap + 0x7fe
0xffffff822555ba70 : 0xffffff800e6f3456 mach_kernel : _return_from_trap + 0xe6
0xffffff822555ba90 : 0xffffff7f8ec1c9f1 net.lundman.spl : _list_remove + 0x11
0xffffff822555bb90 : 0xffffff7f8ec75f14 net.lundman.zfs : _dsl_dir_tempreserve_clear + 0xe4
0xffffff822555bbd0 : 0xffffff7f8ec5c62f net.lundman.zfs : _dmu_tx_commit + 0x10f
0xffffff822555bc00 : 0xffffff7f8ec47231 net.lundman.zfs : _dmu_free_long_range_impl + 0x151
0xffffff822555bc60 : 0xffffff7f8ec47096 net.lundman.zfs : _dmu_free_long_range + 0x66
0xffffff822555bcb0 : 0xffffff7f8ed36746 net.lundman.zfs : _zvol_unmap + 0x126
0xffffff822555bd00 : 0xffffff7f8ed37fc8 net.lundman.zfs : __ZN27net_lundman_zfs_zvol_device9doDiscardEyy + 0x18
0xffffff822555bd10 : 0xffffff7f8ed38025 net.lundman.zfs : __ZN27net_lundman_zfs_zvol_device7doUnmapEP26IOBlockStorageDeviceExtentjj + 0x45
0xffffff822555bd40 : 0xffffff7f8ebf8d66 com.apple.iokit.IOStorageFamily : _dkioctl + 0x8a3
0xffffff822555bdc0 : 0xffffff800e80d66c mach_kernel : _spec_ioctl + 0x14c
0xffffff822555be10 : 0xffffff800e7fe000 mach_kernel : _VNOP_IOCTL + 0x150
0xffffff822555be90 : 0xffffff800e99a3dd mach_kernel : _NodesAreContiguous + 0x98d
0xffffff822555bf90 : 0xffffff800e976ee8 mach_kernel : _hfs_scan_blocks + 0x68
0xffffff822555bfb0 : 0xffffff800e6d7127 mach_kernel : _call_continuation + 0x17
      Kernel Extensions in backtrace:
         com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7f8ebec000->0xffffff7f8ec10fff
         net.lundman.spl(1.0)[205406D0-4396-3572-B257-19B5A81B1084]@0xffffff7f8ec1a000->0xffffff7f8ec2afff
         net.lundman.zfs(1.0)[798890EC-39BE-3E0B-8C50-F76283916246]@0xffffff7f8ec2b000->0xffffff7f8ee38fff
            dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7f8ebec000
            dependency: net.lundman.spl(1.0.0)[205406D0-4396-3572-B257-19B5A81B1084]@0xffffff7f8ec1a000

BSD process name corresponding to current thread: kernel_task
Boot args: -v keepsyms=y darkwake=0
rottegift commented 10 years ago

Weird stuff. disk32 is partitioned up for various cache vdevs; disk34 is a whole-disk cache vdev for Donkey, as below:

$ diskutil list disk32
/dev/disk32
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *128.0 GB   disk32
   1:                        EFI EFI                     209.7 MB   disk32s1
   2:                        ZFS                         18.3 GB    disk32s2
   3:                        ZFS                         18.3 GB    disk32s3
   4:                        ZFS                         18.3 GB    disk32s4
   5:                        ZFS                         17.4 GB    disk32s5
$ diskutil list disk34
/dev/disk34
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *126.2 GB   disk34
   1:                        ZFS                         126.2 GB   disk34s1
   2: 6A945A3B-1DD2-11B2-99A6-080020736631               8.4 MB     disk34s9

zpool status -vx Donkey says pool 'Donkey' is healthy, but zpool status -v shows that zfs thinks its cache vdev is disk32.

$  zpool status -v Donkey
  pool: Donkey
 state: ONLINE
  scan: scrub in progress since Mon Jun 30 03:42:42 2014
    9.98G scanned out of 1.63T at 298K/s, (scan is slow, no estimated time)
    0 repaired, 0.60% done
config:

    NAME          STATE     READ WRITE CKSUM
    Donkey        ONLINE       0     0     0
      mirror-0    ONLINE       0     0     0
        disk19s2  ONLINE       0     0     0
        disk20s2  ONLINE       0     0     0
        disk27s2  ONLINE       0     0     0
    logs
      mirror-1    ONLINE       0     0     0
        disk15s1  ONLINE       0     0     0
        disk7s1   ONLINE       0     0     0
    cache
      disk32      ONLINE       0     0     0

errors: No known data errors

and meanwhile other pools say disk34sX UNAVAIL, e.g.

$ zpool status -v CLATM
  pool: CLATM
 state: ONLINE
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://zfsonlinux.org/msg/ZFS-8000-2Q
  scan: resilvered 171G in 2h46m with 0 errors on Fri Jun 27 02:31:04 2014
config:

    NAME          STATE     READ WRITE CKSUM
    CLATM         ONLINE       0     0     0
      mirror-0    ONLINE       0     0     0
        disk25s1  ONLINE       0     0     0
        disk21    ONLINE       0     0     0
    logs
      mirror-1    ONLINE       0     0     0
        disk14s1  ONLINE       0     0     0
        disk10s1  ONLINE       0     0     0
    cache
      disk33s2    ONLINE       0     0     0
      disk34s2    UNAVAIL      0     0     0  cannot open

errors: No known data errors

So all pools report the wrong cache device name, swapping disk32 for disk34 or vice versa.

Donkey's traffic is going to disk34 per iostat.

$ iostat disk34 10
         disk34       cpu     load average
    KB/t tps  MB/s  us sy id   1m   5m   15m
    4.10   2  0.01   6  5 89  1.16 1.22 0.93
    0.00   0  0.00   2  3 95  1.20 1.23 0.94
    0.00   0  0.00   1  3 96  1.33 1.25 0.95
$ zpool iostat -v Donkey 10
                 capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
Donkey        1.63T   193G    100    209   268K   525K
  mirror      1.63T   193G    100    187   268K   436K
    disk19s2      -      -     52     42   175K   436K
    disk20s2      -      -     47     42   169K   436K
    disk27s2      -      -     41     43   151K   436K
logs              -      -      -      -      -      -
  mirror      1.34M   943M      0     22      0  89.2K
    disk15s1      -      -      0     22      0  89.2K
    disk7s1       -      -      0     22      0  89.2K
cache             -      -      -      -      -      -
  disk32      87.7M   108M      0      2      0   213K
------------  -----  -----  -----  -----  -----  -----
...
cache             -      -      -      -      -      -
  disk32      89.0M   106M      0      1      0   138K
cache             -      -      -      -      -      -
  disk32      90.8M   105M      0      1      0   184K
rottegift commented 10 years ago

But a subsequent manual import sees the right names:

$ zpool import -N Trinity
$ zpool status -v Trinity
  pool: Trinity
 state: ONLINE
  scan: resilvered 58.1M in 0h23m with 0 errors on Mon Jun 30 03:59:09 2014
config:

    NAME          STATE     READ WRITE CKSUM
    Trinity       ONLINE       0     0     0
      mirror-0    ONLINE       0     0     0
        disk23    ONLINE       0     0     0
        disk26s2  ONLINE       0     0     0
      mirror-1    ONLINE       0     0     0
        disk28s2  ONLINE       0     0     0
        disk24s2  ONLINE       0     0     0
      mirror-2    ONLINE       0     0     0
        disk29s2  ONLINE       0     0     0
        disk30s2  ONLINE       0     0     0
      mirror-3    ONLINE       0     0     0
        disk31s2  ONLINE       0     0     0
        disk22s2  ONLINE       0     0     0
    logs
      mirror-4    ONLINE       0     0     0
        disk11s1  ONLINE       0     0     0
        disk13s1  ONLINE       0     0     0
    cache
      disk32s4    ONLINE       0     0     0
      disk33s4    ONLINE       0     0     0

errors: No known data errors

(Sadly exporting seems to trigger a panic as above much of the time).

rottegift commented 10 years ago

Oh, and moving aside the symlink /usr/sbin/zpool -> /usr/local/sbin/zpool causes zfs.util to fail, which I exploited to see how master would do with manual imports. Manually importing pools a minute or two after rebooting (i.e., enough time to be logged in and at a Terminal prompt) finds all the vdevs some of the time, fails to find the cache vdevs frequently, but has so far not failed to find devices in the storage vdevs or in the log vdevs. Also, zpool status/iostat/list -v will fairly frequently report the wrong device numbers. (It's easiest to notice this when the device is also UNAVAIL, for example.)

lundman commented 10 years ago

To rule out the simple case where kmem_alloc could return NULL, I made a new SPL branch, "Issue201", for you to load: https://github.com/openzfsonosx/spl/commit/5a0c0f4165fd147da76c25f2c682d07f93d62688

It will probably panic like before, but we can at least rule it out.

rottegift commented 10 years ago

I just had double-digit hours of stability under reasonable use (not very high load) abruptly ended by a double+ panic with no reasonable stack trace, so I'm now building and installing this spl branch.

I'll try it against zfs master to start with.

rottegift commented 10 years ago

A couple of things wrt master (no panics yet; I'm about to put it through its paces) where there are differences from yesterday's master:

Jul  1 16:50:04 cla.use.net sudo[24033]: aguestpunk : TTY=ttys010 ; PWD=/Users/aguestpunk/Developer/zfs/cmd/zfs_util ; USER=root ; COMMAND=/usr/sbin/chown -R root:wheel /System/Library/Filesystems/zfs.fs
Jul  1 17:11:41 cla.use.net sudo[24074]: aguestpunk : TTY=ttys010 ; PWD=/Users/aguestpunk/Developer/zfs/cmd/zfs_util ; USER=root ; COMMAND=/usr/sbin/chown -R root:wheel /System/Library/Filesystems/zfs.fs
Jul  1 17:20:59 localhost zfs.util[73]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:20:59 localhost zfs.util[73]: argv[1]: -p
Jul  1 17:20:59 localhost zfs.util[73]: argv[2]: disk15s1
Jul  1 17:20:59 localhost zfs.util[73]: argv[3]: removable
Jul  1 17:20:59 localhost zfs.util[71]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:20:59 localhost zfs.util[73]: argv[4]: readonly
Jul  1 17:20:59 localhost zfs.util[71]: argv[1]: -p
Jul  1 17:20:59 localhost zfs.util[71]: argv[2]: disk14s1
Jul  1 17:20:59 localhost zfs.util[71]: argv[3]: removable
Jul  1 17:20:59 localhost zfs.util[71]: argv[4]: readonly
Jul  1 17:20:59 localhost zfs.util[71]: zfs.util called with option p
Jul  1 17:20:59 localhost zfs.util[73]: zfs.util called with option p
Jul  1 17:20:59 localhost zfs.util[71]: blockdevice is /dev/disk14s1
Jul  1 17:20:59 localhost zfs.util[71]: +zfs_probe : devpath /dev/rdisk14s1
Jul  1 17:20:59 localhost zfs.util[63]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:20:59 localhost zfs.util[63]: argv[1]: -p
Jul  1 17:20:59 localhost zfs.util[73]: blockdevice is /dev/disk15s1
Jul  1 17:20:59 localhost zfs.util[63]: argv[2]: disk5s1
Jul  1 17:20:59 localhost zfs.util[73]: +zfs_probe : devpath /dev/rdisk15s1
Jul  1 17:20:59 localhost zfs.util[63]: argv[3]: removable
Jul  1 17:20:59 localhost zfs.util[63]: argv[4]: readonly
Jul  1 17:20:59 localhost zfs.util[63]: zfs.util called with option p
Jul  1 17:20:59 localhost zfs.util[63]: blockdevice is /dev/disk5s1
Jul  1 17:20:59 localhost zfs.util[63]: +zfs_probe : devpath /dev/rdisk5s1
Jul  1 17:20:59 localhost zfs.util[63]: guid 426320761368465395
Jul  1 17:20:59 localhost zfs.util[63]: -zfs_probe : ret -1
Jul  1 17:20:59 localhost zfs.util[63]: FSUC_PROBE /dev/disk5s1 : FSUR_RECOGNIZED : poolguid 426320761368465395
Jul  1 17:20:59 localhost zfs.util[75]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:20:59 localhost zfs.util[75]: argv[1]: -p
Jul  1 17:20:59 localhost zfs.util[75]: argv[2]: disk18s1
Jul  1 17:20:59 localhost zfs.util[75]: argv[3]: removable
Jul  1 17:20:59 localhost zfs.util[74]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:20:59 localhost zfs.util[75]: argv[4]: readonly
Jul  1 17:20:59 localhost zfs.util[74]: argv[1]: -p
Jul  1 17:20:59 localhost zfs.util[75]: zfs.util called with option p
Jul  1 17:20:59 localhost zfs.util[74]: argv[2]: disk17s1
Jul  1 17:20:59 localhost zfs.util[75]: blockdevice is /dev/disk18s1
Jul  1 17:20:59 localhost zfs.util[74]: argv[3]: removable
Jul  1 17:20:59 localhost zfs.util[74]: argv[4]: readonly
Jul  1 17:20:59 localhost zfs.util[75]: +zfs_probe : devpath /dev/rdisk18s1
Jul  1 17:20:59 localhost zfs.util[74]: zfs.util called with option p
Jul  1 17:20:59 localhost zfs.util[74]: blockdevice is /dev/disk17s1
Jul  1 17:20:59 localhost zfs.util[74]: +zfs_probe : devpath /dev/rdisk17s1
Jul  1 17:20:59 localhost zfs.util[75]: guid 15380268902317402435
Jul  1 17:20:59 localhost zfs.util[75]: -zfs_probe : ret -1
Jul  1 17:20:59 localhost zfs.util[75]: FSUC_PROBE /dev/disk18s1 : FSUR_RECOGNIZED : poolguid 15380268902317402435
Jul  1 17:20:59 localhost zfs.util[74]: guid 426320761368465395
Jul  1 17:20:59 localhost zfs.util[74]: -zfs_probe : ret -1
Jul  1 17:20:59 localhost zfs.util[74]: FSUC_PROBE /dev/disk17s1 : FSUR_RECOGNIZED : poolguid 426320761368465395
Jul  1 17:20:59 localhost zfs.util[81]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:20:59 localhost zfs.util[81]: argv[1]: -p
Jul  1 17:20:59 localhost zfs.util[81]: argv[2]: disk22s1
Jul  1 17:20:59 localhost zfs.util[81]: argv[3]: removable
Jul  1 17:20:59 localhost zfs.util[81]: argv[4]: readonly
Jul  1 17:20:59 localhost zfs.util[81]: zfs.util called with option p
Jul  1 17:20:59 localhost zfs.util[81]: blockdevice is /dev/disk22s1
Jul  1 17:20:59 localhost zfs.util[81]: +zfs_probe : devpath /dev/rdisk22s1
Jul  1 17:20:59 localhost zfs.util[86]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:20:59 localhost zfs.util[86]: argv[1]: -p
Jul  1 17:20:59 localhost zfs.util[86]: argv[2]: disk26s1
Jul  1 17:20:59 localhost zfs.util[86]: argv[3]: removable
Jul  1 17:20:59 localhost zfs.util[86]: argv[4]: readonly
Jul  1 17:20:59 localhost zfs.util[86]: zfs.util called with option p
Jul  1 17:20:59 localhost zfs.util[86]: blockdevice is /dev/disk26s1
Jul  1 17:20:59 localhost zfs.util[86]: +zfs_probe : devpath /dev/rdisk26s1
Jul  1 17:20:59 localhost zfs.util[86]: -zfs_probe : ret -2
Jul  1 17:20:59 localhost zfs.util[86]: FSUC_PROBE /dev/disk26s1 : FSUR_UNRECOGNIZED
Jul  1 17:20:59 localhost zfs.util[108]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:20:59 localhost zfs.util[108]: argv[1]: -p
Jul  1 17:20:59 localhost zfs.util[108]: argv[2]: disk19s2
Jul  1 17:20:59 localhost zfs.util[108]: argv[3]: removable
Jul  1 17:20:59 localhost zfs.util[108]: argv[4]: readonly
Jul  1 17:20:59 localhost zfs.util[108]: zfs.util called with option p
Jul  1 17:20:59 localhost zfs.util[108]: blockdevice is /dev/disk19s2
Jul  1 17:20:59 localhost zfs.util[108]: +zfs_probe : devpath /dev/rdisk19s2
Jul  1 17:20:59 localhost zfs.util[109]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:20:59 localhost zfs.util[109]: argv[1]: -p
Jul  1 17:20:59 localhost zfs.util[109]: argv[2]: disk20s2
Jul  1 17:20:59 localhost zfs.util[109]: argv[3]: removable
Jul  1 17:20:59 localhost zfs.util[109]: argv[4]: readonly
Jul  1 17:20:59 localhost zfs.util[109]: zfs.util called with option p
Jul  1 17:20:59 localhost zfs.util[109]: blockdevice is /dev/disk20s2
Jul  1 17:20:59 localhost zfs.util[109]: +zfs_probe : devpath /dev/rdisk20s2
Jul  1 17:20:59 localhost zfs.util[108]: guid 15701963728217618441
Jul  1 17:20:59 localhost zfs.util[108]: -zfs_probe : ret -1
Jul  1 17:20:59 localhost zfs.util[108]: FSUC_PROBE /dev/disk19s2 : FSUR_RECOGNIZED : poolguid 15701963728217618441
Jul  1 17:20:59 localhost zfs.util[109]: guid 15701963728217618441
Jul  1 17:20:59 localhost zfs.util[109]: -zfs_probe : ret -1
Jul  1 17:20:59 localhost zfs.util[109]: FSUC_PROBE /dev/disk20s2 : FSUR_RECOGNIZED : poolguid 15701963728217618441
Jul  1 17:21:00 localhost zfs.util[64]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:00 localhost zfs.util[64]: argv[1]: -p
Jul  1 17:21:00 localhost zfs.util[64]: argv[2]: disk6s1
Jul  1 17:21:00 localhost zfs.util[64]: argv[3]: removable
Jul  1 17:21:00 localhost zfs.util[64]: argv[4]: readonly
Jul  1 17:21:00 localhost zfs.util[64]: zfs.util called with option p
Jul  1 17:21:00 localhost zfs.util[64]: blockdevice is /dev/disk6s1
Jul  1 17:21:00 localhost zfs.util[64]: +zfs_probe : devpath /dev/rdisk6s1
Jul  1 17:21:00 localhost zfs.util[64]: guid 426320761368465395
Jul  1 17:21:00 localhost zfs.util[64]: -zfs_probe : ret -1
Jul  1 17:21:00 localhost zfs.util[64]: FSUC_PROBE /dev/disk6s1 : FSUR_RECOGNIZED : poolguid 426320761368465395
Jul  1 17:21:00 localhost zfs.util[66]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:00 localhost zfs.util[66]: argv[1]: -p
Jul  1 17:21:00 localhost zfs.util[66]: argv[2]: disk9s1
Jul  1 17:21:00 localhost zfs.util[66]: argv[3]: removable
Jul  1 17:21:00 localhost zfs.util[66]: argv[4]: readonly
Jul  1 17:21:00 localhost zfs.util[66]: zfs.util called with option p
Jul  1 17:21:00 localhost zfs.util[66]: blockdevice is /dev/disk9s1
Jul  1 17:21:00 localhost zfs.util[66]: +zfs_probe : devpath /dev/rdisk9s1
Jul  1 17:21:00 localhost zfs.util[66]: guid 15701963728217618441
Jul  1 17:21:00 localhost zfs.util[66]: -zfs_probe : ret -1
Jul  1 17:21:00 localhost zfs.util[66]: FSUC_PROBE /dev/disk9s1 : FSUR_RECOGNIZED : poolguid 15701963728217618441
Jul  1 17:21:00 localhost zfs.util[65]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:00 localhost zfs.util[65]: argv[1]: -p
Jul  1 17:21:00 localhost zfs.util[65]: argv[2]: disk7s1
Jul  1 17:21:00 localhost zfs.util[65]: argv[3]: removable
Jul  1 17:21:00 localhost zfs.util[65]: argv[4]: readonly
Jul  1 17:21:00 localhost zfs.util[65]: zfs.util called with option p
Jul  1 17:21:00 localhost zfs.util[65]: blockdevice is /dev/disk7s1
Jul  1 17:21:00 localhost zfs.util[65]: +zfs_probe : devpath /dev/rdisk7s1
Jul  1 17:21:00 localhost zfs.util[65]: guid 15380268902317402435
Jul  1 17:21:00 localhost zfs.util[65]: -zfs_probe : ret -1
Jul  1 17:21:00 localhost zfs.util[65]: FSUC_PROBE /dev/disk7s1 : FSUR_RECOGNIZED : poolguid 15380268902317402435
Jul  1 17:21:02 cla.use.net zfs.util[71]: guid 5933279091430968458
Jul  1 17:21:02 cla.use.net zfs.util[71]: -zfs_probe : ret -1
Jul  1 17:21:02 cla.use.net zfs.util[71]: FSUC_PROBE /dev/disk14s1 : FSUR_RECOGNIZED : poolguid 5933279091430968458
Jul  1 17:21:02 cla.use.net zfs.util[73]: guid 426320761368465395
Jul  1 17:21:02 cla.use.net zfs.util[73]: -zfs_probe : ret -1
Jul  1 17:21:02 cla.use.net zfs.util[73]: FSUC_PROBE /dev/disk15s1 : FSUR_RECOGNIZED : poolguid 426320761368465395
Jul  1 17:21:02 cla.use.net zfs.util[70]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:02 cla.use.net zfs.util[70]: argv[1]: -p
Jul  1 17:21:02 cla.use.net zfs.util[70]: argv[2]: disk13s1
Jul  1 17:21:02 cla.use.net zfs.util[70]: argv[3]: removable
Jul  1 17:21:02 cla.use.net zfs.util[70]: argv[4]: readonly
Jul  1 17:21:02 cla.use.net zfs.util[70]: zfs.util called with option p
Jul  1 17:21:02 cla.use.net zfs.util[70]: blockdevice is /dev/disk13s1
Jul  1 17:21:02 cla.use.net zfs.util[70]: +zfs_probe : devpath /dev/rdisk13s1
Jul  1 17:21:02 cla.use.net zfs.util[70]: guid 15701963728217618441
Jul  1 17:21:02 cla.use.net zfs.util[70]: -zfs_probe : ret -1
Jul  1 17:21:02 cla.use.net zfs.util[70]: FSUC_PROBE /dev/disk13s1 : FSUR_RECOGNIZED : poolguid 15701963728217618441
Jul  1 17:21:02 cla.use.net zfs.util[67]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:02 cla.use.net zfs.util[67]: argv[1]: -p
Jul  1 17:21:02 cla.use.net zfs.util[67]: argv[2]: disk10s1
Jul  1 17:21:02 cla.use.net zfs.util[67]: argv[3]: removable
Jul  1 17:21:02 cla.use.net zfs.util[67]: argv[4]: readonly
Jul  1 17:21:02 cla.use.net zfs.util[67]: zfs.util called with option p
Jul  1 17:21:02 cla.use.net zfs.util[67]: blockdevice is /dev/disk10s1
Jul  1 17:21:02 cla.use.net zfs.util[67]: +zfs_probe : devpath /dev/rdisk10s1
Jul  1 17:21:02 cla.use.net zfs.util[67]: guid 5933279091430968458
Jul  1 17:21:02 cla.use.net zfs.util[67]: -zfs_probe : ret -1
Jul  1 17:21:02 cla.use.net zfs.util[67]: FSUC_PROBE /dev/disk10s1 : FSUR_RECOGNIZED : poolguid 5933279091430968458
Jul  1 17:21:04 cla.use.net zfs.util[81]: guid 15380268902317402435
Jul  1 17:21:04 cla.use.net zfs.util[81]: -zfs_probe : ret -1
Jul  1 17:21:04 cla.use.net zfs.util[81]: FSUC_PROBE /dev/disk22s1 : FSUR_RECOGNIZED : poolguid 15380268902317402435
Jul  1 17:21:04 cla.use.net zfs.util[367]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:04 cla.use.net zfs.util[367]: argv[1]: -p
Jul  1 17:21:04 cla.use.net zfs.util[367]: argv[2]: disk25s2
Jul  1 17:21:04 cla.use.net zfs.util[367]: argv[3]: removable
Jul  1 17:21:04 cla.use.net zfs.util[367]: argv[4]: readonly
Jul  1 17:21:04 cla.use.net zfs.util[367]: zfs.util called with option p
Jul  1 17:21:04 cla.use.net zfs.util[367]: blockdevice is /dev/disk25s2
Jul  1 17:21:04 cla.use.net zfs.util[367]: +zfs_probe : devpath /dev/rdisk25s2
Jul  1 17:21:04 cla.use.net zfs.util[369]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:04 cla.use.net zfs.util[369]: argv[1]: -p
Jul  1 17:21:04 cla.use.net zfs.util[369]: argv[2]: disk27s2
Jul  1 17:21:04 cla.use.net zfs.util[369]: argv[3]: removable
Jul  1 17:21:04 cla.use.net zfs.util[369]: argv[4]: readonly
Jul  1 17:21:04 cla.use.net zfs.util[369]: zfs.util called with option p
Jul  1 17:21:04 cla.use.net zfs.util[369]: blockdevice is /dev/disk27s2
Jul  1 17:21:04 cla.use.net zfs.util[369]: +zfs_probe : devpath /dev/rdisk27s2
Jul  1 17:21:04 cla.use.net zfs.util[375]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:04 cla.use.net zfs.util[375]: argv[1]: -p
Jul  1 17:21:04 cla.use.net zfs.util[375]: argv[2]: disk24s2
Jul  1 17:21:04 cla.use.net zfs.util[375]: argv[3]: removable
Jul  1 17:21:04 cla.use.net zfs.util[375]: argv[4]: readonly
Jul  1 17:21:04 cla.use.net zfs.util[375]: zfs.util called with option p
Jul  1 17:21:04 cla.use.net zfs.util[375]: blockdevice is /dev/disk24s2
Jul  1 17:21:04 cla.use.net zfs.util[375]: +zfs_probe : devpath /dev/rdisk24s2
Jul  1 17:21:04 cla.use.net zfs.util[375]: guid 15701963728217618441
Jul  1 17:21:04 cla.use.net zfs.util[375]: -zfs_probe : ret -1
Jul  1 17:21:04 cla.use.net zfs.util[375]: FSUC_PROBE /dev/disk24s2 : FSUR_RECOGNIZED : poolguid 15701963728217618441
Jul  1 17:21:04 cla.use.net zfs.util[398]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:04 cla.use.net zfs.util[398]: argv[1]: -p
Jul  1 17:21:04 cla.use.net zfs.util[398]: argv[2]: disk23s2
Jul  1 17:21:04 cla.use.net zfs.util[398]: argv[3]: removable
Jul  1 17:21:04 cla.use.net zfs.util[398]: argv[4]: readonly
Jul  1 17:21:04 cla.use.net zfs.util[398]: zfs.util called with option p
Jul  1 17:21:04 cla.use.net zfs.util[398]: blockdevice is /dev/disk23s2
Jul  1 17:21:04 cla.use.net zfs.util[398]: +zfs_probe : devpath /dev/rdisk23s2
Jul  1 17:21:04 cla.use.net zfs.util[397]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:04 cla.use.net zfs.util[397]: argv[1]: -p
Jul  1 17:21:04 cla.use.net zfs.util[397]: argv[2]: disk21s2
Jul  1 17:21:04 cla.use.net zfs.util[397]: argv[3]: removable
Jul  1 17:21:04 cla.use.net zfs.util[397]: argv[4]: readonly
Jul  1 17:21:04 cla.use.net zfs.util[397]: zfs.util called with option p
Jul  1 17:21:04 cla.use.net zfs.util[397]: blockdevice is /dev/disk21s2
Jul  1 17:21:04 cla.use.net zfs.util[397]: +zfs_probe : devpath /dev/rdisk21s2
Jul  1 17:21:04 cla.use.net zfs.util[398]: guid 5933279091430968458
Jul  1 17:21:04 cla.use.net zfs.util[398]: -zfs_probe : ret -1
Jul  1 17:21:04 cla.use.net zfs.util[398]: FSUC_PROBE /dev/disk23s2 : FSUR_RECOGNIZED : poolguid 5933279091430968458
Jul  1 17:21:04 cla.use.net zfs.util[397]: guid 5933279091430968458
Jul  1 17:21:04 cla.use.net zfs.util[397]: -zfs_probe : ret -1
Jul  1 17:21:04 cla.use.net zfs.util[397]: FSUC_PROBE /dev/disk21s2 : FSUR_RECOGNIZED : poolguid 5933279091430968458
Jul  1 17:21:04 cla.use.net zfs.util[369]: guid 5933279091430968458
Jul  1 17:21:04 cla.use.net zfs.util[369]: -zfs_probe : ret -1
Jul  1 17:21:04 cla.use.net zfs.util[369]: FSUC_PROBE /dev/disk27s2 : FSUR_RECOGNIZED : poolguid 5933279091430968458
Jul  1 17:21:04 cla.use.net zfs.util[367]: guid 5933279091430968458
Jul  1 17:21:04 cla.use.net zfs.util[367]: -zfs_probe : ret -1
Jul  1 17:21:04 cla.use.net zfs.util[367]: FSUC_PROBE /dev/disk25s2 : FSUR_RECOGNIZED : poolguid 5933279091430968458
Jul  1 17:21:05 cla.use.net zfs.util[424]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:05 cla.use.net zfs.util[424]: argv[1]: -p
Jul  1 17:21:05 cla.use.net zfs.util[424]: argv[2]: disk28s1
Jul  1 17:21:05 cla.use.net zfs.util[424]: argv[3]: removable
Jul  1 17:21:05 cla.use.net zfs.util[424]: argv[4]: readonly
Jul  1 17:21:05 cla.use.net zfs.util[424]: zfs.util called with option p
Jul  1 17:21:05 cla.use.net zfs.util[424]: blockdevice is /dev/disk28s1
Jul  1 17:21:05 cla.use.net zfs.util[424]: +zfs_probe : devpath /dev/rdisk28s1
Jul  1 17:21:05 cla.use.net zfs.util[424]: guid 15380268902317402435
Jul  1 17:21:05 cla.use.net zfs.util[424]: -zfs_probe : ret -1
Jul  1 17:21:05 cla.use.net zfs.util[424]: FSUC_PROBE /dev/disk28s1 : FSUR_RECOGNIZED : poolguid 15380268902317402435
Jul  1 17:21:05 cla.use.net zfs.util[427]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:05 cla.use.net zfs.util[427]: argv[1]: -p
Jul  1 17:21:05 cla.use.net zfs.util[427]: argv[2]: disk29s9
Jul  1 17:21:05 cla.use.net zfs.util[427]: argv[3]: removable
Jul  1 17:21:05 cla.use.net zfs.util[427]: argv[4]: readonly
Jul  1 17:21:05 cla.use.net zfs.util[427]: zfs.util called with option p
Jul  1 17:21:05 cla.use.net zfs.util[427]: blockdevice is /dev/disk29s9
Jul  1 17:21:05 cla.use.net zfs.util[427]: +zfs_probe : devpath /dev/rdisk29s9
Jul  1 17:21:05 cla.use.net zfs.util[427]: -zfs_probe : ret -2
Jul  1 17:21:05 cla.use.net zfs.util[427]: FSUC_PROBE /dev/disk29s9 : FSUR_UNRECOGNIZED
Jul  1 17:21:05 cla.use.net zfs.util[442]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:05 cla.use.net zfs.util[442]: argv[1]: -p
Jul  1 17:21:05 cla.use.net zfs.util[442]: argv[2]: disk30s2
Jul  1 17:21:05 cla.use.net zfs.util[442]: argv[3]: removable
Jul  1 17:21:05 cla.use.net zfs.util[442]: argv[4]: readonly
Jul  1 17:21:05 cla.use.net zfs.util[442]: zfs.util called with option p
Jul  1 17:21:05 cla.use.net zfs.util[442]: blockdevice is /dev/disk30s2
Jul  1 17:21:05 cla.use.net zfs.util[442]: +zfs_probe : devpath /dev/rdisk30s2
Jul  1 17:21:05 cla.use.net zfs.util[444]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:05 cla.use.net zfs.util[444]: argv[1]: -p
Jul  1 17:21:05 cla.use.net zfs.util[444]: argv[2]: disk29s1
Jul  1 17:21:05 cla.use.net zfs.util[444]: argv[3]: removable
Jul  1 17:21:05 cla.use.net zfs.util[444]: argv[4]: readonly
Jul  1 17:21:05 cla.use.net zfs.util[444]: zfs.util called with option p
Jul  1 17:21:05 cla.use.net zfs.util[444]: blockdevice is /dev/disk29s1
Jul  1 17:21:05 cla.use.net zfs.util[444]: +zfs_probe : devpath /dev/rdisk29s1
Jul  1 17:21:05 cla.use.net zfs.util[442]: guid 5933279091430968458
Jul  1 17:21:05 cla.use.net zfs.util[442]: -zfs_probe : ret -1
Jul  1 17:21:05 cla.use.net zfs.util[442]: FSUC_PROBE /dev/disk30s2 : FSUR_RECOGNIZED : poolguid 5933279091430968458
Jul  1 17:21:05 cla.use.net zfs.util[446]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:05 cla.use.net zfs.util[446]: argv[1]: -p
Jul  1 17:21:05 cla.use.net zfs.util[446]: argv[2]: disk31s2
Jul  1 17:21:05 cla.use.net zfs.util[446]: argv[3]: removable
Jul  1 17:21:05 cla.use.net zfs.util[446]: argv[4]: readonly
Jul  1 17:21:05 cla.use.net zfs.util[446]: zfs.util called with option p
Jul  1 17:21:05 cla.use.net zfs.util[446]: blockdevice is /dev/disk31s2
Jul  1 17:21:05 cla.use.net zfs.util[446]: +zfs_probe : devpath /dev/rdisk31s2
Jul  1 17:21:05 cla.use.net zfs.util[448]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:05 cla.use.net zfs.util[448]: argv[1]: -p
Jul  1 17:21:05 cla.use.net zfs.util[448]: argv[2]: disk32s2
Jul  1 17:21:05 cla.use.net zfs.util[448]: argv[3]: removable
Jul  1 17:21:05 cla.use.net zfs.util[448]: argv[4]: readonly
Jul  1 17:21:05 cla.use.net zfs.util[448]: zfs.util called with option p
Jul  1 17:21:05 cla.use.net zfs.util[448]: blockdevice is /dev/disk32s2
Jul  1 17:21:05 cla.use.net zfs.util[448]: +zfs_probe : devpath /dev/rdisk32s2
Jul  1 17:21:05 cla.use.net zfs.util[446]: guid 5933279091430968458
Jul  1 17:21:05 cla.use.net zfs.util[446]: -zfs_probe : ret -1
Jul  1 17:21:05 cla.use.net zfs.util[446]: FSUC_PROBE /dev/disk31s2 : FSUR_RECOGNIZED : poolguid 5933279091430968458
Jul  1 17:21:05 cla.use.net zfs.util[444]: guid 5933279091430968458
Jul  1 17:21:05 cla.use.net zfs.util[444]: -zfs_probe : ret -1
Jul  1 17:21:05 cla.use.net zfs.util[444]: FSUC_PROBE /dev/disk29s1 : FSUR_RECOGNIZED : poolguid 5933279091430968458
Jul  1 17:21:05 cla.use.net zfs.util[448]: guid 5933279091430968458
Jul  1 17:21:05 cla.use.net zfs.util[448]: -zfs_probe : ret -1
Jul  1 17:21:05 cla.use.net zfs.util[448]: FSUC_PROBE /dev/disk32s2 : FSUR_RECOGNIZED : poolguid 5933279091430968458
Jul  1 17:21:09 cla.use.net zfs.util[74]: zpool import error 1
Jul  1 17:21:09 cla.use.net zfs.util[553]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:09 cla.use.net zfs.util[553]: argv[1]: -k
Jul  1 17:21:09 cla.use.net zfs.util[553]: argv[2]: disk17s1
Jul  1 17:21:09 cla.use.net zfs.util[553]: zfs.util called with option k
Jul  1 17:21:09 cla.use.net zfs.util[553]: blockdevice is /dev/disk17s1
Jul  1 17:21:09 cla.use.net zfs.util[553]: FSUC_GETUUID
Jul  1 17:21:10 cla.use.net zfs.util[63]: zpool import error 1
Jul  1 17:21:10 cla.use.net zfs.util[567]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:10 cla.use.net zfs.util[567]: argv[1]: -k
Jul  1 17:21:10 cla.use.net zfs.util[567]: argv[2]: disk5s1
Jul  1 17:21:10 cla.use.net zfs.util[567]: zfs.util called with option k
Jul  1 17:21:10 cla.use.net zfs.util[567]: blockdevice is /dev/disk5s1
Jul  1 17:21:10 cla.use.net zfs.util[567]: FSUC_GETUUID
Jul  1 17:21:10 cla.use.net zfs.util[75]: zpool import error 1
Jul  1 17:21:10 cla.use.net zfs.util[581]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:10 cla.use.net zfs.util[581]: argv[1]: -k
Jul  1 17:21:10 cla.use.net zfs.util[581]: argv[2]: disk18s1
Jul  1 17:21:10 cla.use.net zfs.util[581]: zfs.util called with option k
Jul  1 17:21:10 cla.use.net zfs.util[581]: blockdevice is /dev/disk18s1
Jul  1 17:21:10 cla.use.net zfs.util[581]: FSUC_GETUUID
Jul  1 17:21:13 cla.use.net zfs.util[109]: zpool import error 1
Jul  1 17:21:13 cla.use.net zfs.util[607]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:13 cla.use.net zfs.util[607]: argv[1]: -k
Jul  1 17:21:13 cla.use.net zfs.util[607]: argv[2]: disk20s2
Jul  1 17:21:13 cla.use.net zfs.util[607]: zfs.util called with option k
Jul  1 17:21:13 cla.use.net zfs.util[607]: blockdevice is /dev/disk20s2
Jul  1 17:21:13 cla.use.net zfs.util[607]: FSUC_GETUUID
Jul  1 17:21:27 cla.use.net zfs.util[108]: zpool import error 1
Jul  1 17:21:27 cla.use.net zfs.util[745]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:27 cla.use.net zfs.util[745]: argv[1]: -k
Jul  1 17:21:27 cla.use.net zfs.util[745]: argv[2]: disk19s2
Jul  1 17:21:27 cla.use.net zfs.util[745]: zfs.util called with option k
Jul  1 17:21:27 cla.use.net zfs.util[745]: blockdevice is /dev/disk19s2
Jul  1 17:21:27 cla.use.net zfs.util[745]: FSUC_GETUUID
Jul  1 17:21:58 cla.use.net zfs.util[73]: zpool import error 1
Jul  1 17:21:58 cla.use.net zfs.util[1161]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:58 cla.use.net zfs.util[1161]: argv[1]: -k
Jul  1 17:21:58 cla.use.net zfs.util[1161]: argv[2]: disk15s1
Jul  1 17:21:58 cla.use.net zfs.util[1161]: zfs.util called with option k
Jul  1 17:21:58 cla.use.net zfs.util[1161]: blockdevice is /dev/disk15s1
Jul  1 17:21:58 cla.use.net zfs.util[1161]: FSUC_GETUUID
Jul  1 17:21:58 cla.use.net zfs.util[1162]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:58 cla.use.net zfs.util[1162]: argv[1]: -p
Jul  1 17:21:58 cla.use.net zfs.util[1162]: argv[2]: disk15s9
Jul  1 17:21:58 cla.use.net zfs.util[1162]: argv[3]: removable
Jul  1 17:21:58 cla.use.net zfs.util[1162]: argv[4]: readonly
Jul  1 17:21:58 cla.use.net zfs.util[1162]: zfs.util called with option p
Jul  1 17:21:58 cla.use.net zfs.util[1162]: blockdevice is /dev/disk15s9
Jul  1 17:21:58 cla.use.net zfs.util[1162]: +zfs_probe : devpath /dev/rdisk15s9
Jul  1 17:21:58 cla.use.net zfs.util[1162]: -zfs_probe : ret -2
Jul  1 17:21:58 cla.use.net zfs.util[1162]: FSUC_PROBE /dev/disk15s9 : FSUR_UNRECOGNIZED
Jul  1 17:21:59 cla.use.net zfs.util[66]: zpool import error 1
Jul  1 17:21:59 cla.use.net zfs.util[1173]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:21:59 cla.use.net zfs.util[1173]: argv[1]: -k
Jul  1 17:21:59 cla.use.net zfs.util[1173]: argv[2]: disk9s1
Jul  1 17:21:59 cla.use.net zfs.util[1173]: zfs.util called with option k
Jul  1 17:21:59 cla.use.net zfs.util[1173]: blockdevice is /dev/disk9s1
Jul  1 17:21:59 cla.use.net zfs.util[1173]: FSUC_GETUUID
Jul  1 17:22:26 cla.use.net zfs.util[71]: zpool import error 1
Jul  1 17:22:26 cla.use.net zfs.util[1315]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:22:26 cla.use.net zfs.util[1315]: argv[1]: -k
Jul  1 17:22:26 cla.use.net zfs.util[1315]: argv[2]: disk14s1
Jul  1 17:22:26 cla.use.net zfs.util[1315]: zfs.util called with option k
Jul  1 17:22:26 cla.use.net zfs.util[1315]: blockdevice is /dev/disk14s1
Jul  1 17:22:26 cla.use.net zfs.util[1315]: FSUC_GETUUID
Jul  1 17:22:26 cla.use.net zfs.util[424]: zpool import error 1
Jul  1 17:22:26 cla.use.net zfs.util[1319]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:22:26 cla.use.net zfs.util[1319]: argv[1]: -k
Jul  1 17:22:26 cla.use.net zfs.util[1319]: argv[2]: disk28s1
Jul  1 17:22:26 cla.use.net zfs.util[1319]: zfs.util called with option k
Jul  1 17:22:26 cla.use.net zfs.util[1319]: blockdevice is /dev/disk28s1
Jul  1 17:22:26 cla.use.net zfs.util[1319]: FSUC_GETUUID
Jul  1 17:22:26 cla.use.net zfs.util[1320]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:22:26 cla.use.net zfs.util[1320]: argv[1]: -p
Jul  1 17:22:26 cla.use.net zfs.util[1320]: argv[2]: disk28s9
Jul  1 17:22:26 cla.use.net zfs.util[1320]: argv[3]: removable
Jul  1 17:22:26 cla.use.net zfs.util[1320]: argv[4]: readonly
Jul  1 17:22:26 cla.use.net zfs.util[1320]: zfs.util called with option p
Jul  1 17:22:26 cla.use.net zfs.util[1320]: blockdevice is /dev/disk28s9
Jul  1 17:22:26 cla.use.net zfs.util[1320]: +zfs_probe : devpath /dev/rdisk28s9
Jul  1 17:22:26 cla.use.net zfs.util[1320]: -zfs_probe : ret -2
Jul  1 17:22:26 cla.use.net zfs.util[1320]: FSUC_PROBE /dev/disk28s9 : FSUR_UNRECOGNIZED
Jul  1 17:22:35 cla.use.net zfs.util[375]: zpool import error 1
Jul  1 17:22:35 cla.use.net zfs.util[1367]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:22:35 cla.use.net zfs.util[1367]: argv[1]: -k
Jul  1 17:22:35 cla.use.net zfs.util[1367]: argv[2]: disk24s2
Jul  1 17:22:35 cla.use.net zfs.util[1367]: zfs.util called with option k
Jul  1 17:22:35 cla.use.net zfs.util[1367]: blockdevice is /dev/disk24s2
Jul  1 17:22:35 cla.use.net zfs.util[1367]: FSUC_GETUUID
Jul  1 17:22:36 cla.use.net zfs.util[70]: zpool import error 1
Jul  1 17:22:36 cla.use.net zfs.util[1378]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:22:36 cla.use.net zfs.util[1378]: argv[1]: -k
Jul  1 17:22:36 cla.use.net zfs.util[1378]: argv[2]: disk13s1
Jul  1 17:22:36 cla.use.net zfs.util[1378]: zfs.util called with option k
Jul  1 17:22:36 cla.use.net zfs.util[1378]: blockdevice is /dev/disk13s1
Jul  1 17:22:36 cla.use.net zfs.util[1378]: FSUC_GETUUID
Jul  1 17:22:45 cla.use.net zfs.util[367]: zpool import error 1
Jul  1 17:22:45 cla.use.net zfs.util[1440]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:22:45 cla.use.net zfs.util[1440]: argv[1]: -k
Jul  1 17:22:45 cla.use.net zfs.util[1440]: argv[2]: disk25s2
Jul  1 17:22:45 cla.use.net zfs.util[1440]: zfs.util called with option k
Jul  1 17:22:45 cla.use.net zfs.util[1440]: blockdevice is /dev/disk25s2
Jul  1 17:22:45 cla.use.net zfs.util[1440]: FSUC_GETUUID
Jul  1 17:22:54 cla.use.net zfs.util[398]: zpool import error 1
Jul  1 17:22:54 cla.use.net zfs.util[1487]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:22:54 cla.use.net zfs.util[1487]: argv[1]: -k
Jul  1 17:22:54 cla.use.net zfs.util[1487]: argv[2]: disk23s2
Jul  1 17:22:54 cla.use.net zfs.util[1487]: zfs.util called with option k
Jul  1 17:22:54 cla.use.net zfs.util[1487]: blockdevice is /dev/disk23s2
Jul  1 17:22:54 cla.use.net zfs.util[1487]: FSUC_GETUUID
Jul  1 17:23:04 cla.use.net zfs.util[64]: zpool import error 1
Jul  1 17:23:04 cla.use.net zfs.util[1534]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:23:04 cla.use.net zfs.util[1534]: argv[1]: -k
Jul  1 17:23:04 cla.use.net zfs.util[1534]: argv[2]: disk6s1
Jul  1 17:23:04 cla.use.net zfs.util[1534]: zfs.util called with option k
Jul  1 17:23:04 cla.use.net zfs.util[1534]: blockdevice is /dev/disk6s1
Jul  1 17:23:04 cla.use.net zfs.util[1534]: FSUC_GETUUID
Jul  1 17:23:22 cla.use.net zfs.util[442]: zpool import error 1
Jul  1 17:23:22 cla.use.net zfs.util[1626]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:23:22 cla.use.net zfs.util[1626]: argv[1]: -k
Jul  1 17:23:22 cla.use.net zfs.util[1626]: argv[2]: disk30s2
Jul  1 17:23:22 cla.use.net zfs.util[1626]: zfs.util called with option k
Jul  1 17:23:22 cla.use.net zfs.util[1626]: blockdevice is /dev/disk30s2
Jul  1 17:23:22 cla.use.net zfs.util[1626]: FSUC_GETUUID
Jul  1 17:23:31 cla.use.net zfs.util[369]: zpool import error 1
Jul  1 17:23:31 cla.use.net zfs.util[1669]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:23:31 cla.use.net zfs.util[1669]: argv[1]: -k
Jul  1 17:23:31 cla.use.net zfs.util[1669]: argv[2]: disk27s2
Jul  1 17:23:31 cla.use.net zfs.util[1669]: zfs.util called with option k
Jul  1 17:23:31 cla.use.net zfs.util[1669]: blockdevice is /dev/disk27s2
Jul  1 17:23:31 cla.use.net zfs.util[1669]: FSUC_GETUUID
Jul  1 17:23:40 cla.use.net zfs.util[448]: zpool import error 1
Jul  1 17:23:40 cla.use.net zfs.util[1717]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:23:40 cla.use.net zfs.util[1717]: argv[1]: -k
Jul  1 17:23:40 cla.use.net zfs.util[1717]: argv[2]: disk32s2
Jul  1 17:23:40 cla.use.net zfs.util[1717]: zfs.util called with option k
Jul  1 17:23:40 cla.use.net zfs.util[1717]: blockdevice is /dev/disk32s2
Jul  1 17:23:40 cla.use.net zfs.util[1717]: FSUC_GETUUID
Jul  1 17:23:49 cla.use.net zfs.util[397]: zpool import error 1
Jul  1 17:23:49 cla.use.net zfs.util[1765]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:23:49 cla.use.net zfs.util[1765]: argv[1]: -k
Jul  1 17:23:49 cla.use.net zfs.util[1765]: argv[2]: disk21s2
Jul  1 17:23:49 cla.use.net zfs.util[1765]: zfs.util called with option k
Jul  1 17:23:49 cla.use.net zfs.util[1765]: blockdevice is /dev/disk21s2
Jul  1 17:23:49 cla.use.net zfs.util[1765]: FSUC_GETUUID
Jul  1 17:23:59 cla.use.net zfs.util[65]: zpool import error 1
Jul  1 17:23:59 cla.use.net zfs.util[1808]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:23:59 cla.use.net zfs.util[1808]: argv[1]: -k
Jul  1 17:23:59 cla.use.net zfs.util[1808]: argv[2]: disk7s1
Jul  1 17:23:59 cla.use.net zfs.util[1808]: zfs.util called with option k
Jul  1 17:23:59 cla.use.net zfs.util[1808]: blockdevice is /dev/disk7s1
Jul  1 17:23:59 cla.use.net zfs.util[1808]: FSUC_GETUUID
Jul  1 17:24:08 cla.use.net zfs.util[446]: zpool import error 1
Jul  1 17:24:08 cla.use.net zfs.util[1859]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:24:08 cla.use.net zfs.util[1859]: argv[1]: -k
Jul  1 17:24:08 cla.use.net zfs.util[1859]: argv[2]: disk31s2
Jul  1 17:24:08 cla.use.net zfs.util[1859]: zfs.util called with option k
Jul  1 17:24:08 cla.use.net zfs.util[1859]: blockdevice is /dev/disk31s2
Jul  1 17:24:08 cla.use.net zfs.util[1859]: FSUC_GETUUID
Jul  1 17:24:17 cla.use.net zfs.util[444]: zpool import error 1
Jul  1 17:24:17 cla.use.net zfs.util[1903]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:24:17 cla.use.net zfs.util[1903]: argv[1]: -k
Jul  1 17:24:17 cla.use.net zfs.util[1903]: argv[2]: disk29s1
Jul  1 17:24:17 cla.use.net zfs.util[1903]: zfs.util called with option k
Jul  1 17:24:17 cla.use.net zfs.util[1903]: blockdevice is /dev/disk29s1
Jul  1 17:24:17 cla.use.net zfs.util[1903]: FSUC_GETUUID
Jul  1 17:24:18 cla.use.net zfs.util[81]: zpool import error 1
Jul  1 17:24:18 cla.use.net zfs.util[1906]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:24:18 cla.use.net zfs.util[1906]: argv[1]: -k
Jul  1 17:24:18 cla.use.net zfs.util[1906]: argv[2]: disk22s1
Jul  1 17:24:18 cla.use.net zfs.util[1906]: zfs.util called with option k
Jul  1 17:24:18 cla.use.net zfs.util[1906]: blockdevice is /dev/disk22s1
Jul  1 17:24:18 cla.use.net zfs.util[1906]: FSUC_GETUUID
Jul  1 17:24:18 cla.use.net zfs.util[1907]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:24:18 cla.use.net zfs.util[1907]: argv[1]: -p
Jul  1 17:24:18 cla.use.net zfs.util[1907]: argv[2]: disk22s9
Jul  1 17:24:18 cla.use.net zfs.util[1907]: argv[3]: removable
Jul  1 17:24:18 cla.use.net zfs.util[1907]: argv[4]: readonly
Jul  1 17:24:18 cla.use.net zfs.util[1907]: zfs.util called with option p
Jul  1 17:24:18 cla.use.net zfs.util[1907]: blockdevice is /dev/disk22s9
Jul  1 17:24:18 cla.use.net zfs.util[1907]: +zfs_probe : devpath /dev/rdisk22s9
Jul  1 17:24:18 cla.use.net zfs.util[1907]: -zfs_probe : ret -2
Jul  1 17:24:18 cla.use.net zfs.util[1907]: FSUC_PROBE /dev/disk22s9 : FSUR_UNRECOGNIZED
Jul  1 17:24:27 cla.use.net zfs.util[67]: zpool import error 1
Jul  1 17:24:27 cla.use.net zfs.util[1952]: argv[0]: /System/Library/Filesystems/zfs.fs/Contents/Resources/./zfs.util
Jul  1 17:24:27 cla.use.net zfs.util[1952]: argv[1]: -k
Jul  1 17:24:27 cla.use.net zfs.util[1952]: argv[2]: disk10s1
Jul  1 17:24:27 cla.use.net zfs.util[1952]: zfs.util called with option k
Jul  1 17:24:27 cla.use.net zfs.util[1952]: blockdevice is /dev/disk10s1
Jul  1 17:24:27 cla.use.net zfs.util[1952]: FSUC_GETUUID
ilovezfs commented 10 years ago

@rottegift It is now set to import only if the pool status is ZPOOL_STATUS_OK, which is the equivalent of a 100% clean "sudo zpool import" pool discovery. The fact that it is not importing automatically implies that your "sudo zpool import" output will show DEGRADED or some other unhealthy state.
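
(For reference, a minimal sketch of the health gate described above. It is illustrative only, not the actual zfs.util source; it assumes the 2014-era libzfs prototype where zpool_import_status() takes the pool config and a message-id pointer, and later releases add an errata argument.)

#include <libzfs.h>

/*
 * Illustrative only: allow auto-import only when the discovered pool's
 * status is ZPOOL_STATUS_OK, i.e. the same pool that "sudo zpool import"
 * would report as fully healthy.
 */
static boolean_t
pool_is_cleanly_importable(nvlist_t *config)
{
	char *msgid = NULL;

	/* zpool_import_status() classifies a not-yet-imported pool's config. */
	return (zpool_import_status(config, &msgid) == ZPOOL_STATUS_OK);
}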

rottegift commented 10 years ago

@ilovezfs OK, that's good, although it does take a long time to make the ZFS subsystem available (e.g. to "zpool status") when there are lots of devices to check.

ilovezfs commented 10 years ago

@rottegift Are you suggesting it should be doing something differently?

rottegift commented 10 years ago

Anonymous UUID:       EA3E4DC2-8F4D-9BF6-7D16-4BB6CA19A914

Tue Jul  1 17:50:43 2014
panic(cpu 4 caller 0xffffff802cedbf5e): Kernel trap at 0xffffff7fad53fe5b, type 14=page fault, registers:
CR0: 0x0000000080010033, CR2: 0x0000000000000000, CR3: 0x000000001f4fe0f7, CR4: 0x00000000001606e0
RAX: 0x0000000000000000, RBX: 0xffffff821e70bd58, RCX: 0x0000000000000000, RDX: 0x0000000000000008
RSP: 0xffffff821e70b9c0, RBP: 0xffffff821e70b9e0, RSI: 0xffffff826052ac08, RDI: 0xffffff8359635338
R8:  0x0000000000000008, R9:  0x00000000fffffff8, R10: 0xffffff821e70b8a0, R11: 0xffffff821e70b900
R12: 0x0000000012000321, R13: 0xffffff8064230780, R14: 0x0000000012000321, R15: 0xffffff805a871df8
RFL: 0x0000000000010282, RIP: 0xffffff7fad53fe5b, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x0000000000000000, Error code: 0x0000000000000000, Fault CPU: 0x4

Backtrace (CPU 4), Frame : Return Address
0xffffff821e70b650 : 0xffffff802ce22fa9 mach_kernel : _panic + 0xc9
0xffffff821e70b6d0 : 0xffffff802cedbf5e mach_kernel : _kernel_trap + 0x7fe
0xffffff821e70b8a0 : 0xffffff802cef3456 mach_kernel : _return_from_trap + 0xe6
0xffffff821e70b8c0 : 0xffffff7fad53fe5b net.lundman.zfs : _nvp_buf_link + 0x3b
0xffffff821e70b9e0 : 0xffffff7fad53c63c net.lundman.zfs : _nvlist_add_common + 0x5bc
0xffffff821e70bac0 : 0xffffff7fad53c8c1 net.lundman.zfs : _nvlist_add_uint64 + 0x31
0xffffff821e70baf0 : 0xffffff7fad47cbb6 net.lundman.zfs : _dsl_prop_nvlist_add_uint64 + 0xc6
0xffffff821e70bb30 : 0xffffff7fad44f24e net.lundman.zfs : _dmu_objset_stats + 0x4e
0xffffff821e70bb60 : 0xffffff7fad4f3e68 net.lundman.zfs : _zfs_ioc_objset_stats_impl + 0x68
0xffffff821e70bba0 : 0xffffff7fad4f0c7b net.lundman.zfs : _zfs_ioc_snapshot_list_next + 0x1ab
0xffffff821e70bc20 : 0xffffff7fad4ec7a4 net.lundman.zfs : _zfsdev_ioctl + 0x664
0xffffff821e70bcf0 : 0xffffff802d00d63f mach_kernel : _spec_ioctl + 0x11f
0xffffff821e70bd40 : 0xffffff802cffe000 mach_kernel : _VNOP_IOCTL + 0x150
0xffffff821e70bdc0 : 0xffffff802cff3e51 mach_kernel : _utf8_normalizestr + 0x971
0xffffff821e70be10 : 0xffffff802d1c1303 mach_kernel : _fo_ioctl + 0x43
0xffffff821e70be40 : 0xffffff802d1f2c66 mach_kernel : _ioctl + 0x466
0xffffff821e70bf50 : 0xffffff802d240653 mach_kernel : _unix_syscall64 + 0x1f3
0xffffff821e70bfb0 : 0xffffff802cef3c56 mach_kernel : _hndl_unix_scall64 + 0x16
      Kernel Extensions in backtrace:
         net.lundman.zfs(1.0)[798890EC-39BE-3E0B-8C50-F76283916246]@0xffffff7fad42b000->0xffffff7fad638fff
            dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7fad3ec000
            dependency: net.lundman.spl(1.0.0)[E02134D8-4745-316F-B45F-ECA775392F75]@0xffffff7fad41a000

BSD process name corresponding to current thread: zfs
Boot args: -v keepsyms=y darkwake=0

and

Jul  1 17:32:14 cla kernel[0]: [SPL] bmalloc slice allocator initialised
Jul  1 17:32:14 cla kernel[0]: SPL: Total memory 17179869184
Jul  1 17:32:14 cla kernel[0]: SPL: Loaded module v0.01 rottegift (ncpu 8, memsize 17179869184, pages 4194304)
rottegift commented 10 years ago

@ilovezfs - no, but it is slower than zpool import. Also, a subsequent import seems to find more disks. Maybe spawning all the zfs.util processes in parallel is a false parallelism gain? Or maybe some sort of retry mechanism should go into zfs.util?
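
(A minimal sketch of the retry idea, purely illustrative: try_import() is a hypothetical stand-in for whatever probe/import path zfs.util actually takes, and the retry count and delay are arbitrary.)

#include <unistd.h>

#define IMPORT_RETRIES 5	/* arbitrary bound on attempts */
#define IMPORT_DELAY_S 2	/* seconds between attempts */

/* Hypothetical stand-in for zfs.util's real probe/import call. */
static int
try_import(const char *devpath)
{
	(void) devpath;
	return (-1);	/* pretend the device was not ready */
}

/*
 * Retry the import a few times instead of giving up after a single pass,
 * so devices that show up late during boot can still be picked up.
 */
static int
import_with_retry(const char *devpath)
{
	int ret = -1;

	for (int i = 0; i < IMPORT_RETRIES; i++) {
		ret = try_import(devpath);
		if (ret == 0)
			break;
		sleep(IMPORT_DELAY_S);
	}
	return (ret);
}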