openzfsonosx / zfs

OpenZFS on OS X
https://openzfsonosx.org/

Panic below zfs_log.c:487 (master/head) #186

Closed by rottegift 10 years ago

rottegift commented 10 years ago

Sigh, something unknown (I don't even know if it's in user land) keeps changing nvram back to "-v darkwake=0" destroying the keepsyms keyword. :-(

I have mirrored log vdevs on three pools; there were probably substantial sync writes and memory pressure at the time of the panic.

However:

xcrun atos -arch x86_64 -l 0xffffff7f83227000 -o ~/Developer/zfs/module/zfs/zfs.kext/Contents/MacOS/zfs 0xffffff7f832189c1 0xffffff7f8321a98f 0xffffff7f832f207f 0xffffff7f83300c67 0xffffff7f8330f7ee  0xffffff8002dfdb31
got symbolicator for /Users/aguestpunk/Developer/zfs/module/zfs/zfs.kext/Contents/MacOS/zfs, base address 0
0xffffff7f832189c1 [out of range of zfs anyway]
0xffffff7f8321a98f [ditto]
zfs_log_write (in zfs) (zfs_log.c:487)
zfs_write (in zfs) (zfs_vnops.c:1162)
zfs_vnop_write (in zfs) (zfs_vnops_osx.c:282)
0xffffff8002dfdb31 [out of range of zfs anyway]
Anonymous UUID:       EA3E4DC2-8F4D-9BF6-7D16-4BB6CA19A914

Fri May 23 06:18:17 2014
panic(cpu 3 caller 0xffffff8002cdbe7e): Kernel trap at 0xffffff7f832189c1, type 14=page fault, registers:
CR0: 0x0000000080010033, CR2: 0x0000000000000000, CR3: 0x000000006f433085, CR4: 0x00000000001606e0
RAX: 0x0000000000000018, RBX: 0xffffff824558b460, RCX: 0x0000000000000000, RDX: 0x0000000000000000
RSP: 0xffffff81f040b9a0, RBP: 0xffffff81f040b9a0, RSI: 0xffffff824558b460, RDI: 0xffffff7f8321d360
R8:  0x00000000000009bf, R9:  0x0000000000000d64, R10: 0x0000000000000000, R11: 0x0000000000000000
R12: 0x0000000000004afd, R13: 0xffffff7f8321d360, R14: 0xffffff7f8321d340, R15: 0x0000000000000001
RFL: 0x0000000000010246, RIP: 0xffffff7f832189c1, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x0000000000000000, Error code: 0x0000000000000002, Fault CPU: 0x3

Backtrace (CPU 3), Frame : Return Address
0xffffff81f040b630 : 0xffffff8002c22fa9 
0xffffff81f040b6b0 : 0xffffff8002cdbe7e 
0xffffff81f040b880 : 0xffffff8002cf3376 
0xffffff81f040b8a0 : 0xffffff7f832189c1 
0xffffff81f040b9a0 : 0xffffff7f8321a98f 
0xffffff81f040b9e0 : 0xffffff7f832f207f 
0xffffff81f040ba90 : 0xffffff7f83300c67 
0xffffff81f040bd10 : 0xffffff7f8330f7ee 
0xffffff81f040bd70 : 0xffffff8002dfdb31 
0xffffff81f040bdf0 : 0xffffff8002df38c3 
0xffffff81f040be60 : 0xffffff8002ff00c1 
0xffffff81f040bf00 : 0xffffff8002ff030a 
0xffffff81f040bf60 : 0xffffff800303e157 
0xffffff81f040bfb0 : 0xffffff8002cf3868 
      Kernel Extensions in backtrace:
         net.lundman.spl(1.0)[960038E4-E637-381E-8A80-C991446CEDCA]@0xffffff7f83216000->0xffffff7f83226fff
         net.lundman.zfs(1.0)[2622D0A9-0601-3E37-BA96-7861D15B0CBE]@0xffffff7f83227000->0xffffff7f83432fff
            dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7f831e8000
            dependency: net.lundman.spl(1.0.0)[960038E4-E637-381E-8A80-C991446CEDCA]@0xffffff7f83216000

BSD process name corresponding to current thread: Google Chrome Ca
Boot args: -v darkwake=0
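
For readers decoding the register dump above: the low three bits of an x86 page-fault error code say whether the page was present, whether the access was a write, and whether it came from user mode. The sketch below decodes them; `describe_page_fault` and the `PF_*` names are made up for this illustration and are not kernel or ZFS symbols. Fed the values from this panic (error code 0x2, CR2 0x0), it reports a kernel-mode write to address 0 on a non-present page, which is at least consistent with a store through a NULL pointer.

```c
#include <stdint.h>
#include <stdio.h>

/* Low three bits of the x86 page-fault error code. */
#define PF_PRESENT 0x1u /* 0 = page not present, 1 = protection violation */
#define PF_WRITE   0x2u /* 0 = read access,      1 = write access */
#define PF_USER    0x4u /* 0 = supervisor mode,  1 = user mode */

/*
 * Hypothetical helper (not a kernel API): write a one-line description of
 * a page fault into buf. Returns whatever snprintf returns.
 */
static int describe_page_fault(uint64_t err, uint64_t cr2,
                               char *buf, size_t len)
{
    return snprintf(buf, len, "%s %s at 0x%016llx (%s)",
                    (err & PF_USER) ? "user" : "kernel",
                    (err & PF_WRITE) ? "write" : "read",
                    (unsigned long long)cr2,
                    (err & PF_PRESENT) ? "protection violation"
                                       : "page not present");
}
```

Calling `describe_page_fault(0x2, 0x0, buf, sizeof buf)` with the values from the `Fault CR2` line fills `buf` with `kernel write at 0x0000000000000000 (page not present)`.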

rottegift commented 10 years ago

Ah some are in range after all...

0xffffff8002c22fa9
list_remove (in spl) (spl-list.c:111)
tsd_set (in spl) (spl-tsd.c:88)
arc_hdr_destroy (in zfs) (arc.c:1879)
arc_read (in zfs) (arc.c:3674)
zfs_remove (in zfs) (zfs_vnops.c:2072)
ddi_strtoul (in zfs) (sunddi.h:89)
zil_open (in zfs) (zil.c:1880)
0xffffff8002dfdb31[zfs_log_write as above]

[sorry if i'm making errors, i'm dead tired]
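
A quick way to see why the first two frames came back "out of range of zfs" is to compare each return address against the load ranges printed under "Kernel Extensions in backtrace". The sketch below does only that; the range values are copied from the panic report, while `owner_of` and the struct are hypothetical names for this illustration. The two frames just above the zfs ones land in net.lundman.spl, so they would presumably need atos pointed at the spl binary and its own load address.

```c
#include <stddef.h>
#include <stdint.h>

/* One loaded kext and its inclusive address range. */
struct kext_range {
    const char *name;
    uint64_t    start;
    uint64_t    end;
};

/* Ranges copied from the "Kernel Extensions in backtrace" section. */
static const struct kext_range kexts[] = {
    { "net.lundman.spl", 0xffffff7f83216000ULL, 0xffffff7f83226fffULL },
    { "net.lundman.zfs", 0xffffff7f83227000ULL, 0xffffff7f83432fffULL },
};

/* Hypothetical helper: name the kext an address falls in, if any. */
static const char *owner_of(uint64_t addr)
{
    for (size_t i = 0; i < sizeof(kexts) / sizeof(kexts[0]); i++) {
        if (addr >= kexts[i].start && addr <= kexts[i].end)
            return kexts[i].name;
    }
    return "kernel or other";
}
```

For example, `owner_of(0xffffff7f832189c1ULL)` (the faulting RIP) returns `"net.lundman.spl"`, while `owner_of(0xffffff7f832f207fULL)` returns `"net.lundman.zfs"`.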

lundman commented 10 years ago

    if ((fsync_cnt = (uintptr_t)tsd_get(zfs_fsyncer_key)) != 0) {
        (void) tsd_set(zfs_fsyncer_key, (void *)(fsync_cnt - 1));
    }

yeah tsd_set() is brand spanking new, I wrote it last week while drunk. So entirely possible it is wrong. You must be on master.

I now think the tsd code is wrong: I remove nodes/values from the list at tsd_set( ,NULL), when I should keep them all around until tsd_destroy() is called and only free them then.
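
To illustrate the lifetime rule described above, here is a minimal userland sketch, not the SPL implementation and not the contents of the fix commit. All names (`sketch_tsd_set`, `struct tsd_key`, and so on) are hypothetical, the thread id is passed in explicitly instead of being read from the current thread, and locking is left out. The point is only that storing NULL clears the value and leaves the node linked, and that nodes are freed in one place, at destroy time. Note that in the zfs_log_write snippet a `fsync_cnt` of 1 decrements to 0, so that path does end up storing NULL.

```c
#include <stdint.h>
#include <stdlib.h>

/* One (thread, value) slot for a given key. */
struct tsd_node {
    struct tsd_node *next;
    uint64_t         tid;   /* owning thread id, supplied by the caller */
    void            *value;
};

/* A key is just the head of a singly linked list in this sketch. */
struct tsd_key {
    struct tsd_node *head;
};

static struct tsd_node *tsd_find(struct tsd_key *k, uint64_t tid)
{
    for (struct tsd_node *n = k->head; n != NULL; n = n->next) {
        if (n->tid == tid)
            return n;
    }
    return NULL;
}

/* Store value for tid. Returns 0 on success, -1 if allocation fails. */
static int sketch_tsd_set(struct tsd_key *k, uint64_t tid, void *value)
{
    struct tsd_node *n = tsd_find(k, tid);

    if (n == NULL) {
        n = malloc(sizeof(*n));
        if (n == NULL)
            return -1;
        n->tid = tid;
        n->next = k->head;
        k->head = n;
    }
    n->value = value; /* NULL only clears the slot; the node stays linked */
    return 0;
}

static void *sketch_tsd_get(struct tsd_key *k, uint64_t tid)
{
    struct tsd_node *n = tsd_find(k, tid);
    return (n != NULL) ? n->value : NULL;
}

/* Free every node for the key. Returns how many nodes were freed. */
static size_t sketch_tsd_destroy(struct tsd_key *k)
{
    size_t freed = 0;

    while (k->head != NULL) {
        struct tsd_node *n = k->head;
        k->head = n->next;
        free(n);
        freed++;
    }
    return freed;
}
```

With this shape, no unlink happens on the set path at all, so a set-to-NULL cannot race with or corrupt a list walk the way an early removal could; the cost is that cleared slots linger until the key is destroyed.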

lundman commented 10 years ago

https://github.com/openzfsonosx/spl/commit/8c338e5e90d20888966c3dd680e58a742e042526

rottegift commented 10 years ago

I put master in the subject, probably should have put it in the text too. I'm likely always on master when reporting a bug.

Also FWIW I think that the Kernel Flags in /Library/Preferences/SystemConfiguration/com.apple.Boot.plist were overriding what I put in via manual nvram(1). The mtime of the plist file was in 2012, which is too old to give me any hint about what might have caused it to be there.

I'll try your patch in 30ish minutes.

lundman commented 10 years ago

Yeah, it was just my deduction that you were on master, not a question. It was redundant :)

I believe the darkwake is set by hackintoshes, because with it on, they fail to wake from sleep. I assume you have a hackintosh, so you don't have nvram, and need to set it in the chameleon options.

If you don't have a hackintosh, then I have no idea; I had not heard of the option before today.

rottegift commented 10 years ago

Nope, no hackintosh, it was probably a leftover from when this system was on an MBP with WOL vs sleep problems (it's now on a Mac Mini).

The problem wasn't so much that darkwake=0 was being set, but that it was unsetting keepsyms=y.

rottegift commented 10 years ago

Can no longer reproduce. Thanks!