openzfsonosx / zfs

OpenZFS on OS X
https://openzfsonosx.org/

Panic: "kauth_identity: can't insert record without UID or GID as key" #192

Closed zfsfan closed 10 years ago

zfsfan commented 10 years ago

Summary: A pool-corrupting kernel panic in the ZFS driver during a volume clone via CCC/rsync to a newly created raidz2 pool built from three new drives.

panic(cpu 2 caller 0xffffff802e1bc1c6): "kauth_identity: can't insert record without UID or GID as key"@/SourceCache/xnu/xnu-2422.100.13/bsd/kern/kern_credential.c:1213
      Kernel Extensions in backtrace:
         net.lundman.zfs(1.0)[0EC79B06-3C9F-3529-8450-42222507F310]@0xffffff7faf94c000->0xffffff7fafb55fff
            dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7fae4d3000
            dependency: net.lundman.spl(1.0.0)[E94395C5-BE95-36B6-BA13-D00853C9D857]@0xffffff7faf937000

Pool status after this destructive panic:

$ sudo zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Sat Jun  7 23:39:36 2014
config:

    NAME        STATE     READ WRITE CKSUM
    tank        DEGRADED     0     0     0
      raidz2-0  DEGRADED     0     0     0
        disk4   FAULTED      0     0     0  corrupted data
        disk3   UNAVAIL      0     0     0  corrupted data
        disk2   ONLINE       0     0     0

errors: No known data errors

Hardware: three WD Red 3TB drives --> Addonics port multiplier SATA -> eSATA bridge board --> Sonnet Tempo 6G ExpressCard --> Sonnet Echo ExpressCard Thunderbolt adapter --> Mac mini (2011). A 2TB WD Green drive formatted JHFS+ is attached to the same bridge board.

ZFS pool created using OpenZFS_on_OS_X_1.2.7.dmg:

$ zpool create -f -o ashift=12 -O casesensitivity=insensitive -O normalization=formD -O compression=lz4 tank raidz2 /dev/disk1 /dev/disk2 /dev/disk4
$ zfs create tank/Shared   
$ zfs create tank/Backups
$ zfs create tank/Archive
$ zfs create tank/Media
$ zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
tank          19.8G  2.64T  1.72M  /tank
tank/Archive  19.8G  2.64T  19.8G  /tank/Archive
tank/Backups   677K  2.64T   677K  /tank/Backups
tank/Media     645K  2.64T   645K  /tank/Media
tank/Shared    681K  2.64T   681K  /tank/Shared

Performed a volume copy with Carbon Copy Cloner 3.5.4. Source: the 2TB WD Green, "/Volumes/TimeWarp2". Destination: the ZFS raidz2 dataset "tank/Archive". Carbon Copy Cloner log:

==================== Carbon Copy Cloner v. 3.5.4 (1326): 2014-06-08 00:14:01 -0500 ====================

OS: Version 10.9.3 (Build 13D65)
Architecture:   x86_64
Mac model:  Macmini5,2
Number of CPUs: 4
CPU Speed:  2.50 GHz
Memory: 8 GB
Console user id:    501
CCC euid:   501
Task owner: xxxxxx (501)

Task: Copying selected files
Source: TimeWarp2
    Source path: /Volumes/TimeWarp2
    Mount point: /Volumes/TimeWarp2
    Filesystem: hfs
    Capacity: 2.00 TB
    Used: 1.92 TB
    Available: 75.54 GB
    Mac OS X version: 10.5.8
    UUID: 7B0BB0E8-2E1F-36E6-AA98-52694BACCBEA
    Device ID: /dev/disk1s2
    Device vendor: Unidentified Vendor
    Device model: WDC WD20EADS-00R6B0                     
    Device interface: SATA
    Partition format: IOGUIDPartitionScheme
    Case sensitive: No
    Filesystem owner: 0
    Ownership respected: No

Destination: Archive
    Destination path: /tank/Archive
    Mount point: /tank/Archive
    Filesystem: zfs
    Capacity: 2.92 TB
    Used: 665.09 KB
    Available: 2.92 TB
    Mac OS X version: Mac OS not installed
    UUID: 
    Device ID: tank/Archive
    Device vendor: Unidentified Vendor
    Device model: Unidentified Model
    Device interface: none
    Partition format: IONoPartitionScheme
    Case sensitive: Yes
    Filesystem owner: 0
    Ownership respected: Yes

Settings
    Archive deleted items, owner: xxxxxx
    - Protect root-level items
    Overwrite modified files if the source file is newer
    Limit archive size to 40000 MB
    Archive the Recovery HD volume: No (a Recovery HD volume is not associated with the source volume)

06/08 00:14:02  Preparing...
06/08 00:14:02  Authenticating...
06/08 00:14:05  Gathering information about the source and destination...
06/08 00:14:05  Enabling ownership on "TimeWarp2"...
Jun  8 00:14:06 mii5 com.bombich.ccc[613] <Debug>: VSDB: AdoptVolume: /Volumes/TimeWarp2
Jun  8 00:14:06 mii5 com.bombich.ccc[613] <Debug>: VSDB: Updating mount for /Volumes/TimeWarp2: perm,nosuid,nodev
06/08 00:14:12  "TimeWarp2" has ownership enabled.
06/08 00:14:12  "Archive" is not formatted HFS+, not enabling ownership
06/08 00:14:22  Spotlight state on destination: Unknown
06/08 00:14:22  Archive Manager: Creating folder at /tank/Archive/_CCC Archives
06/08 00:14:22  Archive Manager: Creating folder at /tank/Archive/_CCC Archives/2014-06-08 (June 08) 00-14-22
06/08 00:14:22  Will update files only if they are newer on the source volume.
06/08 00:14:22  Pruning archived content...
06/08 00:14:22  Sparing /tank/Archive/_CCC Archives/2014-06-08 (June 08) 00-14-22 [Running total size: 0 bytes]
06/08 00:14:22  Nothing to prune...
06/08 00:14:22  Initiating synchronization engine...
DEBUG: [dfcc: sender] Effective UID is 0 for /Volumes/TimeWarp2/
DEBUG: [sender] Fileflags mask for /Volumes/TimeWarp2/: 0
06/08 00:14:22  receiver: Disabling HFS compression support, /tank/Archive doesn't support it (use --protect-decmpfs to force protection of the com.apple.decmpfs extended attribute). (10001)
DEBUG: [dfcc: sender] Setting effective UID back to 0 for source
DEBUG: [dfcc: receiver] Effective UID is 0 for /tank/Archive
DEBUG: [receiver] Fileflags mask for /tank/Archive: 32
DEBUG: Max xattr size for the destination filesystem is 131072 bytes
DEBUG: [dfcc: receiver] Setting effective UID back to 0 for dest
06/08 00:14:23  Cloning...
06/08 00:33:53  System just restarted, will wait for 82 seconds to let the system settle down...

Panic log, reported after the reboot (the panic occurred about 15 minutes after the copy started):

Sun Jun  8 00:33:53 2014
panic(cpu 2 caller 0xffffff802e1bc1c6): "kauth_identity: can't insert record without UID or GID as key"@/SourceCache/xnu/xnu-2422.100.13/bsd/kern/kern_credential.c:1213
Backtrace (CPU 2), Frame : Return Address
0xffffff820d233540 : 0xffffff802de22fa9 
0xffffff820d2335c0 : 0xffffff802e1bc1c6 
0xffffff820d2335f0 : 0xffffff802e1b8a76 
0xffffff820d2338e0 : 0xffffff802e1b91a4 
0xffffff820d233900 : 0xffffff7fafa37e72 
0xffffff820d233990 : 0xffffff7fafa31b6b 
0xffffff820d2339d0 : 0xffffff802dffc1d7 
0xffffff820d233a50 : 0xffffff802dff5275 
0xffffff820d233bf0 : 0xffffff802dff149a 
0xffffff820d233d70 : 0xffffff802dfe9487 
0xffffff820d233f50 : 0xffffff802e240653 
0xffffff820d233fb0 : 0xffffff802def3c56 
      Kernel Extensions in backtrace:
         net.lundman.zfs(1.0)[0EC79B06-3C9F-3529-8450-42222507F310]@0xffffff7faf94c000->0xffffff7fafb55fff
            dependency: com.apple.iokit.IOStorageFamily(1.9)[9B09B065-7F11-3241-B194-B72E5C23548B]@0xffffff7fae4d3000
            dependency: net.lundman.spl(1.0.0)[E94395C5-BE95-36B6-BA13-D00853C9D857]@0xffffff7faf937000

BSD process name corresponding to current thread: rsync

Mac OS version:
13D65

Kernel version:
Darwin Kernel Version 13.2.0: Thu Apr 17 23:03:13 PDT 2014; root:xnu-2422.100.13~1/RELEASE_X86_64
Kernel UUID: ADD73AE6-88B0-32FB-A8BB-4F7C8BE4092E
Kernel slide:     0x000000002dc00000
Kernel text base: 0xffffff802de00000
System model name: Macmini5,2 (Mac-4BC72D62AD45599E)

System uptime in nanoseconds: 2411654067652
last loaded kext at 1543854471072: com.apple.driver.AppleIntelMCEReporter   104 (addr 0xffffff7fae66d000, size 49152)
last unloaded kext at 1605218522708: com.apple.driver.AppleIntelMCEReporter 104 (addr 0xffffff7fae66d000, size 32768)
loaded kexts:
net.lundman.zfs 1.0.0
net.lundman.spl 1.0.0
com.Cycling74.driver.Soundflower    1.5.1
at.obdev.nke.LittleSnitch   4050
com.apple.driver.AppleHWSensor  1.9.5d0
com.apple.filesystems.autofs    3.0
com.apple.driver.AudioAUUC  1.60
com.apple.iokit.IOUserEthernet  1.0.0d1
com.apple.iokit.IOBluetoothSerialManager    4.2.4f1
com.apple.driver.AppleUpstreamUserClient    3.5.13
com.apple.driver.AppleMikeyHIDDriver    124
com.apple.driver.AppleMCCSControl   1.1.12
com.apple.driver.ApplePlatformEnabler   2.0.9d1
com.apple.driver.AGPM   100.14.15
com.apple.driver.AppleMikeyDriver   2.6.1f2
com.apple.Dont_Steal_Mac_OS_X   7.0.0
com.apple.kext.AMDFramebuffer   1.2.2
com.apple.driver.AppleIntelHD3000Graphics   8.2.4
com.apple.driver.AppleHDA   2.6.1f2
com.apple.driver.AppleHWAccess  1
com.apple.AMDRadeonX3000    1.2.2
com.apple.driver.AppleThunderboltIP 1.1.2
com.apple.iokit.BroadcomBluetoothHostControllerUSBTransport 4.2.4f1
com.apple.driver.AppleSMCPDRC   1.0.0
com.apple.driver.ACPI_SMC_PlatformPlugin    1.0.0
com.apple.driver.AppleLPC   1.7.0
com.apple.kext.AMD6000Controller    1.2.2
com.apple.driver.AppleIntelSNBGraphicsFB    8.2.4
com.apple.iokit.SCSITaskUserClient  3.6.6
com.apple.driver.XsanFilter 404
com.apple.driver.AppleIRController  325.7
com.apple.iokit.IOAHCIBlockStorage  2.5.1
com.apple.BootCache 35
com.apple.AppleFSCompression.AppleFSCompressionTypeZlib 1.0.0d1
com.apple.AppleFSCompression.AppleFSCompressionTypeDataless 1.0.0d1
com.apple.driver.AppleSDXC  1.5.2
com.apple.iokit.AppleBCM5701Ethernet    3.8.1b2
com.apple.driver.AirPort.Brcm4331   700.20.22
com.apple.driver.AppleFWOHCI    5.0.2
com.apple.driver.AppleUSBHub    666.4.0
com.apple.driver.AppleAHCIPort  3.0.0
com.apple.driver.AppleUSBEHCI   660.4.0
com.apple.driver.AppleACPIButtons   2.0
com.apple.driver.AppleRTC   2.0
com.apple.driver.AppleHPET  1.8
com.apple.driver.AppleSMBIOS    2.1
com.apple.driver.AppleACPIEC    2.0
com.apple.driver.AppleAPIC  1.7
com.apple.driver.AppleIntelCPUPowerManagementClient 217.92.1
com.apple.security.quarantine   3
com.apple.nke.applicationfirewall   153
com.apple.driver.AppleIntelCPUPowerManagement   217.92.1
com.apple.kext.triggers 1.0
com.apple.iokit.IOSurface   91.1
com.apple.iokit.IOBluetoothFamily   4.2.4f1
com.apple.driver.DspFuncLib 2.6.1f2
com.apple.vecLib.kext   1.0.0
com.apple.iokit.IOAudioFamily   1.9.7fc2
com.apple.kext.OSvKernDSPLib    1.14
com.apple.driver.AppleSMBusController   1.0.11d1
com.apple.iokit.IOAcceleratorFamily 98.20
com.apple.iokit.IOBluetoothHostControllerUSBTransport   4.2.4f1
com.apple.iokit.IOSerialFamily  10.0.7
com.apple.iokit.IOFireWireIP    2.2.6
com.apple.driver.AppleHDAController 2.6.1f2
com.apple.iokit.IOHDAFamily 2.6.1f2
com.apple.iokit.IONDRVSupport   2.4.1
com.apple.driver.AppleSMC   3.1.8
com.apple.driver.IOPlatformPluginLegacy 1.0.0
com.apple.driver.AppleSMBusPCI  1.0.12d1
com.apple.driver.IOPlatformPluginFamily 5.7.0d11
com.apple.kext.AMDSupport   1.2.2
com.apple.AppleGraphicsDeviceControl    3.5.26
com.apple.iokit.IOGraphicsFamily    2.4.1
com.apple.driver.AppleUSBMergeNub   650.4.0
com.apple.iokit.IOSCSIBlockCommandsDevice   3.6.6
com.apple.driver.AppleThunderboltDPInAdapter    3.1.7
com.apple.driver.AppleThunderboltDPOutAdapter   3.1.7
com.apple.driver.AppleThunderboltDPAdapterFamily    3.1.7
com.apple.driver.AppleThunderboltPCIUpAdapter   1.4.5
com.apple.driver.AppleThunderboltPCIDownAdapter 1.4.5
com.apple.iokit.IOUSBMassStorageClass   3.6.0
com.apple.iokit.IOSCSIArchitectureModelFamily   3.6.6
com.apple.iokit.IOUSBHIDDriver  660.4.0
com.apple.driver.AppleUSBComposite  656.4.1
com.apple.driver.AppleThunderboltNHI    2.0.1
com.apple.iokit.IOThunderboltFamily 3.2.7
com.apple.iokit.IOEthernetAVBController 1.0.3b4
com.apple.driver.mDNSOffloadUserClient  1.0.1b5
com.apple.iokit.IO80211Family   630.35
com.apple.iokit.IONetworkingFamily  3.2
com.apple.iokit.IOFireWireFamily    4.5.5
com.apple.iokit.IOUSBUserClient 660.4.2
com.apple.driver.AppleEFINVRAM  2.0
com.apple.iokit.IOAHCIFamily    2.6.5
com.apple.iokit.IOUSBFamily 677.4.0
com.apple.driver.AppleEFIRuntime    2.0
com.apple.iokit.IOHIDFamily 2.0.0
com.apple.iokit.IOSMBusFamily   1.1
com.apple.security.TMSafetyNet  7
com.apple.security.sandbox  278.11
com.apple.kext.AppleMatch   1.0.0d1
com.apple.iokit.IOReportFamily  23
com.apple.driver.DiskImages 371.1
com.apple.iokit.IOStorageFamily 1.9
com.apple.driver.AppleKeyStore  2
com.apple.driver.AppleFDEKeyStore   28.30
com.apple.driver.AppleACPIPlatform  2.0
com.apple.iokit.IOPCIFamily 2.9
com.apple.iokit.IOACPIFamily    1.4
com.apple.kec.pthread   1
com.apple.kec.corecrypto    1.0

System Profile:
Model: Macmini5,2, BootROM MM51.0077.B10, 2 processors, Intel Core i5, 2.5 GHz, 8 GB, SMC 1.75f0
Graphics: AMD Radeon HD 6630M, AMD Radeon HD 6630M, PCIe, 256 MB
Memory Module: BANK 0/DIMM0, 4 GB, DDR3, 1333 MHz, 0x029E, 0x434D534F344758334D314131333333433920
Memory Module: BANK 1/DIMM0, 4 GB, DDR3, 1333 MHz, 0x029E, 0x434D534F344758334D314131333333433920
AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0xE4), Broadcom BCM43xx 1.0 (5.106.98.100.22)
Bluetooth: Version 4.2.4f1 13674, 3 services, 23 devices, 1 incoming serial ports
Network Service: Ethernet, Ethernet, en0
PCI Card: pci1b21,612, AHCI Controller, Thunderbolt@195,0,0
Serial ATA Device: OCZ-VERTEX3, 120.03 GB
Serial ATA Device: spsata_pm_name
USB Device: Hub
USB Device: Dell Multimedia Pro Keyboard Hub
USB Device: Microsoft 5-Button Mouse with IntelliEye(TM)
USB Device: Dell Multimedia Pro Keyboard
USB Device: IR Receiver
USB Device: Hub
USB Device: My Book 1130
USB Device: BRCM20702 Hub
USB Device: Bluetooth USB Host Controller
Thunderbolt Bus: Mac mini, Apple Inc., 22.2
Thunderbolt Device: Echo ExpressCard/34 TB, Sonnet Technologies, Inc., 3, 9.2
ilovezfs commented 10 years ago

Your disks are probably just renumbered, and the panic is unrelated.

sudo zpool export tank
sudo zpool import tank
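Why export/import can help here: ZFS writes a label on each member device that records the pool GUID and that device's own vdev GUID, and `zpool import` scans the available devices for those labels instead of relying on the /dev/diskN names saved when the pool was last opened. The toy model below sketches that difference under those assumptions. It is not OpenZFS code: the `Label` class, both functions, the GUID values and the post-reboot device numbering are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for the on-disk ZFS label of one pool member."""
    pool_guid: int
    vdev_guid: int


def open_by_cached_path(cached_paths: Dict[int, str],
                        devices: Dict[str, Optional[Label]]) -> Dict[int, str]:
    """Trust the /dev paths recorded earlier and report a state per vdev."""
    states = {}
    for guid, path in cached_paths.items():
        label = devices.get(path)
        if label is None:
            states[guid] = "UNAVAIL"   # no disk, or a disk without a ZFS label, at that path
        elif label.vdev_guid != guid:
            states[guid] = "FAULTED"   # the disk at that path carries some other vdev's label
        else:
            states[guid] = "ONLINE"
    return states


def import_by_label(pool_guid: int, vdev_guids: List[int],
                    devices: Dict[str, Optional[Label]]) -> Dict[int, Optional[str]]:
    """Scan every device and locate each vdev by the GUIDs in its label."""
    found = {}
    for path, label in devices.items():
        if label is not None and label.pool_guid == pool_guid:
            found[label.vdev_guid] = path
    return {guid: found.get(guid) for guid in vdev_guids}


# Hypothetical scenario: vdevs 101, 102, 103 were created on disk1, disk2, disk4,
# and after a reboot the disks were enumerated in a different order.
POOL = 0xABC
cached = {101: "/dev/disk1", 102: "/dev/disk2", 103: "/dev/disk4"}
after_reboot = {
    "/dev/disk1": None,               # some other, non-ZFS disk now lives here
    "/dev/disk2": Label(POOL, 102),
    "/dev/disk4": Label(POOL, 101),
    "/dev/disk5": Label(POOL, 103),
}
print(open_by_cached_path(cached, after_reboot))
print(import_by_label(POOL, [101, 102, 103], after_reboot))
```

In this toy, opening by the stale paths reports one member UNAVAIL and one FAULTED even though all three disks are healthy, while the label scan finds every member at its new device node.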

On a side note, why would you use raidz2 with three disks instead of a 3-way mirror?

zfsfan commented 10 years ago

The panic looks like a legitimate bug.

Back to the corruption/failure state. Yes, it appears my drives were remapped when the reboot occurred. However, I ran into additional issues while trying to resolve the faulted states on disk3 and disk4. Taking a disk offline, running 'gpt destroy ' on it, and then doing a "restore" and "online" of the same disk keeps returning the drive to the "FAILED" state instead of it being treated as a brand-new drive.

Regarding raidz2, I chose raidz2 to test its functionality with the drives I have available.

Does ZFS raidz2 use parity only to detect errors and place a drive in a failed state, or does it go the extra step of using the parity information to repair a failed read and place the drive in a degraded state? Does ZFS "repair" the failed read by writing the corrected data back to the drive? Does ZFS put the drive into a failed state or a degraded state in this scenario?

Seems that ZFS raidz2 could use the parity to survive errors on the remaining two disks, whereas a mirror would have no fallback for correcting errors on the remaining two disks after one fails. Correct?

lundman commented 10 years ago
0xffffff802e1b91a4
zfs_getattr_znode_unlocked (in zfs) (zfs_vnops_osx_lib.c:350)
zfs_vnop_access (in zfs) (zfs_vnops_osx.c:313)
0xffffff802dffc1d7
lundman commented 10 years ago

https://github.com/openzfsonosx/zfs/commit/73868940d36fb4a31dacd4826ed0d6cf7f15d445

I know we added it in an attempt to fix Spotlight (by returning the same information as HFS+). But since I don't fully understand the API, it might be safer to remove it. We should check for regressions, though, in case something in Finder really needs it.

ilovezfs commented 10 years ago

@zfsfan There was no need for you to run 'zpool offline', 'gpt destroy', or any of the rest of that. All that was needed was 'zpool export tank && zpool import tank'. If there had been a legitimate need to run gpt destroy and resilver a device, the correct command would have been 'zpool replace', which triggers resilvering. Simply onlining a device after destroying its partition table will not trigger resilvering.

Any kind of redundancy, whether copies=n (with n >= 2), n-way mirrors, or raidz/raidz2/raidz3, is used to repair blocks with incorrect checksums, both automatically when such a block is read and during a scrub.
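A rough sketch of that repair for the copy-based cases (mirrors and copies=n): each block's checksum is stored apart from the block itself, so a read can tell which copy is bad, return a good one, and rewrite the bad one. Raidz does the equivalent by reconstructing the block from parity, which this sketch does not model. All names are hypothetical, and SHA-256 stands in for whatever checksum the dataset actually uses.

```python
import hashlib
from typing import List, Tuple


def checksum(data: bytes) -> bytes:
    # Stand-in for the checksum ZFS keeps in the parent block pointer.
    # SHA-256 is used for simplicity; ZFS's default checksum is fletcher4.
    return hashlib.sha256(data).digest()


def self_healing_read(copies: List[bytearray], expected: bytes) -> Tuple[bytes, int]:
    """Return verified data for one block and the number of copies rewritten.

    copies:   redundant copies of the block (e.g. one per mirror side, or copies=n)
    expected: the checksum recorded when the block was written
    """
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise OSError("no copy matches its checksum: block is unrecoverable")
    repaired = 0
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good          # write the verified data back over the bad copy
            repaired += 1
    return good, repaired


# Usage: one side of a three-way mirror holds silently corrupted data.
block = b"hello, zfs"
expected = checksum(block)
sides = [bytearray(b"hellO, zfs"), bytearray(block), bytearray(block)]
data, fixed = self_healing_read(sides, expected)
print(data, fixed)
```

The read succeeds with the intact data and the corrupted side is overwritten in place; the caller never sees the bad copy.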

With three devices, other than for experimentation, there is absolutely no reason to use raidz2. Raidz2 always consumes two devices' worth of space in a vdev for parity information, leaving n - 2 devices' worth of space for data; with n = 3, that is one device's worth. A three-way mirror also has a data capacity equal to one device, so using raidz2 instead of a three-way mirror provides no space advantage and will only make resilvering take much longer than it does with n-way mirrors. Nor does raidz2 provide any additional protection over a three-way mirror: you can lose up to two devices in a raidz2, just as you can lose up to two devices in a three-way mirror.
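The arithmetic in the previous paragraph can be checked with a small sketch. It assumes a single vdev of equal-sized disks and ignores raidz padding and other ZFS overhead; the `Vdev` class is hypothetical. For scale, one 3 TB drive is about 2.73 TiB, a little above the 2.64T AVAIL that `zfs list` shows for the raidz2 pool above, the difference being ZFS's own overhead.

```python
from dataclasses import dataclass


@dataclass
class Vdev:
    kind: str        # "raidz" or "mirror"
    n_disks: int
    parity: int = 0  # only meaningful for raidz (1, 2 or 3)

    def data_disks(self) -> int:
        # raidz-p spends p disks' worth of space on parity; an n-way mirror
        # stores n full copies, so only one disk's worth holds unique data.
        if self.kind == "raidz":
            return self.n_disks - self.parity
        return 1

    def failures_tolerated(self) -> int:
        # raidz-p survives p failed disks; an n-way mirror survives n - 1.
        return self.parity if self.kind == "raidz" else self.n_disks - 1


raidz2 = Vdev("raidz", 3, parity=2)
mirror3 = Vdev("mirror", 3)
disk_tb = 3.0
for v in (raidz2, mirror3):
    print(v.kind, v.data_disks() * disk_tb, "TB usable,",
          v.failures_tolerated(), "failures tolerated")
```

Both layouts come out at one disk of usable space and two tolerated failures. After one disk has failed, each can still lose one more, so neither keeps more redundancy than the other. The capacity gap only opens up with wider raidz vdevs: six disks in raidz2 give four disks' worth of data space.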

https://blogs.oracle.com/relling/entry/zfs_raid_recommendations_space_performance
https://blogs.oracle.com/roch/entry/when_to_and_not_to
http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance
http://constantin.glez.de/blog/2010/01/home-server-raid-greed-and-why-mirroring-still-best