openzfsonwindows / ZFSin

OpenZFS on Windows port
https://openzfsonwindows.org
1.2k stars 68 forks source link

bug check #347

Open datacore-rm opened 2 years ago

datacore-rm commented 2 years ago

Have you come across this issue (may be in openzfs)? The source abd is ABD_FLAG_LINEAR but the abd_u.abd_linear.abd_buf( and hence iter_mapaddr) is NULL.

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e) This is a very common bugcheck. Usually the exception address pinpoints the driver/function that caused the problem. Always note this address as well as the link date of the driver/image that contains this address. Arguments: Arg1: ffffffffc0000005, The exception code that was not handled Arg2: fffff80d76ce7f40, The address that the exception occurred at Arg3: ffff9f805add2fb8, Exception Record Address Arg4: ffff9f805add2800, Context Record Address

Debugging Details:

Page 5c00 not present in the dump file. Type ".hh dbgerr004" for details Page 5c00 not present in the dump file. Type ".hh dbgerr004" for details Page 5c00 not present in the dump file. Type ".hh dbgerr004" for details Page 5c00 not present in the dump file. Type ".hh dbgerr004" for details Page 5c00 not present in the dump file. Type ".hh dbgerr004" for details Page 5c00 not present in the dump file. Type ".hh dbgerr004" for details

KEY_VALUES_STRING: 1

Key  : AV.Dereference
Value: NullClassPtr

Key  : AV.Fault
Value: Read

Key  : Analysis.CPU.Sec
Value: 1

Key  : Analysis.DebugAnalysisProvider.CPP
Value: Create: 8007007e on RM-LT1

Key  : Analysis.DebugData
Value: CreateObject

Key  : Analysis.DebugModel
Value: CreateObject

Key  : Analysis.Elapsed.Sec
Value: 1

Key  : Analysis.Memory.CommitPeak.Mb
Value: 116

Key  : Analysis.System
Value: CreateObject

BUGCHECK_CODE: 7e

BUGCHECK_P1: ffffffffc0000005

BUGCHECK_P2: fffff80d76ce7f40

BUGCHECK_P3: ffff9f805add2fb8

BUGCHECK_P4: ffff9f805add2800

EXCEPTION_RECORD: ffff9f805add2fb8 -- (.exr 0xffff9f805add2fb8) ExceptionAddress: fffff80d76ce7f40 (ZFSin!memcpy+0x0000000000000240) ExceptionCode: c0000005 (Access violation) ExceptionFlags: 00000000 NumberParameters: 2 Parameter[0]: 0000000000000000 Parameter[1]: 00000000000003f0 Attempt to read from address 00000000000003f0

CONTEXT: ffff9f805add2800 -- (.cxr 0xffff9f805add2800) rax=ffffd48cf9fa2c00 rbx=0000000000001000 rcx=ffffd48cf9fa2c00 rdx=00002b730605d800 rsi=0000000000000000 rdi=0000000000000400 rip=fffff80d76ce7f40 rsp=ffff9f805add31f8 rbp=ffff9f805add32b0 r8=0000000000000000 r9=0000000000000020 r10=0000000000000400 r11=ffffd48cf9fa2800 r12=0000000000000000 r13=0000000000000000 r14=0000000000000000 r15=0000000000040000 iopl=0 nv up ei pl zr na po nc cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010246 ZFSin!memcpy+0x240: fffff80d76ce7f40 f30f6f4411f0 movdqu xmm0,xmmword ptr [rcx+rdx-10h] ds:002b:00000000000003f0=???????????????????????????????? Resetting default scope

PROCESS_NAME: System

READ_ADDRESS: 00000000000003f0

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.

EXCEPTION_CODE_STR: c0000005

EXCEPTION_PARAMETER1: 0000000000000000

EXCEPTION_PARAMETER2: 00000000000003f0

EXCEPTION_STR: 0xc0000005

STACK_TEXT:
ffff9f805add31f8 fffff80d76bfd579 : 0000000000000001 0000000000000000 ffff9f804c9e6180 0000000000000000 : ZFSin!memcpy+0x240 [minkernel\crts\crtw32\string\amd64\memcpy.asm @ 342] ffff9f805add3200 fffff80d76bfde8c : 0000000000000001 fffff80d76cdea5a 0000000000000001 fffff801b970123c : ZFSin!abd_copy_off_cb+0x9 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\abd.c @ 1231] ffff9f805add3230 fffff80d76bfd561 : ffffd48bf00647f8 ffffd48e5162d040 ffffd489d2b40610 fffff80d76cdead4 : ZFSin!abd_iterate_func2+0xf8 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\abd.c @ 1212] ffff9f805add32e0 fffff80d76c89b14 : ffffd48bc236e540 ffffd48dabb80050 ffffd489d2b40610 ffffd48bff9860a0 : ZFSin!abd_copy_off+0x25 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\abd.c @ 1251] ffff9f805add3330 fffff80d76c8a2bc : ffffd489d2b3ff18 0000000000000000 ffffd48be14f1f00 ffffd489d2b3ffa0 : ZFSin!vdev_queue_aggregate+0x3f8 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\vdev_queue.c @ 753] ffff9f805add33c0 fffff80d76c8a181 : ffffd48c5165da68 ffffd48c0fe04861 0000000000000000 ffffd489d2b3ff18 : ZFSin!vdev_queue_io_to_issue+0xc0 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\vdev_queue.c @ 796] ffff9f805add3410 fffff80d76cd8d33 : ffffd48c5165da68 fffff80d76d58240 fffff80d76d58240 ffffd489d2b3d000 : ZFSin!vdev_queue_io_done+0xb5 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\vdev_queue.c @ 891] ffff9f805add3440 fffff80d76cda5b8 : ffffd48c5165da68 0000000000000001 0000000000200000 fffff80d76bae64b : ZFSin!zio_vdev_io_done+0x83 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\zio.c @ 3593] ffff9f805add3470 fffff80d76bb06e8 : ffffd489d21b6700 002318ac02263d58 fffff80d76bb04d0 fffff801b9aba3cc : ZFSin!__zio_execute+0x1a8 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\zio.c @ 1997] ffff9f805add34e0 fffff801b97a321d : ffffd48a00000007 ffffd48a2f200800 0000000000000080 fffff80d76bb04d0 : ZFSin!taskq_thread+0x218 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\spl\module\spl\spl-taskq.c @ 1612] ffff9f805add3550 fffff801b97fd886 : ffff9f804c700180 ffffd48a2f200800 fffff801b97a31dc ffffb28157056bb0 : nt!PspSystemThreadStartup+0x41 ffff9f805add35a0 0000000000000000 : ffff9f805add4000 ffff9f805adcd000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x16

FAULTING_SOURCE_LINE: minkernel\crts\crtw32\string\amd64\memcpy.asm

FAULTING_SOURCE_FILE: minkernel\crts\crtw32\string\amd64\memcpy.asm

FAULTING_SOURCE_LINE_NUMBER: 342

FAULTING_SOURCE_CODE:
No source found for 'minkernel\crts\crtw32\string\amd64\memcpy.asm'

SYMBOL_NAME: ZFSin!memcpy+240

MODULE_NAME: ZFSin

IMAGE_NAME: ZFSin.sys

STACK_COMMAND: .cxr 0xffff9f805add2800 ; kb

BUCKET_ID_FUNC_OFFSET: 240

FAILURE_BUCKET_ID: AV_ZFSin!memcpy

OS_VERSION: 10.0.14393.4651

BUILDLAB_STR: rs1_release

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {1c87be51-1431-220e-9e18-388a2f2220e9}

Followup: MachineOwner

26: kd> k

Child-SP RetAddr Call Site

00 ffff9f805add1fe8 fffff801b980dd55 nt!KeBugCheckEx 01 ffff9f805add1ff0 fffff801b97e5b8f nt!PspSystemThreadStartup$filt$0+0x44 02 ffff9f805add2030 fffff801b97fe99d nt!_C_specific_handler+0x9f 03 ffff9f805add20a0 fffff801b974d551 nt!RtlpExecuteHandlerForException+0xd 04 ffff9f805add20d0 fffff801b974c374 nt!RtlDispatchException+0x421 05 ffff9f805add27d0 fffff801b9806702 nt!KiDispatchException+0x1e4 06 ffff9f805add2e80 fffff801b9803746 nt!KiExceptionDispatch+0xc2 07 ffff9f805add3060 fffff80d76ce7f40 nt!KiPageFault+0x406 08 ffff9f805add31f8 fffff80d76bfd579 ZFSin!memcpy+0x240 [minkernel\crts\crtw32\string\amd64\memcpy.asm @ 342] 09 ffff9f805add3200 fffff80d76bfde8c ZFSin!abd_copy_off_cb+0x9 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\abd.c @ 1231] 0a ffff9f805add3230 fffff80d76bfd561 ZFSin!abd_iterate_func2+0xf8 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\abd.c @ 1212] 0b ffff9f805add32e0 fffff80d76c89b14 ZFSin!abd_copy_off+0x25 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\abd.c @ 1251] 0c ffff9f805add3330 fffff80d76c8a2bc ZFSin!vdev_queue_aggregate+0x3f8 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\vdev_queue.c @ 753] 0d ffff9f805add33c0 fffff80d76c8a181 ZFSin!vdev_queue_io_to_issue+0xc0 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\vdev_queue.c @ 796] 0e ffff9f805add3410 fffff80d76cd8d33 ZFSin!vdev_queue_io_done+0xb5 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\vdev_queue.c @ 891] 0f ffff9f805add3440 fffff80d76cda5b8 ZFSin!zio_vdev_io_done+0x83 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\zio.c @ 3593] 10 ffff9f805add3470 fffff80d76bb06e8 ZFSin!__zio_execute+0x1a8 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\zio.c @ 1997] 11 ffff9f805add34e0 fffff801b97a321d ZFSin!taskq_thread+0x218 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\spl\module\spl\spl-taskq.c @ 1612] 12 ffff9f805add3550 fffff801b97fd886 nt!PspSystemThreadStartup+0x41 13 ffff9f805add35a0 0000000000000000 nt!KiStartSystemThread+0x16 26: kd> .frame 0n10;dv /t /v 0a ffff9f805add3230 fffff80d76bfd561 ZFSin!abd_iterate_func2+0xf8 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\abd.c @ 1212]

struct abd * dabd = struct abd * sabd = @r15 unsigned int64 doff = 0x40000 @r14 unsigned int64 soff = 0 @rbx unsigned int64 size = 0x1000 ffff9f80`5add3308 * func = 0xfffff80d`76bfd570 ffff9f80`5add3310 void * private = 0x00000000`00000000 @esi int ret = 0n0 ffff9f80`5add3270 struct abd_iter daiter = struct abd_iter ffff9f80`5add3290 struct abd_iter saiter = struct abd_iter unsigned int64 dlen = @rdi unsigned int64 len = 0x400 26: kd> dx -r1 (*((ZFSin!abd_iter *)0xffff9f805add3290)) (*((ZFSin!abd_iter *)0xffff9f805add3290)) [Type: abd_iter] [+0x000] iter_abd : 0xffffd489d222b4b0 [Type: abd *] [+0x008] iter_pos : 0x0 [Type: unsigned __int64] [+0x010] iter_mapaddr : 0x0 [Type: void *] [+0x018] iter_mapsize : 0x1000 [Type: unsigned __int64] 26: kd> dx -r1 ((ZFSin!abd *)0xffffd489d222b4b0) ((ZFSin!abd *)0xffffd489d222b4b0) : 0xffffd489d222b4b0 [Type: abd *] [+0x000] abd_flags : ABD_FLAG_LINEAR (1) [Type: abd_flags] [+0x004] abd_size : 0x1000 [Type: unsigned int] [+0x008] abd_parent : 0x0 [Type: abd *] [+0x010] abd_children [Type: refcount] [+0x018] abd_u [Type: ] 26: kd> dx -r1 (*((ZFSin!abd *)0xffffd489d222b4b0)).abd_u (*((ZFSin!abd *)0xffffd489d222b4b0)).abd_u [Type: ] [+0x000] abd_scatter [Type: abd_scatter] [+0x000] abd_linear [Type: abd_linear] 26: kd> dx -r1 (*((ZFSin!abd_linear *)0xffffd489d222b4c8)) (*((ZFSin!abd_linear *)0xffffd489d222b4c8)) [Type: abd_linear] [+0x000] abd_buf : 0x0 [Type: void *] 26: kd> dx -r1 (*((ZFSin!abd_iter *)0xffff9f805add3270)) (*((ZFSin!abd_iter *)0xffff9f805add3270)) [Type: abd_iter] [+0x000] iter_abd : 0xffffd4803a415080 [Type: abd *] [+0x008] iter_pos : 0x40000 [Type: unsigned __int64] [+0x010] iter_mapaddr : 0xffffd48cf9fa2800 [Type: void *] [+0x018] iter_mapsize : 0x400 [Type: unsigned __int64] 26: kd> dx -r1 ((ZFSin!abd *)0xffffd4803a415080) ((ZFSin!abd *)0xffffd4803a415080) : 0xffffd4803a415080 [Type: abd *] [+0x000] abd_flags : ABD_FLAG_OWNER | ABD_FLAG_META (6) [Type: abd_flags] [+0x004] abd_size : 0x46000 [Type: unsigned int] [+0x008] abd_parent : 0x0 [Type: abd *] [+0x010] abd_children [Type: refcount] [+0x018] abd_u [Type: ] windbg> .open -a fffff80d76bfde8c
lundman commented 2 years ago

Hmm doesn't ring any immediate bells, and this is pre-OS split too, so harder to connect up. Sean spent hours tracking down this problem: https://github.com/openzfs/zfs/commit/c6d1112bf4125e5a22eb47ceb7b8cee01f0df9a1 but that did not lead to NULL ptr, but rather wrong free.

datacore-rm commented 2 years ago

Thanks. In https://github.com/openzfs/zfs/commit/47b408fcb32cdc81e6e22755191d961718afde8c, _"KM_PUSHPAGE->KMSLEEP when allocating chunks from their own arena". On windows, does KM_PUSHPAGE(can use reserve memory) guarantee allocation success like KM_SLEEP? If you can pls help with the usage scenario/diff for both the flags.

lundman commented 2 years ago

KM_PUSHPAGE can return NULL, KM_SLEEP will only return !NULL.

datacore-rm commented 2 years ago

Thanks. We checked the kmem_cache 'cache_alloc_fail' counter for all the caches and value is 0. There is large free system memory available. So this doesn't seem to indicate any memory alloc failure...Anything else we can check?

Below is zio(dio) structure containing NULL abd linear buffer pointer.

26: kd> .frame 0n12;dv /t /v 0c ffff9f805add3330 fffff80d76c8a2bc ZFSin!vdev_queue_aggregate+0x3f8 [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\vdev_queue.c @ 753] @r15 struct vdev_queue * vq = 0xffffd489`d2b3ff18

struct zio * zio = ffff9f80`5add33c8 zio_flag flags = 0n0 (No matching enumerant) struct avl_tree * t = unsigned int64 size = @rdi struct zio * aio = 0xffffd48b`ff9860a0 struct zio * last = @rbp struct zio * nio = 0xffffd48d`abb80050 ffff9f80`5add33d0 struct zio_link * zl = 0xffffd48b`e14f1f00 @r13 unsigned int64 maxgap = 0 struct zio * first = @rbx struct zio * dio = 0xffffd48e`5162d040 struct zio * mandatory = struct zio * nio = int zfs_vdev_aggregation_limit = 26: kd> dx -id 0,0,ffffd489a14b4540 -r1 ((ZFSin!zio *)0xffffd48e5162d040) ((ZFSin!zio *)0xffffd48e5162d040) : 0xffffd48e5162d040 [Type: zio *] [+0x000] io_bookmark [Type: zbookmark_phys] [+0x020] io_prop [Type: zio_prop] [+0x06c] io_type : ZIO_TYPE_WRITE (2) [Type: zio_type] [+0x070] io_child_type : ZIO_CHILD_VDEV (0) [Type: zio_child] [+0x074] io_trim_flags : (0) [Type: trim_flag] [+0x078] io_cmd : 0 [Type: int] [+0x07c] io_priority : ZIO_PRIORITY_ASYNC_WRITE (3) [Type: zio_priority] [+0x080] io_reexecute : 0x0 [Type: unsigned char] [+0x081] io_state [Type: unsigned char [2]] [+0x088] io_txg : 0x6952fa [Type: unsigned __int64] [+0x090] io_spa : 0xffffd489d25e4000 [Type: spa *] [+0x098] io_bp : 0xffffd48ab7f541c0 [Type: blkptr *] [+0x0a0] io_bp_override : 0x0 [Type: blkptr *] [+0x0a8] io_bp_copy [Type: blkptr] [+0x128] io_parent_list [Type: list] [+0x148] io_child_list [Type: list] [+0x168] io_logical : 0xffffd489dc359560 [Type: zio *] [+0x170] io_transform_stack : 0x0 [Type: zio_transform *] [+0x178] io_ready : 0x0 [Type: void (__cdecl*)(zio *)] [+0x180] io_children_ready : 0x0 [Type: void (__cdecl*)(zio *)] [+0x188] io_physdone : 0xfffff80d76c056f0 [Type: void (__cdecl*)(zio *)] [+0x190] io_done : 0xfffff80d76c88f60 [Type: void (__cdecl*)(zio *)] [+0x198] io_private : 0xffffd4805409f348 [Type: void *] [+0x1a0] io_prev_space_delta : 0 [Type: __int64] [+0x1a8] io_bp_orig [Type: blkptr] [+0x228] io_abd : 0xffffd489d222b4b0 [Type: abd *] [+0x230] io_orig_abd : 0xffffd489d222b4b0 [Type: abd *] [+0x238] io_size : 0x1000 [Type: unsigned __int64] [+0x240] io_orig_size : 0x1000 [Type: unsigned __int64] [+0x248] io_lsize : 0x1000 [Type: unsigned __int64] [+0x250] io_vd : 0xffffd489d2b3d000 [Type: vdev *] [+0x258] io_vsd : 0x0 [Type: void *] [+0x260] io_vsd_ops : 0x0 [Type: zio_vsd_ops *] [+0x268] io_metaslab_class : 0xffffd489c1e73940 [Type: metaslab_class *] [+0x270] io_offset : 0x3cfd075000 [Type: unsigned __int64] [+0x278] io_timestamp : 9878751234375000 [Type: __int64] [+0x280] io_queued_timestamp : 9878751234375000 [Type: __int64] [+0x288] io_target_timestamp : 0 [Type: __int64] [+0x290] io_delta : 0 [Type: __int64] [+0x298] io_delay : 0 [Type: __int64] [+0x2a0] io_queue_node [Type: avl_node] [+0x2b8] io_offset_node [Type: avl_node] [+0x2d0] io_alloc_node [Type: avl_node] [+0x2e8] io_alloc_list [Type: zio_alloc_list] [+0x310] io_flags : 1591424 [Type: zio_flag] [+0x314] io_stage : ZIO_STAGE_VDEV_IO_START (1048576) [Type: zio_stage] [+0x318] io_pipeline : ZIO_STAGE_VDEV_IO_START | ZIO_STAGE_VDEV_IO_DONE | ZIO_STAGE_VDEV_IO_ASSESS | ZIO_STAGE_DONE (24117248) [Type: zio_stage] [+0x31c] io_orig_flags : 1065088 [Type: zio_flag] [+0x320] io_orig_stage : ZIO_STAGE_READY (524288) [Type: zio_stage] [+0x324] io_orig_pipeline : ZIO_STAGE_VDEV_IO_START | ZIO_STAGE_VDEV_IO_DONE | ZIO_STAGE_VDEV_IO_ASSESS | ZIO_STAGE_DONE (24117248) [Type: zio_stage] [+0x328] io_pipeline_trace : ZIO_STAGE_OPEN | ZIO_STAGE_VDEV_IO_START (1048577) [Type: zio_stage] [+0x32c] io_error : 0 [Type: int] [+0x330] io_child_error [Type: int [4]] [+0x340] io_children [Type: unsigned __int64 [4][2]] [+0x380] io_child_count : 0x1 [Type: unsigned __int64] [+0x388] io_phys_children : 0x0 [Type: unsigned __int64] [+0x390] io_parent_count : 0x1 [Type: unsigned __int64] [+0x398] io_stall : 0x0 [Type: unsigned __int64 *] [+0x3a0] io_gang_leader : 0x0 [Type: zio *] [+0x3a8] io_gang_tree : 0x0 [Type: zio_gang_node *] [+0x3b0] io_executor : 0xffffd48a1980e800 [Type: void *] [+0x3b8] io_waiter : 0x0 [Type: void *] [+0x3c0] io_lock [Type: kmutex] [+0x3e8] io_cv [Type: cv] [+0x428] io_allocator : 0 [Type: int] [+0x430] io_cksum_report : 0x0 [Type: zio_cksum_report *] [+0x438] io_ena : 0x0 [Type: unsigned __int64] [+0x440] io_tqent [Type: taskq_ent] 26: kd> u fffff80d76c056f0 ZFSin!arc_write_physdone [C:\BuildAgent\work\88cd52027cd63d70\ZFSin\zfs\module\zfs\arc.c @ 6954]: fffff80d`76c056f0 4883ec28 sub rsp,28h fffff80d`76c056f4 488b9198010000 mov rdx,qword ptr [rcx+198h] fffff80d`76c056fb 488b4218 mov rax,qword ptr [rdx+18h] fffff80d`76c056ff 4885c0 test rax,rax fffff80d`76c05702 740d je ZFSin!arc_write_physdone+0x21 (fffff80d`76c05711) fffff80d`76c05704 4c8b02 mov r8,qword ptr [rdx] fffff80d`76c05707 488b5228 mov rdx,qword ptr [rdx+28h] fffff80d`76c0570b ff153fef0f00 call qword ptr [ZFSin!__guard_dispatch_icall_fptr (fffff80d`76d04650)]