rjkroege / edwood

Go version of Plan9 Acme Editor
Other
380 stars 34 forks source link

Segfault on Windows 11 in USER32!GetMessageW #507

Open sqweek opened 3 weeks ago

sqweek commented 3 weeks ago

I'm seeing a regular segfault on Win11. It seems to happen when edwood does not have focus and the mouse pointer passes over it, but I don't have a solid reproducer yet.

I'm not really expecting any help solving this one, I'm mainly filing the issue so I have a place to keep notes. From what I've gathered so far it looks like this might actually be an exp/shiny bug, however I couldn't find an issue tracker for that project.

Without further ado, the crash looks like this from gdb's perspective (I also need to look into delve):

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ffc6ca7540e in USER32!GetMessageW () from C:\WINDOWS\System32\user32.dll

(gdb) bt
#0  0x00007ffc6ca7540e in USER32!GetMessageW () from C:\WINDOWS\System32\user32.dll
#1  0x0000000000f9e07a in runtime.asmstdcall () at C:/env/go122/src/runtime/sys_windows_amd64.s:73
#2  0x0000000000000000 in ?? ()

(gdb) info goroutines
* 1 syscall  runtime.systemstack_switch
* 29 syscall  runtime.systemstack_switch
«snip 27 additional goroutines in runtime.gopark»

(gdb) goroutine 1 bt
#0  runtime.cgocall (fn=0xf9e000 <runtime.asmstdcall>, arg=0x134dfc0, ~r0=<optimized out>)
    at C:/env/go122/src/runtime/cgocall.go:164
#1  0x0000000000f995a7 in syscall.SyscallN (trap=<optimized out>, args=Python Exception <class 'gdb.error'>: value has been optimized out
[]uintptr,
    r1=<optimized out>, r1=<optimized out>, r2=<optimized out>, r2=<optimized out>,
    err=<optimized out>, err=<optimized out>) at C:/env/go122/src/runtime/syscall_windows.go:544
#2  0x0000000000f9934a in syscall.Syscall6 (fn=0, nargs=<optimized out>, a1=<optimized out>,
    a2=<optimized out>, a3=6290720, a4=6290616, a5=0, a6=0, r1=<optimized out>,
    r2=<optimized out>, err=<optimized out>) at C:/env/go122/src/runtime/syscall_windows.go:488
#3  0x0000000001089a13 in golang.org/x/exp/shiny/driver/internal/win32._GetMessage (
    msg=<optimized out>, hwnd=0, msgfiltermin=0, msgfiltermax=0, ret=<optimized out>, err=...)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/internal/win32/zsyscall_windows.go:184
#4  0x0000000001088e8b in golang.org/x/exp/shiny/driver/internal/win32.Main (
    f={void (void)} 0xc000137e90, retErr=...)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/internal/win32/win32.go:479
#5  0x000000000108db96 in golang.org/x/exp/shiny/driver/windriver.Main (
    f={void (golang.org/x/exp/shiny/screen.Screen)} 0xc000137ec0)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/windriver/windriver.go:22
#6  0x0000000001093790 in golang.org/x/exp/shiny/driver.main (
    f={void (golang.org/x/exp/shiny/screen.Screen)} 0x0)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/driver_windows.go:13
#7  golang.org/x/exp/shiny/driver.Main (f={void (golang.org/x/exp/shiny/screen.Screen)} 0x0)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/d--Type <RET> for more, q to quit, c to continue without paging--
river.go:24
#8  github.com/ktye/duitdraw.Main (f={void (github.com/ktye/duitdraw.Device *)} 0xc000137ee0)
    at C:/Users/sqweek/go/pkg/mod/github.com/ktye/duitdraw@v0.0.0-20190328070634-a54e9bd5a862/init.go:28
#9  0x00000000010ddf76 in github.com/rjkroege/edwood/draw.Main (
    f={void (github.com/rjkroege/edwood/draw.Device *)} 0xc000137f28)
    at C:/Users/sqweek/go/pkg/mod/github.com/rjkroege/edwood@v0.3.1/draw/duitdraw.go:53
#10 main.main () at C:/Users/sqweek/go/pkg/mod/github.com/rjkroege/edwood@v0.3.1/acme.go:79

Ie. we're in exp/shiny's win32 message loop which looks very standard:

    var m _MSG
    for {
        done, err := _GetMessage(&m, 0, 0, 0)
        if err != nil {
            return fmt.Errorf("win32 GetMessage failed: %v", err)
        }
        if done == 0 { // WM_QUIT
            break
        }
        _TranslateMessage(&m)
        _DispatchMessage(&m)
    }

It's hard to see how edwood could influence this codepath, unless shiny somehow leaks a pointer into the _MSG struct into its event handlers.

Digging deeper into the native side of the core:

(gdb) bt
#0  0x00007ffc6ca7540e in USER32!GetMessageW () from C:\WINDOWS\System32\user32.dll
#1  0x0000000000f9e07a in runtime.asmstdcall () at C:/env/go122/src/runtime/sys_windows_amd64.s:73
#2  0x0000000000000000 in ?? ()

(gdb) p $_siginfo
$1 = {ExceptionCode = ACCESS_VIOLATION, ExceptionFlags = 0, ExceptionRecord = 0x0,
  ExceptionAddress = 0x7ffc6ca7540e <USER32!GetMessageW+62>, NumberParameters = 2, {
    ExceptionInformation = {0, 824637906480, 0 <repeats 13 times>}, AccessViolationInformation = {
      Type = READ_ACCESS_VIOLATION, Address = 0xc0003fde30}}}

(gdb) disass
Dump of assembler code for function USER32!GetMessageW:
   0x00007ffc6ca753d0 <+0>:     mov    %rbx,0x8(%rsp)
   0x00007ffc6ca753d5 <+5>:     push   %rdi
   0x00007ffc6ca753d6 <+6>:     sub    $0x50,%rsp
   0x00007ffc6ca753da <+10>:    mov    %r8d,%eax
   0x00007ffc6ca753dd <+13>:    mov    %rcx,%rbx
   0x00007ffc6ca753e0 <+16>:    or     %r9d,%eax
   0x00007ffc6ca753e3 <+19>:    mov    $0xfffe0000,%ecx
   0x00007ffc6ca753e8 <+24>:    test   %eax,%ecx
   0x00007ffc6ca753ea <+26>:    jne    0x7ffc6ca933ec <USER32!GetWindowTextLengthA+56828>
   0x00007ffc6ca753f0 <+32>:    mov    %rbx,%rcx
   0x00007ffc6ca753f3 <+35>:    rex.W call *0x729b6(%rip)        # 0x7ffc6cae7db0 <gapfnScSendMessage+4944>
   0x00007ffc6ca753fa <+42>:    nopl   0x0(%rax,%rax,1)
   0x00007ffc6ca753ff <+47>:    cmpl   $0x5,0x9449e(%rip)        # 0x7ffc6cb098a4
   0x00007ffc6ca75406 <+54>:    mov    %eax,%edi
   0x00007ffc6ca75408 <+56>:    je     0x7ffc6ca93417 <USER32!GetWindowTextLengthA+56871>
=> 0x00007ffc6ca7540e <+62>:    cmpl   $0x102,0x8(%rbx)
   0x00007ffc6ca75415 <+69>:    je     0x7ffc6ca7542e <USER32!GetMessageW+94>
   0x00007ffc6ca75417 <+71>:    cmpl   $0xcc,0x8(%rbx)
   0x00007ffc6ca7541e <+78>:    je     0x7ffc6ca7542e <USER32!GetMessageW+94>
   0x00007ffc6ca75420 <+80>:    mov    0x60(%rsp),%rbx
   0x00007ffc6ca75425 <+85>:    mov    %edi,%eax
   0x00007ffc6ca75427 <+87>:    add    $0x50,%rsp
   0x00007ffc6ca7542b <+91>:    pop    %rdi
   0x00007ffc6ca7542c <+92>:    ret

(gdb) info register
rax            0x0                 0
rbx            0xc0003fde28        824637906472
rcx            0x7ffc6afa1534      140722103260468
rdx            0x0                 0
rsi            0x5ffd20            6290720
rdi            0x0                 0
rbp            0x5ffe80            0x5ffe80
rsp            0x5ffcc0            0x5ffcc0
r8             0x5ffcb8            6290616
r9             0x0                 0
r10            0x0                 0
r11            0x246               582
r12            0x5ffda0            6290848
r13            0x0                 0
r14            0xc000060000        824634114048
r15            0x7ffffff           134217727
rip            0x7ffc6ca7540e      0x7ffc6ca7540e <USER32!GetMessageW+62>
eflags         0x10293             [ CF AF SF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x2b                43
es             0x2b                43
fs             0x53                83
gs             0x2b                43

On the shiny side we have:

func _GetMessage(msg *_MSG, hwnd syscall.Handle, msgfiltermin uint32, msgfiltermax uint32) (ret int32, err error) {
    r0, _, e1 := syscall.Syscall6(procGetMessageW.Addr(), 4, uintptr(unsafe.Pointer(msg)), uintptr(hwnd), uintptr(msgfiltermin), uintptr(msgfiltermax), 0, 0)

which plumbs into the standard library runtime/syscall_windows.go:

//go:linkname syscall_Syscall6 syscall.Syscall6
//go:nosplit
func syscall_Syscall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
    return syscall_SyscallN(fn, a1, a2, a3, a4, a5, a6)
}

//go:linkname syscall_SyscallN syscall.SyscallN
//go:nosplit
func syscall_SyscallN(trap uintptr, args ...uintptr) (r1, r2, err uintptr) {
    nargs := len(args)

    // asmstdcall expects it can access the first 4 arguments
    // to load them into registers.
    var tmp [4]uintptr
    switch {
    case nargs < 4:
        copy(tmp[:], args)
        args = tmp[:]
    case nargs > maxArgs:
        panic("runtime: SyscallN has too many arguments")
    }

    lockOSThread()
    defer unlockOSThread()
    c := &getg().m.syscall
    c.fn = trap
    c.n = uintptr(nargs)
    c.args = uintptr(noescape(unsafe.Pointer(&args[0])))
    cgocall(asmstdcallAddr, unsafe.Pointer(c))
    return c.r1, c.r2, c.err
}

From what I can tell from reading sys_windows_amd64.s and os_windows.go, asmstdcall() puts the first argument into the CX register. According to the disassembly USER32!GetMessageW moves RCX into RBX and then tries to access memory near RBX which is responsible for the segfault. ie. this corresponds to the _MSG struct passed in by shiny. However I've no clue why that would suddenly become inaccessible so I wonder if I've taken a wrong turn somewhere.

Anyway that will do for now, it's past my bedtime ;)

rjkroege commented 3 weeks ago

This is awesome detective work. I don't really have any useful comments to add. I'm running Edwood on Win and have a growing list of stuff that I need to be fixing.

sqweek commented 2 weeks ago

Another observation -- this time I captured some register values at the start of the program (via a breakpoint on USER32!GetMessageW):

(gdb) goroutine 1 bt
#0  runtime.cgocall (fn=0xf9e000 <runtime.asmstdcall>, arg=0x134dfc0, ~r0=<optimized out>)
    at C:/env/go122/src/runtime/cgocall.go:164
#1  0x0000000000f995a7 in syscall.SyscallN (trap=<optimized out>, args=Python Exception <class 'gdb.error'>: value has been optimized out
[]uintptr,
    r1=<optimized out>, r1=<optimized out>, r2=<optimized out>, r2=<optimized out>,
    err=<optimized out>, err=<optimized out>) at C:/env/go122/src/runtime/syscall_windows.go:544
#2  0x0000000000f9934a in syscall.Syscall6 (fn=140722131391440, nargs=<optimized out>,
    a1=<optimized out>, a2=<optimized out>, a3=6290720, a4=0, a5=0, a6=0, r1=<optimized out>,
    r2=<optimized out>, err=<optimized out>) at C:/env/go122/src/runtime/syscall_windows.go:488
#3  0x0000000001089a13 in golang.org/x/exp/shiny/driver/internal/win32._GetMessage (
    msg=<optimized out>, hwnd=0, msgfiltermin=0, msgfiltermax=0, ret=<optimized out>, err=...)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/internal/win32/zsyscall_windows.go:184
#4  0x0000000001088e8b in golang.org/x/exp/shiny/driver/internal/win32.Main (
    f={void (void)} 0xc000171e90, retErr=...)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/internal/win32/win32.go:479
#5  0x000000000108db96 in golang.org/x/exp/shiny/driver/windriver.Main (
    f={void (golang.org/x/exp/shiny/screen.Screen)} 0xc000171ec0)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/windriver/windriver.go:22
#6  0x0000000001093790 in golang.org/x/exp/shiny/driver.main (
    f={void (golang.org/x/exp/shiny/screen.Screen)} 0x0)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/driver_windows.go:13
#7  golang.org/x/exp/shiny/driver.Main (f={void (golang.org/x/exp/shiny/screen.Screen)} 0x0)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/d--Type <RET> for more, q to quit, c to continue without paging--
river.go:24
#8  github.com/ktye/duitdraw.Main (f={void (github.com/ktye/duitdraw.Device *)} 0xc000171ee0)
    at C:/Users/sqweek/go/pkg/mod/github.com/ktye/duitdraw@v0.0.0-20190328070634-a54e9bd5a862/init.go:28
#9  0x00000000010ddf76 in github.com/rjkroege/edwood/draw.Main (
    f={void (github.com/rjkroege/edwood/draw.Device *)} 0xc000171f28)
    at C:/Users/sqweek/go/pkg/mod/github.com/rjkroege/edwood@v0.3.1/draw/duitdraw.go:53
#10 main.main () at C:/Users/sqweek/go/pkg/mod/github.com/rjkroege/edwood@v0.3.1/acme.go:79

(gdb) info register
rax            0x7ffc6ca753d0      140722131391440
rbx            0x134dfc0           20242368
rcx            0xc000171e28        824635235880
rdx            0x0                 0
rsi            0x5ffd20            6290720
rdi            0x5ffd50            6290768
rbp            0x5ffe80            0x5ffe80
rsp            0x5ffcc0            0x5ffcc0
...

Note rcx is the pointer to the message struct in this case as we are at the start of the GetMessageW function; it gets moved into rbx later.

And then at the time of the crash:

(gdb) goroutine 1 bt
#0  runtime.cgocall (fn=0xf9e000 <runtime.asmstdcall>, arg=0x134dfc0, ~r0=<optimized out>)
    at C:/env/go122/src/runtime/cgocall.go:164
#1  0x0000000000f995a7 in syscall.SyscallN (trap=<optimized out>, args=Python Exception <class 'gdb.error'>: value has been optimized out
[]uintptr,
    r1=<optimized out>, r1=<optimized out>, r2=<optimized out>, r2=<optimized out>,
    err=<optimized out>, err=<optimized out>) at C:/env/go122/src/runtime/syscall_windows.go:544
#2  0x0000000000f9934a in syscall.Syscall6 (fn=0, nargs=<optimized out>, a1=<optimized out>,
    a2=<optimized out>, a3=6290720, a4=6290616, a5=0, a6=0, r1=<optimized out>,
    r2=<optimized out>, err=<optimized out>) at C:/env/go122/src/runtime/syscall_windows.go:488
#3  0x0000000001089a13 in golang.org/x/exp/shiny/driver/internal/win32._GetMessage (
    msg=<optimized out>, hwnd=0, msgfiltermin=0, msgfiltermax=0, ret=<optimized out>, err=...)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/internal/win32/zsyscall_windows.go:184
#4  0x0000000001088e8b in golang.org/x/exp/shiny/driver/internal/win32.Main (
    f={void (void)} 0xc000583e90, retErr=...)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/internal/win32/win32.go:479
#5  0x000000000108db96 in golang.org/x/exp/shiny/driver/windriver.Main (
    f={void (golang.org/x/exp/shiny/screen.Screen)} 0xc000583ec0)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/windriver/windriver.go:22
#6  0x0000000001093790 in golang.org/x/exp/shiny/driver.main (
    f={void (golang.org/x/exp/shiny/screen.Screen)} 0x0)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/driver_windows.go:13
#7  golang.org/x/exp/shiny/driver.Main (f={void (golang.org/x/exp/shiny/screen.Screen)} 0x0)
    at C:/Users/sqweek/go/pkg/mod/golang.org/x/exp@v0.0.0-20190212162250-21964bba6549/shiny/driver/d--Type <RET> for more, q to quit, c to continue without paging--
river.go:24
#8  github.com/ktye/duitdraw.Main (f={void (github.com/ktye/duitdraw.Device *)} 0xc000583ee0)
    at C:/Users/sqweek/go/pkg/mod/github.com/ktye/duitdraw@v0.0.0-20190328070634-a54e9bd5a862/init.go:28
#9  0x00000000010ddf76 in github.com/rjkroege/edwood/draw.Main (
    f={void (github.com/rjkroege/edwood/draw.Device *)} 0xc000583f28)
    at C:/Users/sqweek/go/pkg/mod/github.com/rjkroege/edwood@v0.3.1/draw/duitdraw.go:53
#10 main.main () at C:/Users/sqweek/go/pkg/mod/github.com/rjkroege/edwood@v0.3.1/acme.go:79

 (gdb) p $_siginfo
$2 = {ExceptionCode = ACCESS_VIOLATION, ExceptionFlags = 0, ExceptionRecord = 0x0,
  ExceptionAddress = 0x7ffc6ca7540e <USER32!GetMessageW+62>, NumberParameters = 2, {
    ExceptionInformation = {0, 824638201392, 0 <repeats 13 times>}, AccessViolationInformation = {
      Type = READ_ACCESS_VIOLATION, Address = 0xc000445e30}}}

(gdb) info register
rax            0x0                 0
rbx            0xc000445e28        824638201384
rcx            0x7ffc6afa1534      140722103260468
rdx            0x0                 0
rsi            0x5ffd20            6290720
rdi            0x0                 0
rbp            0x5ffe80            0x5ffe80

rbx is definitely not pointing at the same MSG struct that we started with. Looking at the go stack trace I also notice that the duitdraw.Device and screen.Screen pointers are different which I find a bit surprising -- suggests that multiple event loops are running?

My laptop does have two screens built in and also two GPUs, I wonder if that's somehow related. Will investigate this angle and also delve for go debugging when I have time

sqweek commented 1 week ago

Ok delve doesn't seem useful for debugging segfaults on windows, I just get this:

(dlv) continue
«... time passes ...»
Process 43252 has exited with status 3221225477
(dlv) bt
Process 43252 has exited with status 3221225477
(dlv) goroutine 1
Process 43252 has exited with status 3221225477
(dlv) dump t.core
Dumping memory 0 / 0...
error dumping: Internal debugger error: send on closed channel
runtime.gopanic (0xe6cad1)
        C:/env/go122/src/runtime/panic.go:770
runtime.chansend (0xe39f07)
        C:/env/go122/src/runtime/chan.go:206
runtime.chansend1 (0xe39996)
        C:/env/go122/src/runtime/chan.go:145
github.com/go-delve/delve/pkg/proc/native.(*nativeProcess).execPtraceFunc (0x12d5d1a)
        C:/Users/sqweek/go/pkg/mod/github.com/go-delve/delve@v1.23.0/pkg/proc/native/proc.go:412
github.com/go-delve/delve/pkg/proc/native.(*nativeProcess).MemoryMap (0x12d5ca9)
        C:/Users/sqweek/go/pkg/mod/github.com/go-delve/delve@v1.23.0/pkg/proc/native/dump_windows_amd64.go:16
github.com/go-delve/delve/pkg/proc.(*Target).Dump (0x11fdc69)
        C:/Users/sqweek/go/pkg/mod/github.com/go-delve/delve@v1.23.0/pkg/proc/dump.go:190
github.com/go-delve/delve/service/debugger.(*Debugger).DumpStart.func1 (0x12f2c6b)
        C:/Users/sqweek/go/pkg/mod/github.com/go-delve/delve@v1.23.0/service/debugger/debugger.go:2307
runtime.goexit (0xea0c40)
        C:/env/go122/src/runtime/asm_amd64.s:1695

A quick search suggested that setting a breakpoint when a signal is received is not yet supported by delve.

I thought perhaps I could use gdb to capture the segfault then save a core file for delve to read, but gdb on windows cannot generate core files :S

Ok I've managed to get delve to print out the _MSG struct at every invocation of shiny/driver/internal/win32._GetMessage(), hopefully that will log something useful for the next crash

sqweek commented 1 week ago

hopefully that will log something useful for the next crash

Unfortunately, it did not :( Just some WM_MOUSEMOVE messages:

> goroutine(1): golang.org/x/exp/shiny/driver/internal/win32._GetMessage(("*golang.org/x/exp/shiny/driver/internal/win32._MSG")(0xc000623e28), 0, 0, 0)

        msg: ("*golang.org/x/exp/shiny/driver/internal/win32._MSG")(0xc000623e28)
*golang.org/x/exp/shiny/driver/internal/win32._MSG {
                HWND: 19796938,
                Message: 512,
                Wparam: 0,
                Lparam: 5505729,
                Time: 1796106968,
                Pt: golang.org/x/exp/shiny/driver/internal/win32._POINT {X: 2498, Y: 374},}
> goroutine(1): golang.org/x/exp/shiny/driver/internal/win32._GetMessage(("*golang.org/x/exp/shiny/driver/internal/win32._MSG")(0xc000623e28), 0, 0, 0)

        msg: ("*golang.org/x/exp/shiny/driver/internal/win32._MSG")(0xc000623e28)
*golang.org/x/exp/shiny/driver/internal/win32._MSG {
                HWND: 19796938,
                Message: 512,
                Wparam: 0,
                Lparam: 5178062,
                Time: 1796106968,
                Pt: golang.org/x/exp/shiny/driver/internal/win32._POINT {X: 2511, Y: 369},}
> goroutine(1): golang.org/x/exp/shiny/driver/internal/win32._GetMessage(("*golang.org/x/exp/shiny/driver/internal/win32._MSG")(0xc000623e28), 0, 0, 0)

        msg: ("*golang.org/x/exp/shiny/driver/internal/win32._MSG")(0xc000623e28)
*golang.org/x/exp/shiny/driver/internal/win32._MSG {
                HWND: 19796938,
                Message: 512,
                Wparam: 0,
                Lparam: 4588260,
                Time: 1796106984,
                Pt: golang.org/x/exp/shiny/driver/internal/win32._POINT {X: 2533, Y: 360},}
Process 63676 has exited with status 3221225477

I'm a bit at a loss. I haven't had to move windows between screens or switch between battery/AC power to trigger the crash, sometimes it happens just a few minutes after opening edwood