lxn / win

A Windows API wrapper package for the Go Programming Language
Other
1.19k stars 313 forks source link

Access Violation exception randomly crashes program #47

Open tmm1 opened 6 years ago

tmm1 commented 6 years ago

Recently I experienced a crash in my lxn/walk based program on two different machines running Windows 10.

Exception 0xc0000005 0x0 0xc042135d20 0x7ffc2cba3f76
PC=0x7ffc2cba3f76
signal arrived during external code execution

syscall.Syscall6(0x7ffc2cba3f50, 0x4, 0xc042135d18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/vagrant/go/src/runtime/syscall_windows.go:174 +0x69
github.com/lxn/win.GetMessage(0xc042135d18, 0x0, 0x0, 0xc000000001)
    /home/vagrant/channels-server-win64/.go/src/github.com/lxn/win/user32.go:2202 +0x8d
github.com/lxn/walk.(*FormBase).Run(0xc0424dc000, 0xebf9f0)
    /home/vagrant/channels-server-win64/.go/src/github.com/lxn/walk/form.go:368 +0xe8
main.runMenu(0x0, 0x0)

...
rax     0x0
rbx     0xc042135d18
rcx     0x7ffc2c3c1144
rdi     0x3c6000
rsi     0xc042135c98
rbp     0xc042135c48
rsp     0x200fcb0
r8      0x200fca8
r9      0xc042135c48
r10     0x0
r11     0x246
r12     0xc04223e2c0
r13     0x1
r14     0xc04223e2c8
r15     0x4
rip     0x7ffc2cba3f76
rflags  0x10246
cs      0x33
fs      0x53
gs      0x2b
Exception 0xc0000005 0x0 0xc0424bbd20 0x7ffc17a93f76
PC=0x7ffc17a93f76
signal arrived during external code execution

syscall.Syscall6(0x7ffc17a93f50, 0x4, 0xc0424bbd18, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1, 0x0, ...)
    /home/vagrant/go/src/runtime/syscall_windows.go:174 +0x69
github.com/lxn/win.GetMessage(0xc0424bbd18, 0x0, 0x0, 0xc000000001)
    /home/vagrant/channels-server-win64/.go/src/github.com/lxn/win/user32.go:2202 +0x8d
github.com/lxn/walk.(*FormBase).Run(0xc04265a000, 0xea8250)
    /home/vagrant/channels-server-win64/.go/src/github.com/lxn/walk/form.go:368 +0xe8
main.runMenu(0x0, 0x0)

...
rax     0x0
rbx     0xc0424bbd18
rcx     0x7ffc14711144
rdi     0x3a7000
rsi     0xc0424bbc98
rbp     0xc0424bbc48
rsp     0x1fcfcb0
r8      0x1fcfca8
r9      0xc0424bbc48
r10     0x0
r11     0x246
r12     0xc0422466e0
r13     0x1
r14     0xc0422466e8
r15     0x4
rip     0x7ffc17a93f76
rflags  0x10246
cs      0x33
fs      0x53
gs      0x2b
tmm1 commented 6 years ago

Exception 0xc0000005

0xc0000005 = Access Violation, which AFAICT is basically the Windows equivalent of a SEGV

syscall.Syscall6(0x7ffc2cba3f50

Here 0x7ffc2cba3f50 is the address of the GetMessageW syscall.

github.com/lxn/win.GetMessage(0xc042135d18

GetMessage is being called with a *MSG, which is at 0xc042135d18

Exception 0xc0000005 0x0 0xc042135d20 0x7ffc2cba3f76

This exception seems to be triggering at PC 0x7ffc2cba3f76, which is 0x26 bytes inside the GetMessage syscall (0x7ffc2cba3f76 - 0x7ffc2cba3f50 = 0x26). And it seems to happen when accessing 0xc042135d20, which is 0x8 bytes inside the *MSG struct (0xc042135d20 - 0xc042135d18 = 0x8).

The code here looks as follows:

    var msg win.MSG

    for fb.hWnd != 0 {
        switch win.GetMessage(&msg, 0, 0, 0) {

where win.MSG is:

type MSG struct {
    HWnd    HWND
    Message uint32
    WParam  uintptr
    LParam  uintptr
    Time    uint32
    Pt      POINT
}

and win.GetMessage is a simple syscall wrapper.

Since msg is stack allocated, it seems quite confusing that accessing msg.Message is causing a memory access violation.

cc @alexbrainman @kjk does my analysis make sense? Have you seen anything like this before?

kjk commented 6 years ago

@tmm1 Very solid analysis.

If you can reproduce this I would recommend running under windbg (https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools) to debug this. Convert debug info to .pdb using https://github.com/kjk/cv2pdb

Windbg would allow you to poke the state of the app when crash happens.

The first thing I would check is if this is called from main thread (main goroutine). If not, all bets are off.

You can also test 32bit vs. 64bit builds. I've seen 32bit be more stable.

tmm1 commented 6 years ago

Thanks. I'm not able to reproduce readily, but will enable WER to capture a dump the next time it happens. I'm fairly certain all the GUI code is locked to the main thread.

alexbrainman commented 6 years ago

Since msg is stack allocated

How do you know that msg is allocated on stack and does not escape to the heap? I do not know what Go version you use, but I would use whatever flags appropriate to see if msg escapes to the heap.

Other than that, I don't see any problems in your reasoning.

Alex

tmm1 commented 6 years ago

How do you know that msg is allocated on stack and does not escape to the heap? I do not know what Go version you use, but I would use whatever flags appropriate to see if msg escapes to the heap.

Oops, I was reading it like C code. I saw these crashes on go1.9.4, but have since upgraded to go1.10.1 to see if I can reproduce again.

You're right that I have no idea whether it is on the heap or the stack. I'll try compiling with gcflags to see what the compiler decides to do with that variable.

Either way, the golang GC should be keeping it alive right? So there must be some sort of heap/stack corruption happening somewhere?

tmm1 commented 6 years ago

I tried go build -gcflags "-m" and didn't see any output relating to msg, so I guess that means the variable stays on the stack?

lxn commented 6 years ago

I have seen my walk apps crashing randomly like this, but haven't been able to find the cause.

tmm1 commented 6 years ago

@lxn If you have crash stacks from any of them, please share as it will help find any commonalities that could point towards the root cause.

kjk commented 6 years ago

Another thing I would recommend is looking into a list of DLLs loaded into process.

On Windows other software likes to inject itself into every running process and sometimes this software is buggy.

If you look at list of dlls loaded into a process, you can see if any .dll looks like it's non-system.

It's a long shot but I've seen this happen.

Chrome goes as far as blocking such injections (see https://www.chromium.org/Home/third-party-developers or https://blog.chromium.org/2017/11/reducing-chrome-crashes-caused-by-third.html)

alexbrainman commented 6 years ago

I tried go build -gcflags "-m" and didn't see any output relating to msg, so I guess that means the variable stays on the stack?

I do not know where your source is, so I tried one of my programs:

$ GOOS=windows go build -o /dev/null -gcflags=-m github.com/alexbrainman/gowingui/uit 2>&1 | grep -w m
./thread.go:47:31: runLoop &m does not escape
./thread.go:58:27: runLoop &m does not escape
./thread.go:59:26: runLoop &m does not escape
$

So my m does not escape, and lives on the stack. All GUI code runs from inside GetMessage - in the callback. Go runtime is not supposed to move stack while in syscall / callback. But, perhaps, Go runtime has a bug. Or perhaps you GUI switches threads and Go runtime gets confused.

Alex

tmm1 commented 5 years ago

I'm still seeing this happen about once a month.

Exception 0xc0000005 0x0 0xc0421e5d50 0x7ff85cda1b2f
PC=0x7ff85cda1b2f
signal arrived during external code execution

syscall.Syscall6(0x7ff85cda1ae0, 0x4, 0xc0421e5d48, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/vagrant/go/src/runtime/syscall_windows.go:184 +0xfc
github.com/lxn/win.GetMessage(0xc0421e5d48, 0x0, 0x0, 0xc000000001)
    /home/vagrant/channels-server-win64/.go/src/github.com/lxn/win/user32.go:2202 +0x7b
github.com/lxn/walk.(*FormBase).Run(0xc0420df180, 0xf43178)
    /home/vagrant/channels-server-win64/.go/src/github.com/lxn/walk/form.go:368 +0xd1
main.runMenu(0x0, 0x0)
    /home/vagrant/channels-server-win64/menu_windows.go:134 +0x152
...
rax     0x0
rbx     0xc0421e5d48
rcx     0x7ff859641144
rdi     0x0
rsi     0xc0421e5cc8
rbp     0xc0421e5c78
rsp     0x22bfca0
r8      0x22bfc98
r9      0xc0421e5c78
r10     0x0
r11     0x246
r12     0xc04215c580
r13     0x0
r14     0xc04215c580
r15     0x400
rip     0x7ff85cda1b2f
rflags  0x10293
cs      0x33
fs      0x53
gs      0x2b
Exception 0xc0000005 0x0 0xc0423ddd50 0x7ffc1aed1b2f
PC=0x7ffc1aed1b2f
signal arrived during external code execution

syscall.Syscall6(0x7ffc1aed1ae0, 0x4, 0xc0423ddd48, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/vagrant/go/src/runtime/syscall_windows.go:184 +0xfc
github.com/lxn/win.GetMessage(0xc0423ddd48, 0x0, 0x0, 0xc000000001)
    /home/vagrant/channels-server-win64/.go/src/github.com/lxn/win/user32.go:2202 +0x7b
github.com/lxn/walk.(*FormBase).Run(0xc0420de700, 0xf43178)
    /home/vagrant/channels-server-win64/.go/src/github.com/lxn/walk/form.go:368 +0xd1
main.runMenu(0x0, 0x0)
    /home/vagrant/channels-server-win64/menu_windows.go:134 +0x152
...
rax     0x0
rbx     0xc0423ddd48
rcx     0x7ffc19921144
rdi     0x0
rsi     0xc0423ddcc8
rbp     0xc0423ddc78
rsp     0x22bfca0
r8      0x22bfc98
r9      0xc0423ddc78
r10     0x0
r11     0x246
r12     0xc0420d51e0
r13     0x0
r14     0xc0420d51e0
r15     0x400
rip     0x7ffc1aed1b2f
rflags  0x10293
cs      0x33
fs      0x53
gs      0x2b

@aclements Do you think any of your fixes for https://github.com/golang/go/issues/21382 would have an effect here?

tmm1 commented 5 years ago

I'm still seeing this happen in go1.12

Exception 0xc0000005 0x0 0xc0005dbd38 0x7ffa2fa52fc8
PC=0x7ffa2fa52fc8
signal arrived during external code execution

syscall.Syscall6(0x7ffa2fa52f90, 0x4, 0xc0005dbd30, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/vagrant/go/src/runtime/syscall_windows.go:199 +0xed
github.com/lxn/win.GetMessage(0xc0005dbd30, 0x0, 0x0, 0xc0005dbd30)
    /home/vagrant/gopath/pkg/mod/github.com/tmm1/win@v0.0.0-20171103020345-3413a4ab659b/user32.go:2202 +0x7b
github.com/lxn/walk.(*FormBase).Run(0xc0002ef880, 0x12e4e10)
    /home/vagrant/gopath/pkg/mod/github.com/tmm1/walk@v0.0.0-20171106193327-cfd509bc6417/form.go:368 +0xdd
...
rax     0x0
rbx     0xc0005dbd30
rcx     0x7ffa2ccd1164
rdi     0x0
rsi     0xc0005dbc90
rbp     0xc0005dbc40
rsp     0x329fc70
r8      0x329fc68
r9      0xc0005dbc40
r10     0x0
r11     0x246
r12     0xc00044a6e0
r13     0x80
r14     0x0
r15     0xc00044a6e0
rip     0x7ffa2fa52fc8
rflags  0x10293
cs      0x33
fs      0x53
gs      0x2b
Exception 0xc0000005 0x8 0x7fff8b438071 0x7fff8b438071
PC=0x7fff8b438071
signal arrived during external code execution

syscall.Syscall6(0x7fffab78c1a0, 0x3, 0x4d, 0x101, 0xc00097fb00, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/vagrant/go/src/runtime/syscall_windows.go:199 +0xed
github.com/lxn/win.SHGetStockIconInfo(...)
    /home/vagrant/gopath/pkg/mod/github.com/tmm1/win@v0.0.0-20171103020345-3413a4ab659b/shell32.go:411
...
rax     0x748
rbx     0x329ee90
rcx     0xa701fcfd98a0000
rdi     0x8403890
rsi     0x84038e0
rbp     0x84038f4
rsp     0x329ed80
r8      0x0
r9      0x8000
r10     0x0
r11     0x246
r12     0x329f660
r13     0x8423c40
r14     0x83c8a70
r15     0x10
rip     0x7fff8b438071
rflags  0x10206
cs      0x33
fs      0x53
gs      0x2b
tmm1 commented 5 years ago

Possibly a dupe of https://github.com/lxn/walk/issues/483 and fixed in https://github.com/lxn/walk/pull/493