mupen64plus / mupen64plus-core

Core module of the Mupen64Plus project
1.31k stars 258 forks source link

Segfault when closing games #234

Closed orbea closed 7 years ago

orbea commented 7 years ago

OS: Slackware64-current mupen64plus-2017.02.03_083f825-x86_64-1_git glide64-2016.09.01_56be25d-x86_64-1_git

When closing games mupen64plus will segfault and it does not seem to matter what plugin.

Here is a backtrace with glide64 and hle.

Thread 1 "mupen64plus" received signal SIGSEGV, Segmentation fault.
0x00007ffff56a0904 in dyna_start (code=0xa4300)
    at ../../src/device/r4300/x86_64/rjump.c:57
57    asm volatile
(gdb) bt
#0  0x00007ffff56a0904 in dyna_start (code=0xa4300)
    at ../../src/device/r4300/x86_64/rjump.c:57
#1  0x00007ffff56a08e2 in dyna_start (code=0x7ffff58cf1a0 <g_dev>)
    at ../../src/device/r4300/x86_64/rjump.c:57
#2  0x00007ffff6dd3840 in ?? () from /usr/lib64/libmupen64plus.so.2
#3  0x00007fffffffde90 in ?? ()
#4  0x00007fffffffde50 in ?? ()
#5  0x00007ffff6dd38c0 in rsp () from /usr/lib64/libmupen64plus.so.2
#6  0x00007ffff564c020 in ?? () from /usr/lib64/libmupen64plus.so.2
#7  0x00007ffff58cf1a0 in ?? () from /usr/lib64/libmupen64plus.so.2
#8  0x00007ffff564c24c in run_r4300 () at ../../src/device/r4300/r4300_core.c:166
#9  0x00007ffff5637ae5 in run_device (dev=dev@entry=0x7ffff58cf1a0 <g_dev>)
    at ../../src/device/device.c:90
#10 0x00007ffff565649b in main_run () at ../../src/main/main.c:1026
#11 0x0000000000402050 in main (argc=6, argv=0x7fffffffe1c8)
    at ../../src/main.c:782

GDB log - http://pastebin.com/jzbjVWMp

orbea commented 7 years ago

GLideN64-2017.01.31_bf1e748_master-x86_64-1_git

With GLideN64 it will segfault when starting instead of closing.

Thread 1 "mupen64plus" received signal SIGABRT, Aborted.
0x00007ffff731a5cf in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff731a5cf in raise () from /lib64/libc.so.6
#1  0x00007ffff731c1ba in abort () from /lib64/libc.so.6
#2  0x00007ffff7312b67 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff7312c12 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fffeffd3b40 in _createShader (_type=35633,
    _strShader=0x7ffff0089100 "#version 330 core \nin highp vec4 aPosition;\t\t\t\t\t\t\t\nin lowp vec4 aColor;\t\t\t\t\t\t\t\t\nin highp vec2 aTexCoord0;\t\t\t\t\t\t\t\nin highp vec2 aTexCoord1;\t\t\t\t\t\t\t\nin lowp float aNumLights;\t\t\t\t\t\t\t\nin highp vec4 aModify"...)
    at /tmp/SBo/GLideN64/src/OGL3X/GLSLCombiner_ogl3x.cpp:206
#5  0x00007fffeffd3b61 in InitShaderCombiner ()
    at /tmp/SBo/GLideN64/src/OGL3X/GLSLCombiner_ogl3x.cpp:212
#6  0x00007fffeff82b68 in Combiner_Init ()
    at /tmp/SBo/GLideN64/src/Combiner.cpp:77
#7  0x00007fffeffafc7b in OGLRender::_initData (
    this=0x7ffff06dc5d0 <OGLVideo::get()::video+1104>)
    at /tmp/SBo/GLideN64/src/OpenGL.cpp:2186
#8  0x00007fffeffa7792 in OGLVideo::start (
    this=0x7ffff06dc180 <OGLVideo::get()::video>)
    at /tmp/SBo/GLideN64/src/OpenGL.cpp:160
#9  0x00007fffeffcd3ea in PluginAPI::RomOpen (
    this=0x7ffff06dbb10 <PluginAPI::get()::api>)
    at /tmp/SBo/GLideN64/src/common/CommonAPIImpl_common.cpp:209
#10 0x00007fffeffcd760 in RomOpen ()
    at /tmp/SBo/GLideN64/src/MupenPlusPluginAPI.cpp:8
#11 0x00007ffff56562d3 in main_run () at ../../src/main/main.c:981
#12 0x0000000000402050 in main (argc=6, argv=0x7fffffffe1c8)
    at ../../src/main.c:782

GDB log - http://pastebin.com/3VyQWiTC

orbea commented 7 years ago

The second crash seems to be unrelated... Older commits of GLideN64 will segfault when closed too.

orbea commented 7 years ago

I bisected it and found the first bad commit.

commit ed1c410f17e78709dfc6050fe4ccafb6d7f3c3a9
Author: Bobby Smiles <bobby.smiles32@gmail.com>
Date:   Thu Dec 1 17:11:46 2016 +0100

    Move rjump variables inside r4300_core struct.

:040000 040000 e8c1d77d6d9334f528891b39862715b4b63d892f 5ea9932890561262d73f7c34c1b77d16fb9d5326 M  src

ed1c410f17e78709dfc6050fe4ccafb6d7f3c3a9 @bsmiles32 Can you take a look at this?

bsmiles32 commented 7 years ago

Thanks for bissecting. I don't really see what could cause the segfault here. And I didn't experienced it. I'll take a look, but don't have much time atm.

orbea commented 7 years ago

@bsmiles32 Reverting the changes in src/device/r4300/x86_64/rjump.c works around the segfault. The issue has to be in there.

Gillou68310 commented 7 years ago

@orbea could you try replacing: " sub $0x10, %%rsp \n" by " sub $0x20, %%rsp \n" in rjump.c

I remember experiencing a similar issue in windows (see dyna_start.asm)

orbea commented 7 years ago

I tried that and it still segfaults unfortunately.

Gillou68310 commented 7 years ago

Try increasing the value.

orbea commented 7 years ago

I did, to 50, 100 and 1000, no changes.

Gillou68310 commented 7 years ago

Thanks for testing I just wanted to be sure this was not a stack issue. This might also be a compiler issue when trying to build the intrinsic asm code in rjump.c We could try to build dyna_start.asm instead of rjump.c just to be sure, but I'll need to make some changes to the Makefile for this. In the meantime could you do a objdump -d on mupen64plus-core.so so I can check the disassembly. If you know how to generate the map file this would be helpfull also.

Gillou68310 commented 7 years ago

The map file should be generated when executing the following command before doing a make all: export LDFLAGS="-Wl,-Map=output.map"

orbea commented 7 years ago

I attempted to build it with those LDFLAGS and then got the resulting output from objdump -d, does this help?

objdump -d: http://ks392457.kimsufi.com/orbea/stuff/logs/mupen64plus_objdump.log

output.map: http://pastebin.com/T71qb5NG

Gillou68310 commented 7 years ago

Thanks, I checked the disassembly and everything looks good. The last issue I can think of is memory corruption, you should check that the value of g_dev.r4300.save_rsp and g_dev.r4300.save_rip are not overwritten during execution. Put a breakpoint in dyna start with gdb and step through instructions to get the value of rsp when reaching mov %%rsp, %[save_rsp] and the value of rax when reaching mov %%rax, %[save_rip]. Use the next command when reaching call *%%rbx. gdb should break in mov %[save_rsp], %%rsp when closing emulation, check the value of rsp after the instruction.

orbea commented 7 years ago

I'm not all that experienced at gdb, but i gave it a try. I could of misunderstood something... http://pastebin.com/H4ZkQwUT

Gillou68310 commented 7 years ago

I forgot to say you need to run the following command to show disassembly: set disassemble-next-line

orbea commented 7 years ago

Alright, this is hopefully more useful? http://pastebin.com/NFhpsGHK

Gillou68310 commented 7 years ago

Comment the debug message line to jump directly to the usefull code and use stepi to step to next instruction and nexti to step over a call instruction, you're almost a gdb expert ;-)

orbea commented 7 years ago

I tried that, is this better? http://pastebin.com/4exzJEF4

I tried using nexti on any call instructions and the game started after the last one where I then exited normally and continued with stepi until it segfaulted...

Gillou68310 commented 7 years ago

Oh I see what's happening here, the compiler allocated an additional register "rdx" to store the base address of the g_dev structure or rdx obviously doesn't contain that address anymore when you stop emulation and jump back to the mov 0xa00(%rdx),%rsp instruction. Using dyna_start.asm instead of the intrinsic asm code will fix the issue.

bsmiles32 commented 7 years ago

Nice catch @Gillou68310 ! So maybe putting rdx (and any registers modified inside the call rbx) in the clobber list would fix this ?

Gillou68310 commented 7 years ago

@bsmiles32 I guess it should work too, but honestly I prefer using asm files. I guess it's just a matter of taste but at least I'm sure gcc won't use registers that I didn't explicitly allocated ;-)

Gillou68310 commented 7 years ago

@orbea could you test this branch https://github.com/Gillou68310/mupen64plus-core/tree/dyna_start_x86_64 Updated

bsmiles32 commented 7 years ago

@Gillou68310 Inlined asm has the "benefit" of letting the optimizer do his job, especially register allocation, but as we saw this is a double edged sword. Also inlined asm is GCC only.

Should we also update the x86 version too to use an asm file instead of the inlined asm ?

Gillou68310 commented 7 years ago

Should we also update the x86 version too to use an asm file instead of the inlined asm ?

Yes I think we should, I can take care of it if you want.

orbea commented 7 years ago

@Gillou68310 I tried your branch and there is no segfault, thanks!

Gillou68310 commented 7 years ago

@orbea you're welcome ;-) Thanks for helping me tracking down this issue.

@bsmiles32 I created the asm file for x86 https://github.com/Gillou68310/mupen64plus-core/commits/dyna_start could you check if it builds fine on linux?

orbea commented 7 years ago

I tested building your new branch on a Slackware-14.2 chroot (X86) and it built fine, my chroot is not set up to try running it though.

Gillou68310 commented 7 years ago

Thanks @orbea, I tested on windows and it was running just fine, hopefully it should run fine on linux too ;-)

orbea commented 7 years ago

Since the fix has been merged into the master and I have tested that it is still fixed I am going to close this, thanks!

For reference. https://github.com/mupen64plus/mupen64plus-core/commit/dd05baf40c0043d2b631c2ed2a43e1d62ff2f51e