mamedev / mame

MAME
https://www.mamedev.org/
Other
8.21k stars 2.02k forks source link

double free or corruption in arsystems.cpp #9936

Open sgpowers opened 2 years ago

sgpowers commented 2 years ago

Self compiled using g++ (Ubuntu 11.1.0-1ubuntu1~20.04) 11.1.0 - This occurs in 0.242 through to HEAD. I haven't tested on previous.

Only seeing this occur on Arcadia System (arsystems.cpp) games. All games using this driver I have tested thus far seem to exhibit this behaviour.

Games seem to play fine, but on quit I get:

mame_242.bin -debug ar_argh
Debug Build: Disabling input grab for -debug
Debug Build: Disabling input grab for -debug
Debug Build: Disabling input grab for -debug
Average speed: 100.01% (6 seconds)
double free or corruption (out)
Aborted (core dumped)

Grabbing the core and running through gdb I see:

gdb /usr/local/bin/mame_242.bin core 
GNU gdb (Ubuntu 10.2-0ubuntu1~20.04~1) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/mame_242.bin...

warning: Can't open file /memfd:xorg (deleted) during file-backed mapping note processing

warning: Can't open file /memfd:/.nvidia_drv.XXXXXX (deleted) during file-backed mapping note processing
[New LWP 326009]
[New LWP 326016]
[New LWP 326018]
[New LWP 326019]
[New LWP 326017]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `mame_242.bin -debug ar_argh'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f02a38b3880 (LWP 326009))]
(gdb) bt full
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {0, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615, 0, 0, 18446744073709551615, 18446744073709551615, 0, 0, 3144770185, 0, 3212836864, 0}}
        pid = <optimised out>
        tid = <optimised out>
#1  0x00007f02a6861859 in __GI_abort () at abort.c:79
        save_stage = 1
        act = 
          {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0, 139647975980656, 102800, 1, 139649360944779, 94098074141328, 16302926725057933312, 0, 0, 0, 16302926725057933312, 0, 16302926725057933312, 94098073643488, 16302926725057933312, 94098073643488}}, sa_flags = -699600896, sa_restorer = 0x0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007f02a68cc26e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f02a69f6298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
        ap = {{gp_offset = 24, fp_offset = 0, overflow_arg_area = 0x7ffd9d4b3580, reg_save_area = 0x7ffd9d4b3510}}
        fd = <optimised out>
        list = <optimised out>
        nlist = <optimised out>
        cp = <optimised out>
#3  0x00007f02a68d42fc in malloc_printerr (str=str@entry=0x7f02a69f8670 "double free or corruption (out)") at malloc.c:5347
#4  0x00007f02a68d5fa0 in _int_free (av=0x7f02a6a2bb80 <main_arena>, p=0x5594eccc59b0, have_lock=<optimised out>) at malloc.c:4314
        size = 18374686483949813760
        fb = <optimised out>
        nextchunk = 0xff005595ebcc59b0
        nextsize = <optimised out>
        nextinuse = <optimised out>
        prevsize = <optimised out>
        bck = <optimised out>
        fwd = <optimised out>
        __PRETTY_FUNCTION__ = "_int_free"
#5  0x00005594e29a43f6 in bitmap_t::~bitmap_t() ()
#6  0x00005594e1fbbd39 in screen_device::~screen_device() ()
#7  0x00005594e1fbc26d in screen_device::~screen_device() ()
#8  0x00005594e0ee307a in device_t::~device_t() ()
#9  0x00005594db90b29d in arcadia_amiga_state::~arcadia_amiga_state() ()
#10 0x00005594ddce1037 in mame_machine_manager::execute() ()
#11 0x00005594ddd98e50 in cli_frontend::start_execution(mame_machine_manager*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) ()
#12 0x00005594ddd99105 in cli_frontend::execute(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) ()
#13 0x00005594ddcdc67a in emulator_info::start_frontend(emu_options&, osd_interface&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) ()
#14 0x00005594d8c4a584 in main ()
(gdb) 

Any clues here ?

cuavas commented 2 years ago

An apparent double free in a bitmap destructor almost invariably turns out to be caused by an out-of-bounds access far earlier. Can you run it under a memory analyser (e.g. valgrind) and see if it finds anything suspicious?

sgpowers commented 2 years ago

Are there any particular compile flags for mame that will help valgrind here?

I ran it with these args:

valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes --log-file=valgrind.rpt

And of course I don't see the issue on exit, however here is the report file:

valgrind_rpt.zip

Edit: Just to add to this report, it appears that ar_bios is the issue. Running mame_242.bin ar_bios and quitting reproduces this issue.

cuavas commented 2 years ago

You don't need to enable leak checking to catch an out-of-bounds access. Running valgrind with no extra options will still show that.

sgpowers commented 2 years ago

Ok, thanks for the info.

I can't even get a HEAP SUMMARY from valgrind when running mame ar_bios.

Acquired the BIOS from multiple sources, checksums checkout, so don't think it can be this..

Latest valgrind report attached from valgrind --log-file=valgrind2.rpt mame_242.bin ar_bios.

valgrind2_rpt.zip

This seems to be at the point of the issue.

==354137== Invalid write of size 8
==354137==    at 0x7B3C6A5: amiga_state::render_scanline(bitmap_rgb32&, int) (in /usr/local/bin/mame_242.bin)
==354137==    by 0x7B3D1AF: amiga_state::screen_update_amiga(screen_device&, bitmap_rgb32&, rectangle const&) (in /usr/local/bin/mame_242.bin)
==354137==    by 0xE1D78E2: screen_device::update_partial(int) (in /usr/local/bin/mame_242.bin)
==354137==    by 0x7B3B02C: amiga_state::scanline_callback(int) (in /usr/local/bin/mame_242.bin)
==354137==    by 0x7B3B844: amiga_state::device_timer(emu_timer&, unsigned int, int) (in /usr/local/bin/mame_242.bin)
==354137==    by 0xE1CE219: device_scheduler::timeslice() (in /usr/local/bin/mame_242.bin)
==354137==    by 0xE1632F7: running_machine::run(bool) (in /usr/local/bin/mame_242.bin)
==354137==    by 0x9EF80F8: mame_machine_manager::execute() (in /usr/local/bin/mame_242.bin)
==354137==    by 0x9FAFE4F: cli_frontend::start_execution(mame_machine_manager*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > con
st&) (in /usr/local/bin/mame_242.bin)
==354137==    by 0x9FB0104: cli_frontend::execute(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /usr/local/bin/mame_242.bin)
==354137==    by 0x9EF3679: emulator_info::start_frontend(emu_options&, osd_interface&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > 
> >&) (in /usr/local/bin/mame_242.bin)
==354137==    by 0x4E61583: main (in /usr/local/bin/mame_242.bin)
==354137==  Address 0x5aa6f918 is 0 bytes after a block of size 1,911,000 alloc'd
==354137==    at 0x15CBA583: operator new[](unsigned long) (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==354137==    by 0xEBBB93D: bitmap_t::allocate(int, int, int, int) (in /usr/local/bin/mame_242.bin)
==354137==    by 0xE1D8E04: screen_device::register_screen_bitmap(bitmap_t&) (in /usr/local/bin/mame_242.bin)
==354137==    by 0xE1DB0BB: screen_device::device_start() (in /usr/local/bin/mame_242.bin)
==354137==    by 0xD0FC6C7: device_t::start() (in /usr/local/bin/mame_242.bin)
==354137==    by 0xE15C754: running_machine::start_all_devices() (in /usr/local/bin/mame_242.bin)
==354137==    by 0xE161908: running_machine::start() (in /usr/local/bin/mame_242.bin)
==354137==    by 0xE163221: running_machine::run(bool) (in /usr/local/bin/mame_242.bin)
==354137==    by 0x9EF80F8: mame_machine_manager::execute() (in /usr/local/bin/mame_242.bin)
==354137==    by 0x9FAFE4F: cli_frontend::start_execution(mame_machine_manager*, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > con
st&) (in /usr/local/bin/mame_242.bin)
==354137==    by 0x9FB0104: cli_frontend::execute(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /usr/local/bin/mame_242.bin)
==354137==    by 0x9EF3679: emulator_info::start_frontend(emu_options&, osd_interface&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&) (in /usr/local/bin/mame_242.bin)
==354137== 

valgrind: m_mallocfree.c:305 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 1911072, hi = 18374686483949813760.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.
cuavas commented 2 years ago

@angelosa you were the most recent person to work on the Amiga stuff. Can you please have a look at this? It’s writing off the end of the screen bitmap causing memory corruption,

cuavas commented 2 years ago

@sgpowers thanks for helping narrow this down.

sgpowers commented 2 years ago

Did a little more digging on this, and it appears that there is a FIXME in amiga_state::render_scanline which is causing the issue.

// FIXME: without the add this increment will skip bitplane ops
// ddf_stop_pixel_max = 0xd8 * 2 = 432 + 17 + 15 + 1(*) = 465 > width / 2 (455)
// (*) because there's a comparison with <= in the bitplane code.
// There are various root causes about why this happens:
// - no separation of video and logic models;
// - the offsets we are applying to DDFSTRT and DDFSTOP, they mustn't be right (copper timings?);
// - ditto for DIW related values, they are offset in far too many places;

This diff stops the corruption, I cannot see issues in the game emulation (yet) with this, but will obviously have further issues in the bitplane code:

diff --git a/src/mame/video/amiga.cpp b/src/mame/video/amiga.cpp
index ca8dda53..f07af12b 100644
--- a/src/mame/video/amiga.cpp
+++ b/src/mame/video/amiga.cpp
@@ -545,7 +545,7 @@ void amiga_state::render_scanline(bitmap_rgb32 &bitmap, int scanline)
        // - no separation of video and logic models;
        // - the offsets we are applying to DDFSTRT and DDFSTOP, they mustn't be right (copper timings?);
        // - ditto for DIW related values, they are offset in far too many places;
-       for (int x = 0; x < (amiga_state::SCREEN_WIDTH / 2) + 10; x++)
+       for (int x = 0; x < (amiga_state::SCREEN_WIDTH / 2); x++)
        {
                int sprpix;
angelosa commented 2 years ago

The +10 is for avoiding video pitch corruption in far too many places in the Amiga ecosystem (i.e. skewed and unusable gfxs). The actual fix requires separation logic between Agnus (for bitplane video fetch) and Denise (for actual drawing), we currently do both in screen update and that's bad particularly the "reading logic inside video function" pattern. Additionally +10 most likely hides that we are missing accurate timings here and there, and this is exacerbated in AGA emulation (where this hack isn't enough since it has fetching modes and more fine grained bitplane control).

ETA: relevant links: https://forums.bannister.org/ubbthreads.php?ubb=showflat&Number=120521#Post120521 https://github.com/mamedev/mame/blob/690dba1c64212882ba23e0f019dbaba637987cbd/src/mame/amiga/amiga_v.cpp#L541 https://github.com/mamedev/mame/blob/690dba1c64212882ba23e0f019dbaba637987cbd/src/mame/amiga/amigaaga.cpp#L535

firewave commented 1 year ago

See also: https://mametesters.org/view.php?id=8483

angelosa commented 1 year ago

As per my previous comment doesn't change anything that isn't already known, at most somebody can hack htotal so that it pleases the underlying bitmap buffer, bad timing for bad timing that is in the plans to rewrite anyway.