Closed rakslice closed 4 years ago
Note that kanjitalk755 works okay on linux for amd64 generally, such as the random debian 9.5 amd64 vm I have around, even with only a single processor.
sdl2 in 20.04 is 2.0.10+dfsg1-3
Same issue with sdl2 2.0.10 built from source.
Note that cebix/macemu has a sort of similar hang before starting in 20.04 if audio is enabled in the config, despite its message about disabling audio because the devices weren't found in that case
Some old commits that appear in the commit log of kanjitalk755's branch don't have this problem, but maybe they just appear there by virtue of being merge sources that don't really represent the state of the history of the branch
Perhaps it would be worth getting to grips with some of the transient build problems in the history of the branch to usefully bisect it, for instance the undefined SIGSEGV_FAULT_HANDLER_ARGLIST
error of e.g. f96c92ad5178b29d096d87bee0826196c0f7df3d
164935017b97eb2f1d39e566bd05ee428bf9099c -> ok f170a527b21de23178666ffb6be693d4996e8272 -> failing to build
Okay, so part of the build failure is that some autoconf checks that should pass are failing because they just test build sigsegv.cpp
but it doesn't build for reasons unrelated to the check, e.g. because of some mach specific code not in a mach-specific define case, as well as missing the include of stdint.h
. There are still more problems affecting f170a52, but for f96c92ad5178b29d096d87bee0826196c0f7df3d it is enough to gets us through the build and it doesn't have this issue!
f96c92ad5178b29d096d87bee0826196c0f7df3d -> ok
This kind of autoconf checks used in the project, where it just tries different flags on big huge source files, are kind of brittle for development for this reason... if you're absolutely sure the code works, they are an okay way to save time I guess, but considering that code can fail to compile for a variety of reasons while doing development, they are a very poor substitute compared to just having narrow feature check code specifically intended for such checks
52fe2290fe5816ad67aa2796331773c2c5208ee3 is the first bad commit
This is just the first bad build with default config options by virtue of it being the first one where SDL2 is enabled. If I explicitly configure 52fe2290fe5816ad67aa2796331773c2c5208ee3 with --enable-sdl-video=no --enable-sdl-audio=no
[and work around the flag bug in main_unix.cpp
], it works fine
After a fair amount of shotgun debugging my way through various parts of the SDL driver, I found that changing the empty render driver hint in video_sdl2.cpp
to request software rendering, as was already done for the WIN32
case, made it start working immediately on my test system:
SDL_SetHint(SDL_HINT_RENDER_DRIVER, "software");
I should probably follow up on this as it will likely come up with 20.04 hitting wider adoption soon
Note that just commenting out the empty hint entirely so that there is no hint doesn't make the problem go away.
In the failing config with opengl
renderer, redraw_func
is being called at reasonable ~16ms intervals, but the VideoVBL
calls aren't happening. Does this mean the emulated Mac doesn't think video is in an active state? Is this because some value it has received from initialization to that point is not good, or else something else has gone wrong that is holding up the emulated Mac (what could it be)?
I had the same thing happen on Arch x64 2020.06.01.
Notably both of these distros seem to have the same kind of 'modern' permissions changes where e.g. you need to run gdb as root
This issue was fixed by 1da8385. And had existed since 2005 (dd2b9a9).
@kanjitalk755 Curious. Did dd2b9a95d5c51866c5fb9571c9dfd3ec8e11a246 introduce it? If so, how? Something to do with the alignment change of regs
?
Oh, I see, by virtue of spcflags
being inside powerpc_registers
and the instance of it actually located at regs_ptr()
never getting initialized there
@kanjitalk755 It just occurs to me that initializing that regs_ptr()
seems like a job for placement new
. I wonder if there are more of these things in the codebase that are effectively c-style casts of a confusingly-typed buffer?
Can you explain in detail?
From the old dd2b9a95d5c51866c5fb9571c9dfd3ec8e11a246 change:
In class powerpc_cpu
:
Before:
powerpc_registers regs;
After:
struct {
powerpc_registers regs;
uint8 pad[16];
} _regs;
// Make sure the calculation of the current offset makes use of
// 'this' as this could make it simplified at compile-time
powerpc_registers *regs_ptr() const { return (powerpc_registers *)((char *)&_regs.regs + (16 - (((char *)&_regs.regs - (char *)this) % 16))); }
In the case where regs_ptr()
is shifted over from the one at regs
, the constructor for powerpc_registers
is never run on the shifted-over instance. That instance doesn't come from anywhere, it is just a C-style cast (powerpc_registers *)
applied to memory that was already part of an object. Since the powerpc_registers
constructor never runs on it, this also means that the constructors for the members of the powerpc_registers
aren't called, and those members' members, etc.
In 1da83854b0e67ac5061db878c27fc6247ad2362b you fix the basic_spcflags
' mask
and lock
members potentially not being initialized, since the basic_spcflags
constructor was potentially not called, by moving the initialization to a separate member function basic_spcflags::init()
and explicitly calling it in the powerpc_cpu
constructor.
Another way to fix that would be to just make sure the powerpc_registers
constructor actually runs on the memory location we use it at:
#include <new>
powerpc_cpu::powerpc_cpu() {
new (regs_ptr()) powerpc_registers();
...
This has the benefit that it also initializes everything else in powerpc_registers
, without having to create a solution for initializing each member, though I don't know if not being initialized is functionally a problem for anything else in there.
With regs
still in existence, regs
' constructor also runs, implicitly at the start of the powerpc_cpu
constructor, but since it runs first it shouldn't clobber anything set by the one we actually want. (See https://gist.github.com/rakslice/3dbd4fc89964a37b4da3dd5a4fc93ef1)
It would seem reasonable to ask: Since we found this problem, does the codebase have other problems of this form?
The general form of this problem is that some memory that was part of an existing object was cast as a pointer to a different class, and then used, without the exact underlying memory range being initialized/constructed as an instance of that class first by any code. There are so many casts happening everywhere in this codebase; it's not trivial to verify each.
Thank you for the explanation.
dd2b9a9 implemented the alignment without relying to compiler options before it was standardized, but failed as a result. The compiler before C++11 cannot be used, but I think the following implementation is straightforward.
https://github.com/kanjitalk755/macemu/tree/test_alignas
C-style cast (powerpc_registers *)
is still dangerous, but it's common.
It would seem reasonable to ask: Since we found this problem, does the codebase have other problems of this form?
If the project is development phase, we may need to do a thorough code review. However, in the maintenance phase it may be sufficient to handle the bugs found.
Summary:
In ubuntu 20.04 amd64 on a dual core with HT machine, kanjitalk755's fork hangs indefinitely at a black screen on launch, when using the
"opengl"
SDL 2 renderer. As a workaround use you can use the SDL renderer"software"
instead of the default"opengl"
one. As of commita01387b1af07eb4217a3ea9b6561125dbd1c060c
, the preferences file parametersdlrender
works for this purpose so you don't have to make a code change.Previous first investigation steps:
From a cursory look with gdb it is hanging in the various
spin_lock()
s inspcflags.hpp
, although this may just be what a readily identifiable thread is doing at the time.The upstream cebix/macemu code doesn't have this issue on the same machine.