Closed by Sophira 1 year ago
As of 2375623 the crash should be fixed (and you now get a visible error if the podule fails to initialise). I'm guessing that while this solves part of the problem, you're now going to be getting an error on startup? This would likely be slirp failing to init for some reason; if that's the case, more digging is required.
I'm getting some interesting behaviour now. The error window only intermittently displays for me. (The points when the crashes occur are different now, though, which perhaps suggests that this is a different error and not to do with the aeh50/aeh54 podule directly - see below.)
I tried running Arculator from a fresh start 40 times (20 each for aeh50 and aeh54), and for aeh50, the error only displayed 4 times (the other 16 times it still segfaults, but see below!), and for aeh54 I got the error 7 times (with the other 13 being segfaults).
In most cases where the error appeared, after acknowledging it and closing the machine window, attempting to start the machine again from the same session resulted in a segfault. (During debugging - not in the above tests - I did notice one time where I got the error again, so I imagine this might just be the same race condition.)
When they showed, the error windows themselves were interesting too. Initially they showed up too tall, like this:
However, trying to click the OK button results in the window resizing itself to the correct size, like this:
(Note that this is the same error window that just resized itself, not a new error window!)
After the resize occurs, I can click the OK button as normal.
When the error window did pop up, the errors were as follows:
(aeh50 error window)
Failed to initialise podule 1 :
Acorn Ethernet II podule (AEH50)
(aeh50 console error)
[00000000]: Failed to init podule 1 : aeh50
(aeh54 error window)
Failed to initialise podule 1 :
Acorn Ethernet III podule (AEH54)
(aeh54 console error)
[00000000]: Failed to init podule 1 : aeh54
Interestingly, when debugging, I see errors in different places now (and not just SIGSEGV - I've gotten a SIGFPE during debugging a couple of times, too, though never when running outside of the debugger):
- In /usr/lib64/libgdk-x11-2.0.so.0, while trying to read memory somewhere within the gdk_window_set_geometry_hints function. (The file belongs to the Gentoo package x11-libs/gtk+-2.24.33-r2.) I don't compile debug symbols for this library so I can't tell you what it's doing exactly, but given the function name it sounds like it's related to setting a window size?
- In /usr/lib64/libgtk-x11-2.0.so.0, while attempting to divide by zero. (This file belongs to the same Gentoo package as above, x11-libs/gtk+-2.24.33-r2.) Unfortunately I can't tell you where as there wasn't a function name given in the backtrace, but the surrounding code appears to be using pango to set window parameters such as width and extents.
As before, I only get these errors when using the aeh50/aeh54 podules, which is... kinda strange. There's also no Arculator code directly in the backtraces for these errors (other than the very beginning, where you initially call wxEntry(argc, argv); in wx-main.cc, which obviously can't be anything to do with it). It's possible that something occurred earlier in a different thread, perhaps, which indirectly caused this?
Given the issues with the window size that I provided screenshots of, and that the debugging seems to suggest something weird with window size, I suspect that these new segfaults might be related to the new error window and not the original error with the aeh50/aeh54 podules. Technically this probably ought to be a new issue, but I'm replying here because the two seem a little intertwined.
(In support of this theory, the last log message I see before these new segfaults is the [00000000]: Failed to init podule 1 : aeh50 or Failed to init podule 1 : aeh54 line, so it seems likely this error occurs immediately after that, which would fit this theory.)
As this new issue occurs within GTK+ itself, these new segfaults I'm getting (to do with the error window itself, not the aeh50/aeh54 podule directly) are probably Linux/X11/GTK-specific, in case you want to try to reproduce.
Argh, think this is a UI threading issue. Try 1bad6bd.
If that works then we can start debugging why slirp doesn't init. Slirp does have debugging support - defining SLIRP_DEBUG should cause a slirplog.txt to be created.
Sorry for the delay responding!
Building using ac55c53c, the crashes have now disappeared and the error dialogs come up every time (and are the correct size).
Defining SLIRP_DEBUG does indeed cause a slirplog.txt file to be created. Unfortunately, it doesn't contain much:
Slirp Cockatrice special - Debugging Started.
Debugging Started level 7.
That's literally all there is in the file after a machine start, clicking OK on the error message, letting the machine get to the RISC OS desktop, and then closing the machine window. This is the case for both aeh50 and aeh54. (I tested with aeh54 first, though I don't think that matters in this case. I also erased the log file manually between attempts to ensure it was being recreated.)
In case it helps at all, here's the raw file after testing with aeh50, though I suspect it won't help much: slirplog.txt
I found an aeh50_log function in podules/aeh50/src/aeh50.c which was returning without doing any logging. Commenting out the return statement gives me an aeh50_log.txt file:
[00000000] : podule_probe 0x7fbe8bebea60 path=./podules/aeh50/
[00000000] : aeh50 ROM ./podules/aeh50/EthernetII_ID_ROM.ROM
[00000000] : Failed to open network device
I'm going to do some debugging to figure out exactly what's wrong.
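As an aside on the logging function mentioned above: the real aeh50_log isn't shown in this thread, so the following is only a rough sketch of how such a function could be switched on and off without editing out a return statement. The names podule_log and PODULE_LOG_ENABLED are made up for illustration and are not Arculator identifiers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical compile-time switch (not an Arculator macro): build with
 * -DPODULE_LOG_ENABLED=0 to silence logging. */
#ifndef PODULE_LOG_ENABLED
#define PODULE_LOG_ENABLED 1
#endif

/*
 * Hypothetical logger: writes a printf-style message to `fp`.
 * Returns the number of characters written, or 0 if logging is
 * disabled, `fp` is NULL, or the write failed.
 */
int podule_log(FILE *fp, const char *format, ...)
{
    va_list ap;
    int written;

    assert(format != NULL);

    if (!PODULE_LOG_ENABLED || fp == NULL)
        return 0;

    va_start(ap, format);
    written = vfprintf(fp, format, ap);
    va_end(ap);

    /* Flush straight away so the message survives a later crash. */
    fflush(fp);

    return written < 0 ? 0 : written;
}
```

With something like this, a call such as podule_log(logfile, "Failed to open network device\n") would reach the log whenever the switch is left at its default.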
I think I found the issue!
The problem seems to be that libpcap support in Arculator is limited to Win32 only. As such, on my Linux machine the routine returns NULL instead of initialising libpcap.
So it looks like this is by design! I unfortunately don't know enough about libpcap to contribute the correct code myself, but I assume there's a good reason it's limited to Win32.
[edit: Wait, never mind, I'm wrong, I missed the SLIRP initialisation part. Sorry. Will do more debugging.]
Okay, I think I have an actual answer now, and it's a doozy - a rare edge case.
It looks like SLIRP reads the /etc/resolv.conf file in order to find the nameserver in use. However, I actually run my own DNS resolver on my system, and don't specify any nameservers in my /etc/resolv.conf file. (In fact, the file is completely empty apart from comments.) This is allowed to occur - the resolv.conf(5) man page states:
If no nameserver entries are present, the default is to use the name server on the local machine.
I can confirm that if I temporarily add a nameserver entry to the file pointing to my local nameserver, then the error goes away and the podule successfully initialises.
So the solution here would be to default to localhost if an entry isn't found, rather than returning an error!
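To make the suggestion concrete, here's a minimal sketch of the fallback, assuming a standalone helper rather than SLIRP's actual code (which I haven't reproduced here). The function name find_nameserver and its signature are hypothetical; it only looks at the first nameserver entry and ignores every other resolv.conf keyword.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not SLIRP's or Arculator's real code): scan
 * resolv.conf-style text from `fp` for the first "nameserver" entry and
 * copy its address into `out`. If there is no such entry, or `fp` is
 * NULL, fall back to 127.0.0.1 as resolv.conf(5) describes.
 * Returns 1 if an entry was found, 0 if the fallback was used.
 */
int find_nameserver(FILE *fp, char *out, size_t out_len)
{
    char line[512];
    char addr[256];

    assert(out != NULL && out_len > 0);

    while (fp != NULL && fgets(line, sizeof(line), fp) != NULL) {
        const char *p = line;

        /* Skip leading blanks; comment lines ('#' or ';') simply fail
         * the keyword comparison below. */
        while (*p == ' ' || *p == '\t')
            p++;

        if (strncmp(p, "nameserver", 10) == 0 &&
            (p[10] == ' ' || p[10] == '\t') &&
            sscanf(p + 10, "%255s", addr) == 1) {
            snprintf(out, out_len, "%s", addr);
            return 1;
        }
    }

    /* No nameserver entry: default to the local machine, not an error. */
    snprintf(out, out_len, "%s", "127.0.0.1");
    return 0;
}
```

The point of returning 0 rather than failing is that the caller can still log that the default was used while initialisation carries on.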
I've pushed a possible fix to a4a3b63, could you give it a try please?
The new commit does appear to fix this issue! It turns out that there's another issue but it seems to be unrelated to this one (when using aeh50, closing the machine results in a free(): invalid pointer error), and I was getting that error even when using the /etc/resolv.conf workaround. I'll open a new issue for that one - thank you!
When I try to use the AEH50 or AEH54 podules in Arculator on Linux using commit 323e1e21, starting up the machine will result in a segfault.
Relevant files:
Debugging reveals that the segfault occurs at different places depending on the podule chosen:
podules/common/net/ne2000.c
podules/aeh54/src/seeq8005.c
In both cases, the cause is similar: the net pointer (ne2000->net for aeh50, seeq8005->net for aeh54) is NULL.
It's quite possible there's something I haven't done that I need to do, so if this is the case, I apologise! The ROMs are in their correct locations, though, as far as I know. (podules/aeh50/EthernetII_ID_ROM.ROM and podules/aeh54/ether3.rom respectively)
I'm using gcc 11.2.1 to compile on Gentoo Linux.
If there's any more information you need, please let me know!
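For illustration, the failure mode described in this report (a NULL net pointer being dereferenced later on) can be caught at initialisation time instead. This is only a sketch under stated assumptions: the struct layouts, the backend_open_* functions and ne2000_init_checked are all hypothetical stand-ins, and only the net field name comes from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real structures; only the `net` field
 * name is taken from the report above. */
typedef struct net_t {
    int placeholder;
} net_t;

typedef struct ne2000_t {
    net_t *net; /* NULL if the network backend could not be opened */
} ne2000_t;

/* Hypothetical backends: one that succeeds and one that fails. */
net_t *backend_open_ok(void)
{
    static net_t instance;
    return &instance;
}

net_t *backend_open_fail(void)
{
    return NULL;
}

/*
 * Opens the backend and reports failure to the caller instead of leaving
 * a NULL `net` pointer to be dereferenced later.
 * Returns 0 on success, -1 if the backend failed to open.
 */
int ne2000_init_checked(ne2000_t *dev, net_t *(*open_backend)(void))
{
    assert(dev != NULL && open_backend != NULL);

    dev->net = open_backend();
    if (dev->net == NULL) {
        fprintf(stderr, "Failed to open network device\n");
        return -1;
    }
    return 0;
}
```

A caller that checks the return value can then refuse to start the podule (and show an error) rather than crashing on first use.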