Closed rakslice closed 3 years ago
At this point in slirp_select_poll
, looking at the "Check TCP sockets" for
loop we are inside, at the top of the loop the first thing it does is cache the so
1 ahead after the current one into so_next
.
/*
* Check sockets
*/
if (link_up) {
/*
* Check TCP sockets
*/
for (so = tcb.so_next; so != &tcb; so = so_next) {
so_next = so->so_next;
Since I just wrote code precisely like this in the loop that rearranges the drive queue, ;) I'm going to presume that this is a cheap trick to get around the fact that the code within the loop in some case needs to modify the list while it is iterating through it, i.e. it needs to take the current node out of the list and wants to not lose its place when it does that.
However if, in the course of iterating through the list, there is a possibility of the next node after the current one incidentally being modified in such a way that it becomes freed or otherwise so_tcpcb
-invalid, and it is taken out of the list to avoid a problem, then so_next
is actually a liability because it will cause the iteration to still visit it.
For the case where so->so_tcpcb
is 0
, it is being set to 0
during tcp_close
.
There are more than one ways into tcp_close
:
From within slirp_select_poll
cleaning up sockets:
#0 tcp_close (tp=0x81ea840) at ../slirp/tcp_subr.c:282
#1 0x00533e04 in tcp_slowtimo () at ../slirp/tcp_timer.c:94
#2 0x00535152 in slirp_select_poll (readfds=readfds@entry=0x883fc0c,
writefds=writefds@entry=0x883fd10, xfds=xfds@entry=0x883fe14)
at ../slirp/slirp.c:374
#3 0x00410284 in slirp_receive_func (arg=0x0) at ether_windows.cpp:1392
#4 0x76726cff in msvcrt!_beginthreadex () from C:\WINDOWS\System32\msvcrt.dll
#5 0x76726dc1 in msvcrt!_endthreadex () from C:\WINDOWS\System32\msvcrt.dll
#6 0x764ffa29 in KERNEL32!BaseThreadInitThunk ()
from C:\WINDOWS\System32\kernel32.dll
#7 0x772975f4 in ntdll!RtlGetAppContainerNamedObjectPath ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#8 0x772975c4 in ntdll!RtlGetAppContainerNamedObjectPath ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#9 0x00000000 in ?? ()
From input. This comes from a different thread (!)
(gdb) bt
#0 tcp_close (tp=tp@entry=0x81ccf00) at ../slirp/tcp_subr.c:294
#1 0x0052e57d in tcp_input (m=m@entry=0x81ca0a8, iphlen=<optimized out>,
iphlen@entry=20, inso=inso@entry=0x0) at ../slirp/tcp_input.c:978
#2 0x0053610b in ip_input (m=0x81ca0a8) at ../slirp/ip_input.c:222
#3 0x005353a6 in slirp_input (pkt=<optimized out>, pkt_len=72)
at ../slirp/slirp.c:637
#4 0x004108d2 in ether_thread_write_packets (arg=0x0)
at ether_windows.cpp:1021
#5 0x76726cff in msvcrt!_beginthreadex () from C:\WINDOWS\System32\msvcrt.dll
#6 0x76726dc1 in msvcrt!_endthreadex () from C:\WINDOWS\System32\msvcrt.dll
#7 0x764ffa29 in KERNEL32!BaseThreadInitThunk ()
from C:\WINDOWS\System32\kernel32.dll
#8 0x772975f4 in ntdll!RtlGetAppContainerNamedObjectPath ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#9 0x772975c4 in ntdll!RtlGetAppContainerNamedObjectPath ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#10 0x00000000 in ?? ()
When I run into the segfault in the slirp_receive_func
thread with so->so_tcpcb
equal 0
, what the ether_thread_write_packets
thread is doing at the time is that, as hinted at by tcp_close
being used from multiple threads, it is past the so->so_tcpcb = 0
line and not yet to sofree(so)
:
104 int mss = so->so_tcpcb->t_maxseg;
(gdb) bt
#0 0x0053738e in soread (so=so@entry=0xcdc5e40) at ../slirp/socket.c:104
#1 0x00534f9c in slirp_select_poll (readfds=readfds@entry=0x883fc0c,
writefds=writefds@entry=0x883fd10, xfds=xfds@entry=0x883fe14)
at ../slirp/slirp.c:412
#2 0x00410284 in slirp_receive_func (arg=0x0)
at C:/msys32/home/Andrew Tonner/src/macemu/BasiliskII/src/Windows/ether_windows.cpp:1420
#3 0x76726cff in msvcrt!_beginthreadex () from C:\WINDOWS\System32\msvcrt.dll
#4 0x76726dc1 in msvcrt!_endthreadex () from C:\WINDOWS\System32\msvcrt.dll
#5 0x764ffa29 in KERNEL32!BaseThreadInitThunk ()
from C:\WINDOWS\System32\kernel32.dll
#6 0x772975f4 in ntdll!RtlGetAppContainerNamedObjectPath ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#7 0x772975c4 in ntdll!RtlGetAppContainerNamedObjectPath ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#8 0x00000000 in ?? ()
(gdb) thread 9
[Switching to thread 9 (Thread 28060.0x6574)]
#0 0x77278b7a in ntdll!RtlGetCurrentServiceSessionId ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
(gdb) bt
#0 0x77278b7a in ntdll!RtlGetCurrentServiceSessionId ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#1 0x257ecf32 in ?? ()
#2 0x0cdc5f80 in ?? ()
#3 0x772b621d in ntdll!RtlGetNtGlobalFlags ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#4 0x0cdc5f80 in ?? ()
#5 0x77278786 in ntdll!RtlFreeHeap () from C:\WINDOWS\SYSTEM32\ntdll.dll
#6 0x7730f687 in ntdll!RtlRegisterSecureMemoryCacheCallback ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#7 0x772c67e0 in ntdll!RtlCaptureStackContext ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#8 0x772b621d in ntdll!RtlGetNtGlobalFlags ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#9 0x0cdc5f80 in ?? ()
#10 0x77278786 in ntdll!RtlFreeHeap () from C:\WINDOWS\SYSTEM32\ntdll.dll
#11 0x76707379 in msvcrt!free () from C:\WINDOWS\System32\msvcrt.dll
#12 0x02090000 in ?? ()
#13 0x005318b0 in tcp_close (tp=tp@entry=0xcdc5ec0) at ../slirp/tcp_subr.c:303
#14 0x0052e54d in tcp_input (m=m@entry=0xcdbcd00, iphlen=<optimized out>,
iphlen@entry=20, inso=inso@entry=0x0) at ../slirp/tcp_input.c:977
#15 0x00535f8b in ip_input (m=0xcdbcd00) at ../slirp/ip_input.c:222
#16 0x00535226 in slirp_input (pkt=<optimized out>, pkt_len=72)
at ../slirp/slirp.c:628
#17 0x004108d2 in ether_thread_write_packets (arg=0x0)
at C:/msys32/home/Andrew Tonner/src/macemu/BasiliskII/src/Windows/ether_windows.cpp:1037
#18 0x76726cff in msvcrt!_beginthreadex () from C:\WINDOWS\System32\msvcrt.dll
#19 0x76726dc1 in msvcrt!_endthreadex () from C:\WINDOWS\System32\msvcrt.dll
#20 0x764ffa29 in KERNEL32!BaseThreadInitThunk ()
from C:\WINDOWS\System32\kernel32.dll
#21 0x772975f4 in ntdll!RtlGetAppContainerNamedObjectPath ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#22 0x772975c4 in ntdll!RtlGetAppContainerNamedObjectPath ()
from C:\WINDOWS\SYSTEM32\ntdll.dll
#23 0x00000000 in ?? ()
(gdb) f 13
#13 0x005318b0 in tcp_close (tp=tp@entry=0xcdc5ec0) at ../slirp/tcp_subr.c:303
303 sbfree(&so->so_snd);
(gdb) list
298 /* clobber input socket cache if we're closing the cached connection */
299 if (so == tcp_last_so)
300 tcp_last_so = &tcb;
301 closesocket(so->s);
302 sbfree(&so->so_rcv);
303 sbfree(&so->so_snd);
304 sofree(so);
305 tcpstat.tcps_closed++;
306 return ((struct tcpcb *)0);
307 }
I imagine that there is some way to a fix this specific problem where we
tcp_close
so that the socket being closed is taken out of active processing (e.g. removed from the list) first, not left in, before it is invalidated in such a way that causes undefined behaviour in other functions trying to use itso_next
potentially becoming invalid because of a close from the other threadBut if slirp already doesn't have a kind of critical section it uses, this is a big hint that it's not designed to have its functions called concurrently on multiple threads.
Testing in Linux, weirdly all the tcp_close
calls come from tcp_slowtimo
...
That's a little strange :D
Okay, so in ether_unix
, the emulated machine is connected to slirp with pipes for some reason. The emulated machine's call to send a packet just puts it into the pipe.
slirp_input()
, which is called for the packets inbound to slirp from the emulated machine, in ether_unix
is actually done inside the slirp_receive_func
thread. It alternates between select
s for slirp's sockets and a select on the inbound packets pipe, which has its own set of possible consequences, but at least guarantees that only one slirp function call can be in progress at once.
That means that this bug doesn't apply to Unix
.
But the question of why in practice no sockets close due to tcp_input
is still unanswered.
Linux: Ah, this is just down to normal browser behaviour, I guess; a quick test of closing a session from the local side in NCSA Telnet does call slirp_input
as expected.
Running B2 in Windows, a more likely way to hit the bug than doing things in the browser is closing a telnet session in NCSA Telnet locally (i.e. from the client side)
In B2, a frequent slirp segfault is
here initializing at the top of
soread
:with
so->so_tcpcb
0
or0xfeeefeee
(MS C runtime placeholder value indicating the heap at the location has already been freed, so in this instance presumably thestruct socket
pointed to byso
has already been freed.)I don't have an exact procedure to cause this segfault 100% of the time, but doing a process of launching B2, running Netscape 2 in Mac OS 7.6.1, with its stock Open Transport 1.1.1, and visiting and browsing around a couple of sites, which includes
and then shutting down, I usually reproduce the problem in two or three boots.
[Update: A more reliable way to hit the bug is something that definitely closes a TCP connection locally on the emulated Mac, such as closing a telnet connection in NCSA Telnet]