Closed smcameron closed 5 years ago
I believe I have duplicated this once on my laptop with nothing other than localhost in the way of a network. I suspect it may have to do with a client coming up without the ROLE_MAINSCREEN, but with the displaymode == DISPLAYMODE_MAINSCREEN, but this is only a suspicion at this point. I'm not yet able to reliably reproduce it (have only reproduced it once, and I'm not quite sure how, but it involved bringing up two clients one with ROLE_MAINSCREEN and one without.)
Reproduced it again, and was able to attach with gdb.
(gdb) info threads
Id Target Id Frame
6 Thread 0x7f73456fe700 (LWP 4630) "threaded-ml" 0x00007f7353ea7c9d in poll () at ../sysdeps/unix/syscall-template.S:81
5 Thread 0x7f73494d0700 (LWP 4631) "snis_client" 0x00007f7353ea7c9d in poll () at ../sysdeps/unix/syscall-template.S:81
4 Thread 0x7f7346132700 (LWP 4632) "snis-nat-lang" 0x00007f7353ea618d in open64 () at ../sysdeps/unix/syscall-template.S:81
3 Thread 0x7f7344efd700 (LWP 4633) "snisc-tts" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
2 Thread 0x7f7313fff700 (LWP 4650) "snisc-writer" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
* 1 Thread 0x7f73568a8a40 (LWP 4623) "snis_client" 0x00007f7353ea7c9d in poll () at ../sysdeps/unix/syscall-template.S:81
(gdb) bt
#0 0x00007f7353ea7c9d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f73558d3fe4 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2 0x00007f73558d430a in g_main_loop_run ()
from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3 0x00007f73561c2447 in gtk_main ()
from /usr/lib/x86_64-linux-gnu/libgtk-x11-2.0.so.0
#4 0x0000000000412341 in main ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f7313fff700 (LWP 4650))]
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
(gdb) bt
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000000000044f929 in gameserver_writer ()
#2 0x00007f7354390184 in start_thread (arg=0x7f7313fff700)
at pthread_create.c:312
#3 0x00007f7353eb503d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) thread 3
[Switching to thread 3 (Thread 0x7f7344efd700 (LWP 4633))]
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185 in ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
(gdb) bt
#0 pthread_cond_wait@@GLIBC_2.3.2 ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x0000000000453011 in text_to_speech_thread_fn ()
#2 0x00007f7354390184 in start_thread (arg=0x7f7344efd700)
at pthread_create.c:312
#3 0x00007f7353eb503d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) thread 4
[Switching to thread 4 (Thread 0x7f7346132700 (LWP 4632))]
#0 0x00007f7353ea618d in open64 () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f7353ea618d in open64 () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f7353e30ed8 in _IO_file_open (is32not64=8, read_write=8, prot=438,
posix_mode=<optimized out>, filename=0x7f7346131aa0 "\260uL",
fp=0x7f732c0008c0) at fileops.c:228
#2 _IO_new_file_fopen (fp=fp@entry=0x7f732c0008c0,
filename=filename@entry=0x4c75b0 "/tmp/snis-natural-language-fifo",
mode=<optimized out>, mode@entry=0x491d9f "r", is32not64=is32not64@entry=1)
at fileops.c:333
#3 0x00007f7353e253d4 in __fopen_internal (
filename=0x4c75b0 "/tmp/snis-natural-language-fifo", mode=0x491d9f "r",
is32=1) at iofopen.c:90
#4 0x000000000045b9f7 in monitor_natural_language_fifo ()
#5 0x00007f7354390184 in start_thread (arg=0x7f7346132700)
at pthread_create.c:312
#6 0x00007f7353eb503d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) thread 5
[Switching to thread 5 (Thread 0x7f73494d0700 (LWP 4631))]
#0 0x00007f7353ea7c9d in poll () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f7353ea7c9d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f73545b89b8 in ?? ()
from /usr/lib/x86_64-linux-gnu/libportaudio.so.2
#2 0x00007f73545b95e0 in ?? ()
from /usr/lib/x86_64-linux-gnu/libportaudio.so.2
#3 0x00007f7354390184 in start_thread (arg=0x7f73494d0700)
at pthread_create.c:312
#4 0x00007f7353eb503d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) thread 6
[Switching to thread 6 (Thread 0x7f73456fe700 (LWP 4630))]
#0 0x00007f7353ea7c9d in poll () at ../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
(gdb) bt
#0 0x00007f7353ea7c9d in poll () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f734b5a1031 in ?? () from /usr/lib/x86_64-linux-gnu/libpulse.so.0
#2 0x00007f734b59283c in pa_mainloop_poll ()
from /usr/lib/x86_64-linux-gnu/libpulse.so.0
#3 0x00007f734b592ece in pa_mainloop_iterate ()
from /usr/lib/x86_64-linux-gnu/libpulse.so.0
#4 0x00007f734b592f80 in pa_mainloop_run ()
from /usr/lib/x86_64-linux-gnu/libpulse.so.0
#5 0x00007f734b5a0fe3 in ?? () from /usr/lib/x86_64-linux-gnu/libpulse.so.0
#6 0x00007f734b144f08 in ?? ()
from /usr/lib/x86_64-linux-gnu/pulseaudio/libpulsecommon-4.0.so
#7 0x00007f7354390184 in start_thread (arg=0x7f73456fe700)
at pthread_create.c:312
#8 0x00007f7353eb503d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb)
It does NOT appear to be a case of displaymode == DISPLAYMODE_MAINSCREEN but role not containing ROLE_MAIN.
(gdb) print role
$1 = 1534
(gdb) print displaymode
$2 = 1
#define DISPLAYMODE_MAINSCREEN 0
#define ROLE_MAIN (1 << DISPLAYMODE_MAINSCREEN)
#define ROLE_MAIN (1 << DISPLAYMODE_MAINSCREEN)
That is to say, role is 1534 == 0x5FE, which is an even number, the low bit is not set, which means role does not contain ROLE_MAIN, which is 0x01, and displaymode == 1, which is DISPLAYMODE_NAVIGATION.
ROLE_NAVIGATION (1 << DISPLAYMODE_NAVIGATION)
and 0x5FE contains 0x010 so role and displaymode are consistent with one another.
Ended up with 3 defunct snis_client processes.
sirius space-nerds-in-space # ps aux | grep snis_client
scameron 4320 1.5 0.0 0 0 pts/0 Z+ 11:37 0:21 [snis_client] <defunct>
scameron 4538 1.1 0.0 0 0 pts/0 Z+ 11:39 0:14 [snis_client] <defunct>
scameron 4623 1.0 0.0 0 0 pts/0 Z+ 11:40 0:13 [snis_client] <defunct>
root 5191 0.0 0.0 11760 2140 pts/2 S+ 12:00 0:00 grep --colour=auto snis_client
Very mysterious.
Edit: Duh, the defunct clients are due to snis_launcher still running. When I quit that, the defunct snis_clients disappeared. snis_launcher should probably have some mechanism to wait for subprocesses, or else detach them.
Here is a big clue.
When th game is experiencing the "black screen" problem, it is calling advance_game() at 30Hz as it is supposed to (I put a printf in there to verify.)
When you quit from the black screen (by typing ESC then LEFT ARROW then ENTER), only then does it stop calling advance game and then afterwards it initializes GLEW. WTF? In the output below, "Initialized GLEW" is after I selected quit. So something is getting stuck and running out of the intended sequence.
Advance game, timer = 8823
Advance game, timer = 8824
Advance game, timer = 8825
Advance game, timer = 8826
Advance game, timer = 8827
Advance game, timer = 8828
Advance game, timer = 8829
Advance game, timer = 8830
Advance game, timer = 8831
Advance game, timer = 8832
Advance game, timer = 8833
Advance game, timer = 8834
Initialized GLEW
OpenGL 3.0 available
sRGB texture supported
reading 'asteroid.stl'
zzz radius 0 = 35.870609
reading 'asteroid2.stl'
zzz radius 1 = 43.860279
reading 'asteroid3.stl'
zzz radius 2 = 51.153412
reading 'asteroid4.stl'
zzz radius 3 = 50.523045
reading starbase model 1 of 6 'starbase/starbase.obj'
parsing material library: mtllib starbase.mtl
Opening ./share/snis/models/starbase//starbase.mtl
ignoring unknown data '4':o Starbase
ignoring usemtl: usemtl Material.001
ignoring unknown data '2882':s 1
ignoring usemtl: usemtl Material.001_NONE
reading starbase model 2 of 6 'starbase2/starbase2.obj'
parsing material library: mtllib starbase2.mtl
Opening ./share/snis/models/starbase2//starbase2.mtl
ignoring unknown data '4':o Starbase2
ignoring usemtl: usemtl Material.001
ignoring unknown data '890':s 1
reading starbase model 3 of 6 'starbase3.stl'
reading starbase model 4 of 6 'starbase4.stl'
reading starbase model 5 of 6 'starbase5.stl'
reading starbase model 6 of 6 'starbase6.stl'
parsing material library: mtllib disruptor.mtl
Opening ./share/snis/models/disruptor//disruptor.mtl
ignoring unknown data '4':o Disruptor
ignoring usemtl: usemtl Material
ignoring unknown data '1788':s 1
parsing material library: mtllib research-vessel.mtl
Opening ./share/snis/models/research-vessel//research-vessel.mtl
ignoring unknown data '4':o Research-Vessel
ignoring usemtl: usemtl Material
ignoring unknown data '7839':s 1
parsing material library: mtllib conqueror.mtl
Opening ./share/snis/models/conqueror//conqueror.mtl
ignoring unknown data '4':o Conqueror
ignoring usemtl: usemtl Material.001
ignoring unknown data '1335':s 1
parsing material library: mtllib snis3006.mtl
Opening ./share/snis/models/wombat//snis3006.mtl
ignoring unknown data '4':o Wombat.001_Wombat.000
ignoring usemtl: usemtl Paint.001
ignoring unknown data '9359':s 1
parsing material library: mtllib dreadknight.mtl
Opening ./share/snis/models/dreadknight//dreadknight.mtl
ignoring unknown data '4':o Dreadknight
ignoring usemtl: usemtl Material.001
ignoring unknown data '1495':s 1
parsing material library: mtllib enforcer.mtl
Opening ./share/snis/models/enforcer//enforcer.mtl
ignoring unknown data '4':o Enforcer
ignoring usemtl: usemtl Material
ignoring unknown data '18467':s 1
triangle 828, vertices 0 and 1 are welded together 0x712a598.
triangle 1134, vertices 0 and 2 are welded together 0x712a598.
parsing material library: mtllib cargocontainer.mtl
Opening ./share/snis/models/cargocontainer//cargocontainer.mtl
ignoring unknown data '4':o Cargocontainer
ignoring usemtl: usemtl Material.001
ignoring unknown data '456':s 1
snis_client: connect to lobby pressed
lobbyname = 'localhost'
shipname = 'REVENGE'
password = 'dpr'
snis_client: connecting to lobby...
Trying to connect to lobby.
snis_client: connecting to lobby
snis_client: Connected to lobby
lobby connect to server button pressed
Notifying server, opcode update player
Wrote update player opcode
starting gameserver reader thread
snis_client: gameserver reader thread
started gameserver reader thread
starting gameserver writer thread
started gameserver writer thread
update_universe_timestamp i=0, update_time=1529170894.332243, timestamp=1201, time_delta=0.039809 offset=1529170774.192434
SET MY SHIP ID to 681
Saving default shipname as 'REVENGE'
snis_client: Protocol error in gameserver reader, opcode = 0
snis_client: : Last buffer length read from socket 15 was 1
snis_client: : 00
snis_client: last opcode was 127, before that 120
snis_client: total successful opcodes = 117
lobby socket = 14, done with lobby
8798 frames / 301 seconds, 29.2292 frames/sec
server netstats: 0 bytes sent, 0 bytes recd, secs = 1, bw = 0 bytes/sec
Hmm, no longer sure about the above "big clue". I tried again, but this time looked into /proc/pid/fd/2 which pointed to snis_client_log.txt, and got this on quitting:
Advance game, timer = 3864
Advance game, timer = 3865
Advance game, timer = 3866
Advance game, timer = 3867
Advance game, timer = 3868
Advance game, timer = 3869
Advance game, timer = 3870
snis_client: Protocol error in gameserver reader, opcode = 0
snis_client: : Last buffer length read from socket 15 was 1
snis_client: : 00
snis_client: last opcode was 127, before that 120
snis_client: total successful opcodes = 230
lobby socket = 14, done with lobby
3832 frames / 137 seconds, 27.9708 frames/sec
server netstats: 0 bytes sent, 0 bytes recd, secs = 1, bw = 0 bytes/sec
Also, init_gl() must have gone successfully because we got through the lobby screen, and that uses graphics.
Verified I can change the displaymode with F-keys on a "black screen" process by printing it out in advance_game(). Also verified main_da_expose() is getting called at 30Hz.
Advance game, timer = 7421, displaymode = 4, role & 1 << displaymode = 10
main_da_expose
Advance game, timer = 7422, displaymode = 4, role & 1 << displaymode = 10
main_da_expose
Advance game, timer = 7423, displaymode = 4, role & 1 << displaymode = 10
main_da_expose
Advance game, timer = 7424, displaymode = 4, role & 1 << displaymode = 10
main_da_expose
Advance game, timer = 7425, displaymode = 4, role & 1 << displaymode = 10
main_da_expose
Advance game, timer = 7426, displaymode = 4, role & 1 << displaymode = 10
snis_client: Protocol error in gameserver reader, opcode = 0
snis_client: : Last buffer length read from socket 15 was 1
snis_client: : 00
snis_client: last opcode was 127, before that 120
snis_client: total successful opcodes = 234
lobby socket = 13, done with lobby
7411 frames / 257 seconds, 28.8366 frames/sec
server netstats: 0 bytes sent, 0 bytes recd, secs = 1, bw = 0 bytes/sec
While debugging, I added a delay on the CONNECTED screen, which is normally never seen, so that it is seen for 10 frames. When I reproduced the "black screen" problem, I saw the CONNECTED screen, and then very briefly (1 frame or less) I saw the Nav screen, then the black screen. I have noticed that it seems that I have to run quite a few clients on the laptop to reproduce. The first one never has the problem, and it usually takes 4 to 6 clients before the problem appears.
CONNECTED screen debug hack was this:
@@ -7315,7 +7320,7 @@ static void *connect_to_gameserver_thread(__attribute__((unused)) void *arg)
displaymode = DISPLAYMODE_CONNECTED;
done_with_lobby = 1;
- displaymode = role_to_displaymode(role);
+ // displaymode = role_to_displaymode(role);
/* Should probably submit this through the packed buffer queue...
* but, this works.
@@ -7395,9 +7400,17 @@ static void show_connecting_screen(GtkWidget *w)
static void show_connected_screen(GtkWidget *w)
{
+ static int countdown = 10;
sng_set_foreground(UI_COLOR(lobby_connecting));
sng_abs_xy_draw_string("CONNECTED TO SERVER", SMALL_FONT, txx(100), txy(300) + LINEHEIGHT);
sng_abs_xy_draw_string("DOWNLOADING GAME DATA", SMALL_FONT, txx(100), txy(300) + LINEHEIGHT * 3);
+ if (done_with_lobby) {
+ countdown--;
+ if (countdown == 0)
+ displaymode = role_to_displaymode(role);
+ } else {
+ countdown = 10;
+ }
}
Aha! It's this, in main_da_expose() in snis_client.c ...
17567 if (displaymode < DISPLAYMODE_FONTTEST) {
17568 if (!(o = find_my_ship())) {
17569 fprintf(stderr, "find_my_ship() returned NULL!\n");
17570 goto end_of_drawing;
17571
The main drawing function is unable to find the pointer to the player's ship for some reason.
Mitigated slightly by de348a2599e8ba9b2dcacb9e5052993b98d27322
The server is sending the correct ship id and the client is receiving it via the OPCODE_ID_CLIENT_SHIP opcode. However, the server does not send all the objects over because:
snis_server(sirius):: calling add new player
snis_server(sirius):: reading update player packet
snis_server(sirius):: read update player packet, rc = 0
snis_server(sirius):: new client: sn='REVENGE', pw='dpr', create = 0
snis_server(sirius):: c->bridge = 0
snis_server(sirius):: join existing ship
snis_server(sirius):: queue client id 717
server: -----> sending client id 717
snis_server: client refcount = 2
client reader thread exiting, c->socket = -1
snis_server: client refcount = 1
client writer thread exiting, c->socket = -1
snis_server: client refcount = 0
snis_server: client count of bridge 0 = 6
snis_server: calling remove_client 6
snis_server: remove_client 6 returned
The client reader and client writer threads in the server exit for that client because c->socket is -1. Not sure why it's -1 though, considering it does successfully send the ship id to the client.
Ok.... connection reset by peer. But why?
server: -----> sending client id 633
snis_server: client refcount = 2
snis_server(sirius):: read from multiverse, rc = 0, sock = 7
snis_server(sirius):: reading from multiverse sock = 7...
snis_readsocket: Connection reset by peer
client reader thread exiting, c->socket = -1
snis_server: client refcount = 1
client writer thread exiting, c->socket = -1
snis_server: client refcount = 0
snis_server: client count of bridge 0 = 2
snis_server: calling remove_client 2
snis_server: remove_client 2 returned
Confirmed with gdb that the snis_client's gameserver_sock is indeed -1, and the snisc-reader thread is nowhere to be found (snisc-writer is there though.)
But I am not seeing anywhere that the gameserver reader thread exited. Maybe it never got created? Pthread create failed perhaps? No, we should catch that.
7348 printf("starting gameserver reader thread\n");
7349 rc = create_thread(&read_from_gameserver_thread, gameserver_reader, NULL, "snisc-reader", 1);
7350 if (rc) {
7351 fprintf(stderr, "Failed to create gameserver reader thread: %d '%s', '%s'\n",
7352 rc, strerror(rc), strerror(errno));
7353 }
7354 printf("started gameserver reader thread\n");
Ok... seems to be a protocol error (which I should have noticed in comment 6, above)
SET MY SHIP ID to 667
Saving default shipname as 'REVENGE'
SET SOLARSYSTEM TO 'sirius'
Reading solarsystem specifications...done
snis_client: Protocol error in gameserver reader, opcode = 0
snis_client: last opcode was 127, before that 120
snis_client: total successful opcodes = 230
snis_writesocket: Bad file descriptor
snis_text_to_speech.sh crew member has joined.
Re-loading per solarsystem textures
snis_client: Protocol error in gameserver reader, opcode = 0
snis_client: : Last buffer length read from socket 15 was 1
snis_client: : 00
snis_client: last opcode was 127, before that 120
snis_client: total successful opcodes = 230
So there's a stray 0 byte coming from somewhere? But only sometimes? wtf?
I think I may have found a big part of the problem.
Consider the following change. Note that without this change, send_packet_to_all_clients_on_a_bridge() is getting called from within functions that are already per client functions. This means these function produce traffic that unintentionally scales as the square of the number of clients, which is not good at all.
@@ -13136,9 +13136,7 @@ static void pack_and_send_ship_sdata_packet(struct game_client *c, struct snis_e
p.lifeform_count = o->tsd.starbase.lifeform_count;
else
p.lifeform_count = 0;
- pthread_mutex_unlock(&universe_mutex);
send_ship_sdata_packet(c, &p);
- pthread_mutex_lock(&universe_mutex);
}
static void send_update_ship_cargo_info(struct game_client *c, struct snis_entity *o)
@@ -13179,9 +13177,7 @@ static void send_update_ship_cargo_info(struct game_client *c, struct snis_entit
qty = cbc->qty;
packed_buffer_append(pb, "bwS", (uint8_t) i, item, qty, (int32_t) 1000000);
}
- pthread_mutex_unlock(&universe_mutex);
- send_packet_to_all_clients_on_a_bridge(c->shipid, pb, ROLE_SCIENCE);
- pthread_mutex_lock(&universe_mutex);
+ pb_queue_to_client(c, pb);
}
static int save_sdata_bandwidth(void)
@@ -19675,7 +19671,7 @@ static void send_ship_sdata_packet(struct game_client *c, struct ship_sdata_pack
sip->shield_strength, sip->shield_wavelength, sip->shield_width, sip->shield_depth,
sip->faction, sip->lifeform_count,
sip->name, (unsigned short) sizeof(sip->name));
- send_packet_to_all_clients_on_a_bridge(c->shipid, pb, ROLE_ALL);
+ pb_queue_to_client(c, pb);
}
static void send_generic_ship_damage_packet(struct snis_entity *o, uint8_t opcode)
With this patch, I can run 10 clients on my laptop without bumping into the connection reset problem (which I've not previously been able to do.) Need to verify this doesn't break anything, and I don't know if it fixes the connection reset problem or merely avoids it.
Here's a screen shot of 10 clients running just for fun.
Committed 2d595e2a4dae5ecb442e8370ec7f67c32427111d
Was able to run 16 clients plus server on a single laptop and they all behaved responsively and didn't run into the peer reset connection issue.
I do not think I have found the root cause on this, I think I've just made it much less likely to occur. I can't explain the protocol error. So I am going to leave this bug open, but I suspect it will very rarely be seen, except with relatively large numbers of clients.
The symptoms of this bug are now different though. Now, instead of a black screen, you should see: "ERROR: CANNOT FIND POINTER TO PLAYER SHIP, SORRY!" on the client. The client will have terminated the connection due to having received an invalid packet from the server (a single zero byte, in cases I've seen.) I don't know where this malformed packet is coming from within the server.
Re-opened. Didn't mean to close it.
It occurs to me that without change 2d595e2a4dae5ecb442e8370ec7f67c32427111d there was this very questionable mutex manipulation.
pthread_mutex_unlock(&universe_mutex);
send_packet_to_all_clients_on_a_bridge(c->shipid, pb, ROLE_SCIENCE);
pthread_mutex_lock(&universe_mutex);
unlocking, then locking the universe_mutex here is a mistake. I think I must have gotten confused and thought that because send_packet_to_all_clients_on_a_bridge() takes the client lock, that I needed to release the universe lock to avoid a deadlock. However, in cases where the code takes both locks, the universe_mutex is taken first, then the client lock. So I shouldn't have needed to release the universe lock there, and doing so may have allowed a race condition which may be the source of that stray zero. I don't have a concrete theory about how that specific outcome could occur, but that unlock definitely looks incorrect.
Haven't seen this issue occur in almost a year now. I'm going to consider it fixed.
When running with a real network with around 5 clients on a wired LAN, occasionally after successfully connecting to the lobby and then attempting to connect to snis_server, the client would just show a black screen. The keys would still operate, (e.g. you could switch between screens with the function keys, and quit with escape, arrow left and enter) but you just couldn't see anything. I am not sure if anything requiring a network connection would work (e.g. switching to nav and yawing the ship, for example. I should try that next time I encounter the problem.) Restarting the client (quitting, then restarting via snis_launcher script) would sometimes fix the problem, although sometimes it would take several tries. I saw this on my laptop, and others saw it on theirs as well.
Don't really have any good explanation for it.