ioquake / ioq3

The ioquake3 community effort to continue supporting/developing id's Quake III Arena
https://ioquake3.org/
GNU General Public License v2.0

Infinite while loop occurs in CON_FlushIn #353

Closed tdm4 closed 6 years ago

tdm4 commented 6 years ago

https://github.com/ioquake/ioq3/blob/d28e667e469c68223f6969130e23c4ea1f09cc5e/code/sys/con_tty.c#L87

When I run ioq3ded with rcctl, the game stops responding after some unspecified amount of time and CPU usage shoots up to about 90%. I suspect this happens because STDIN isn't actually in use when the server runs as a daemon. If I run ioq3ded in a tmux window, I never get this issue.

Here's the stack trace:

#0  _thread_sys_read () at -:3
#1  0x00001604acc17c54 in _libc_read_cancel (fd=Variable "fd" is not available.
) at /usr/src/lib/libc/sys/w_read.c:27
#2  0x0000160288b9573f in CON_Input () at code/sys/con_tty.c:87
#3  0x0000160288b21268 in Com_GetSystemEvent () at code/qcommon/common.c:2024
#4  0x0000160288b21431 in Com_GetRealEvent () at code/qcommon/common.c:2073
#5  0x0000160288b21758 in Com_EventLoop () at code/qcommon/common.c:2151
#6  0x0000160288b231c1 in Com_Frame () at code/qcommon/common.c:3168
#7  0x0000160288b8a795 in main (argc=Variable "argc" is not available.
) at code/sys/sys_main.c:759

I've done a lot of testing, but I can't reproduce it reliably. I just have to leave the server running, and once in a while the read on STDIN_FILENO stops returning -1 and the while loop becomes infinite.
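To illustrate how a drain loop can spin, here is a minimal sketch. It assumes the loop has the shape `while (read(fd, &key, 1) != -1);`, which is an assumption about the code, not a copy of con_tty.c. When the descriptor is at end-of-file, read() returns 0 rather than -1, so such a loop never exits. The helper names `drain_bounded` and `demo_eof_spin` are hypothetical, and the iteration cap exists only so the demo terminates.

```c
#include <unistd.h>

/* Hypothetical helper: drain fd the way a `while (read(...) != -1);`
 * loop would, but give up after max_iters so the demo terminates.
 * Returns how many read() calls did NOT return -1. */
static int drain_bounded(int fd, int max_iters)
{
    char key;
    int iters = 0;

    while (iters < max_iters && read(fd, &key, 1) != -1)
        iters++;

    return iters;
}

/* Hypothetical demo: a pipe whose write end is closed behaves like an
 * input that has hit end-of-file. Every read() returns 0, never -1, so
 * the drain loop only stops because of the cap.
 * Returns the iteration count, or -1 if the pipe cannot be created. */
static int demo_eof_spin(int max_iters)
{
    int fds[2];
    int iters;

    if (pipe(fds) != 0)
        return -1;

    close(fds[1]);                 /* no writers left: reads report EOF */
    iters = drain_bounded(fds[0], max_iters);
    close(fds[0]);

    return iters;
}
```

If `demo_eof_spin(1000)` comes back with the full cap, every read() returned something other than -1, which is the condition under which an uncapped loop would run forever.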

Perhaps we need to replace those two lines with:

tcflush(STDIN_FILENO, TCIFLUSH);

tdm4 commented 6 years ago

Just a little more background: in OpenBSD, if you start a daemon with rcctl, it uses your current tty/terminal as STDIN (FD 0). However, once you quit that terminal or log out, running fstat -u _ioq3 shows STDIN as 'bad'. From then on, whenever CON_FlushIn is called, the function can get stuck in an infinite loop if the read on STDIN stops returning -1, and it seems it never finishes.

Using tcflush() is a better solution, as it was designed to flush pending input properly (and it's a POSIX function).
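As a rough sketch of that idea, the flush can be a single tcflush() call with no loop around it. The wrapper name `flush_input_once` is hypothetical and this is not necessarily what the project merged. The point is that the call returns once whether or not the descriptor is still a usable terminal, so nothing can spin.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical wrapper: discard any input that has been received but
 * not yet read on fd. tcflush() is called exactly once.
 * Returns 0 on success, or -1 with errno set (for example when fd is
 * not a terminal or is no longer a valid descriptor). */
static int flush_input_once(int fd)
{
    return tcflush(fd, TCIFLUSH);
}
```

A caller that only wants best-effort flushing can ignore the return value, e.g. `(void)flush_input_once(STDIN_FILENO);`. Even on a descriptor that is not a terminal, the call fails immediately instead of looping.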

holgersson32644 commented 6 years ago

Hi, I have a strange bug with similar symptoms (i.e. full hangs with one core at 100% after a random time from start) on Gentoo Linux, which only shows up when I enable Opus support. I’m not using this engine directly, but a modified fork for UrbanTerror. Your patch seems to fix this, so thank you!

zturtleman commented 6 years ago

Fixed by https://github.com/ioquake/ioq3/pull/356.