lucasart / c-chess-cli

Command Line Interface for UCI Chess engines written in C
GNU General Public License v3.0
71 stars 9 forks source link

Crash on unresponsive engine #33

Open sderr opened 3 years ago

sderr commented 3 years ago

Hi, I got a an abort in free():

(gdb) bt
#0  0x00007f6fa421e2c7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f6fa421f9b8 in __GI_abort () at abort.c:90
#2  0x00007f6fa4260e17 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f6fa43727f8 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f6fa4269609 in malloc_printerr (ar_ptr=0x7f6fa45ae760 <main_arena>, ptr=<optimized out>, str=0x7f6fa436feb1 "free(): invalid pointer", action=3)
    at malloc.c:5004
#4  _int_free (av=0x7f6fa45ae760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3843
#5  0x00007f6fa426482a in __GI__IO_free_backup_area (fp=fp@entry=0xf24d40) at genops.c:202
#6  0x00007f6fa4263f95 in _IO_new_file_overflow (f=0xf24d40, ch=-1) at fileops.c:828
#7  0x00007f6fa4262900 in _IO_new_file_xsputn (f=0xf24d40, data=<optimized out>, n=17) at fileops.c:1326
#8  0x00007f6fa4234f73 in _IO_vfprintf_internal (s=<optimized out>, format=<optimized out>, ap=ap@entry=0x7f6fa31e4728) at vfprintf.c:1635
#9  0x00007f6fa423b347 in __fprintf (stream=<optimized out>, format=format@entry=0x40d4ac "%s -> %s\n") at fprintf.c:32
#10 0x0000000000405eeb in engine_readln (w=w@entry=0xf25f00, e=e@entry=0x7f6fa31e4e70, line=line@entry=0x7f6fa31e4890) at src/engine.c:281
#11 0x0000000000406c3e in uci_engine_bestmove (info=0x7f6fa31e49d0, pv=0x7f6fa31e4bc0, best=0x7f6fa31e4c00, timeLeft=0x7f6fa31e4be0, e=0x7f6fa31e4e70, w=0xf25f00)
    at src/engine.c:322
#12 engine_bestmove (w=w@entry=0xf25f00, e=e@entry=0x7f6fa31e4e70, timeLeft=timeLeft@entry=0x7f6fa31e4be0, best=best@entry=0x7f6fa31e4c00, pv=pv@entry=0x7f6fa31e4bc0,
    info=info@entry=0x7f6fa31e49d0) at src/engine.c:420
#13 0x0000000000407e42 in game_play (w=w@entry=0xf25f00, g=g@entry=0x7f6fa31e4db0, o=o@entry=0x6e4d60 <options>, engines=engines@entry=0x7f6fa31e4e70,
    eo=eo@entry=0x7f6fa31e4d60, reverse=<optimized out>) at src/game.c:318
#14 0x000000000040a10d in thread_start (arg=0xf25f00) at src/main.c:119
#15 0x00007f6fa48bedd5 in start_thread (arg=0x7f6fa31e5700) at pthread_create.c:307
#16 0x00007f6fa42e602d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

With in the main thread:

[Switching to thread 6 (Thread 0x7f6fa4cdc740 (LWP 15145))]
#0  0x00007f6fa42d77a0 in __close_nocancel () at ../sysdeps/unix/syscall-template.S:81
81      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) bt
#0  0x00007f6fa42d77a0 in __close_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f6fa426311c in _IO_new_file_close_it (fp=fp@entry=0xf24f80) at fileops.c:179
#2  0x00007f6fa4256138 in _IO_new_fclose (fp=0xf24f80) at iofclose.c:58
#3  0x000000000040c648 in worker_destroy (w=0xf25f68) at src/workers.c:91
#4  0x000000000040a404 in main_destroy () at src/main.c:37
#5  0x00007f6fa4221c29 in __run_exit_handlers (status=status@entry=1, listp=0x7f6fa45ae6c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#6  0x00007f6fa4221c77 in __GI_exit (status=status@entry=1) at exit.c:99
#7  0x000000000040a931 in main (argc=<optimized out>, argv=<optimized out>) at src/main.c:189

Looks like there's a race when worker_destroy() is called while the worker thread is still running. w->log was set to NULL while it was still being used. This happens on DIE() as we don't pthread_join() the workers.

I think the proper way to fix this would be to implement (using poll/select or whatever) a version of engine_readln() that checks the deadline of the current thread, instead of checking the deadlines in the main thread.

lucasart commented 3 years ago

Thank you for the detailled bug report. I'm a bit reluctant to using poll/select in the worker threads though, as this would create further complexity. But I don't have an alternative solution at this point. Will think about it. Now really a high priority bug to fix, because DIE() is fatal anwyay, and the OS cleans up our mess anyway ;-)

sderr commented 3 years ago

Yes, I understand, indeed if it was a simple fix I would have sent a pull request rather than an issue.

Regarding the priority of this bug however, I think it would be nice to be able to treat that engine failure as a single defeat rather than aborting the whole tournament.

lucasart commented 3 years ago

I looked into poll, but it's impractical because:

Perhaps this is really the way to do it. But for a simple project like c-chess-cli, it seems overkill.