mourabitiziyad / Monza-Chess


Segmentation fault due to race condition? #1

Open tissatussa opened 1 year ago

tissatussa commented 1 year ago

Hi, I just stumbled upon your simple UCI engine. I'm on Linux (64-bit Xubuntu 22.04) and I managed to compile your two C++ source files, or so it seems, but I get a "Segmentation fault" when running the binary (I edited out the board diagrams to keep the logs short):

[..intro UCI commands..]

go wtime 70000 btime 70000 winc 10000 binc 10000
time: 2283 start: 3436752225 stop: 3436764508 depth: 64 timeset: 1
info score cp 52 depth 1 nodes 24 time 0 pv e2e4 
info score cp -17 depth 2 nodes 156 time 0 pv a2a4 e7e5 
info score cp 72 depth 3 nodes 537 time 1 pv d2d4 d7d5 e2e4 
info score cp -31 depth 4 nodes 2535 time 4 pv e2e4 e7e5 c2c4 d7d5 
info score cp 78 depth 5 nodes 5157 time 6 pv e2e4 e7e5 c2c4 f8b4 a2a4 
info score cp 3 depth 6 nodes 27170 time 24 pv e2e4 e7e5 d2d4 e5d4 d1d4 d7d5 
info score cp 22 depth 7 nodes 98889 time 85 pv c2c4 d7d5 c4d5 d8d5 a2a4 e7e5 d2d4 
info score cp 0 depth 8 nodes 477277 time 433 pv c2c4 c7c5 d2d4 c5d4 d1d4 d7d5 c4d5 d8d5 
Segmentation fault (core dumped)

However, no errors occurred during compilation, only a few warnings.


While investigating the problem (I'm not into C++, just using common sense and logic), something strange happened: when I run the binary under a debugger (I used GDB and LLDB), it runs fine, without errors! This was a surprise; one would expect it to be the other way around.

I found this passage on GeeksforGeeks (https://www.geeksforgeeks.org/segmentation-fault-c-cpp/):

Overall, the cause of the segmentation fault is accessing the memory that does not belong to you in that space. As long as we avoid doing that, we can avoid the segmentation fault. If you cannot find the source of the error even after doing it, it is recommended to use a debugger as it directly leads to the point of error in the program.

Well, in my case this isn't so.

Then I found a (rather old) forum entry dealing with the same issue:

segfault only when NOT using debugger https://stackoverflow.com/questions/4628521/segfault-only-when-not-using-debugger

The comment at the bottom of that page might be useful:

By debugging it you are changing the environment that it is running in. It sounds like you are dealing with some sort of race condition, and by debugging it things are scheduled slightly differently so you don't encounter the issue. That, or things are being stored in a slightly different way so it doesn't occur. Are you able to put some debugging output in the code to assist in figuring out the problem? That may have less of an impact and allow you to find your issue.

This seems exactly what's happening in my case.


Here's my terminal output from two compiles with slightly different commands, both run in GDB. The second one (monza-simple-compile2) was built with -O3, which gives an optimized binary, and that one also fails when run under GDB, apparently because of the optimization:

$ g++-12 *.cpp -Wall -o monza-simple-compile
main.cpp: In function ‘Bitboard find_magic(int, int, int)’:
main.cpp:343:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘Bitboard’ {aka ‘long long unsigned int’} [-Wsign-compare]
  343 |     for (int idx = 0; idx < occupancy_idx; idx++){
      |                       ~~~~^~~~~~~~~~~~~~~
main.cpp:359:49: warning: comparison of integer expressions of different signedness: ‘int’ and ‘Bitboard’ {aka ‘long long unsigned int’} [-Wsign-compare]
  359 |         for(index = 0, fail = 0; !fail && index < occupancy_idx; index++) {
      |                                           ~~~~~~^~~~~~~~~~~~~~~
main.cpp: In function ‘int make_move(int, int)’:
main.cpp:963:7: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
  963 |     } else
      |       ^~~~
main.cpp:966:9: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘else’
  966 |         return 0;
      |         ^~~~~~
$ gdb ./monza-simple-compile 
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./monza-simple-compile...
(No debugging symbols found in ./monza-simple-compile)
(gdb) run
Starting program: /home/roelof/Compiled/Monza-Chess/Monza/monza-simple-compile 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
id name Monza
id author Ziyad Mourabiti
uciok
ucinewgame
position startpos
isready
readyok
go wtime 70000 btime 70000 winc 10000 binc 10000
time: 2283 start: -853334083 stop: -853321800 depth: 64 timeset: 1
info score cp 52 depth 1 nodes 24 time 0 pv e2e4 
info score cp -17 depth 2 nodes 156 time 2 pv a2a4 e7e5 
info score cp 72 depth 3 nodes 537 time 4 pv d2d4 d7d5 e2e4 
info score cp -31 depth 4 nodes 2535 time 9 pv e2e4 e7e5 c2c4 d7d5 
info score cp 78 depth 5 nodes 5157 time 15 pv e2e4 e7e5 c2c4 f8b4 a2a4 
info score cp 3 depth 6 nodes 27170 time 68 pv e2e4 e7e5 d2d4 e5d4 d1d4 d7d5 
info score cp 22 depth 7 nodes 98889 time 249 pv c2c4 d7d5 c4d5 d8d5 a2a4 e7e5 d2d4 
info score cp 0 depth 8 nodes 477270 time 1353 pv c2c4 c7c5 d2d4 c5d4 d1d4 d7d5 c4d5 d8d5 
info score cp 46 depth 9 nodes 1169948 time 3687 pv c2c4 f7f5 d2d4 c7c5 d4c5 e7e5 d1a4 f8c5 a4a7 
info score cp 0 depth 10 nodes 3444737 time 12291 pv d2d4 d7d5 b1c3 g8f6 e2e3 a7a5 e3e4 d5e4 f1b5 c7c6 c3e4 
bestmove d2d4
quit
[Inferior 1 (process 1370433) exited normally]
(gdb) exit
$ g++-12 *.cpp -Wall -O3 -o monza-simple-compile2
main.cpp: In function ‘Bitboard find_magic(int, int, int)’:
main.cpp:343:27: warning: comparison of integer expressions of different signedness: ‘int’ and ‘Bitboard’ {aka ‘long long unsigned int’} [-Wsign-compare]
  343 |     for (int idx = 0; idx < occupancy_idx; idx++){
      |                       ~~~~^~~~~~~~~~~~~~~
main.cpp:359:49: warning: comparison of integer expressions of different signedness: ‘int’ and ‘Bitboard’ {aka ‘long long unsigned int’} [-Wsign-compare]
  359 |         for(index = 0, fail = 0; !fail && index < occupancy_idx; index++) {
      |                                           ~~~~~~^~~~~~~~~~~~~~~
main.cpp: In function ‘int make_move(int, int)’:
main.cpp:963:7: warning: this ‘else’ clause does not guard... [-Wmisleading-indentation]
  963 |     } else
      |       ^~~~
main.cpp:966:9: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the ‘else’
  966 |         return 0;
      |         ^~~~~~
$ gdb ./monza-simple-compile2
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./monza-simple-compile2...
(No debugging symbols found in ./monza-simple-compile2)
(gdb) run
Starting program: /home/roelof/Compiled/Monza-Chess/Monza/monza-simple-compile2 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
id name Monza
id author Ziyad Mourabiti
uciok
ucinewgame
position startpos
isready
readyok
go wtime 70000 btime 70000 winc 10000 binc 10000
time: 2283 start: -853188891 stop: -853176608 depth: 64 timeset: 1
info score cp 52 depth 1 nodes 24 time 1 pv e2e4 
info score cp -17 depth 2 nodes 156 time 1 pv a2a4 e7e5 
info score cp 72 depth 3 nodes 537 time 2 pv d2d4 d7d5 e2e4 
info score cp -31 depth 4 nodes 2535 time 9 pv e2e4 e7e5 c2c4 d7d5 
info score cp 78 depth 5 nodes 5157 time 20 pv e2e4 e7e5 c2c4 f8b4 a2a4 
info score cp 3 depth 6 nodes 27170 time 58 pv e2e4 e7e5 d2d4 e5d4 d1d4 d7d5 
info score cp 22 depth 7 nodes 98889 time 160 pv c2c4 d7d5 c4d5 d8d5 a2a4 e7e5 d2d4 
info score cp 0 depth 8 nodes 477277 time 537 pv c2c4 c7c5 d2d4 c5d4 d1d4 d7d5 c4d5 d8d5 

Program received signal SIGSEGV, Segmentation fault.
0x0000555555557441 in make_move(int, int) [clone .part.0] ()
(gdb) exit
A debugging session is active.

  Inferior 1 [process 1370675] will be killed.

Quit anyway? (y or n) y

Both GDB and LLDB point to something going wrong inside the function make_move(int, int), but I have no clue what. Do you?
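The -Wmisleading-indentation warning for main.cpp lines 963 to 966 points into make_move, the same function GDB reports the crash in. The source is not quoted in this thread, so the following is only a hypothetical sketch of the pattern GCC flags, with made-up names: a return 0; indented as if it belonged to an unbraced else, although it actually runs on every path that reaches it. Whether Monza's code has exactly this shape, or whether it has anything to do with the segfault, is not established.

```cpp
#include <cassert>

// Hypothetical helper standing in for "take the move back"; not Monza's code.
static int undo_calls = 0;
void undo_move() { ++undo_calls; }

// The pattern GCC's -Wmisleading-indentation flags: only undo_move() is
// guarded by the else, so "return 0" runs for legal moves as well.
int make_move_misleading(bool legal) {
    if (legal) {
        // ... the move stays on the board ...
    } else
        undo_move();
        return 0;  // NOT inside the else branch
    return 1;      // never reached
}

// Braced version expressing the presumed intent: 1 = legal, 0 = rejected.
int make_move_braced(bool legal) {
    if (legal) {
        return 1;
    } else {
        undo_move();
        return 0;
    }
}
```

Here make_move_misleading(true) returns 0 even though the move was legal, while make_move_braced(true) returns 1. Bracing the else (or re-indenting, if the current behaviour is what was intended) removes the ambiguity either way.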

mourabitiziyad commented 1 year ago

Hey! Sorry for not being able to investigate the issue right now (exams 😅).

Did you try running the engine without the transposition table? Or at least try tweaking its memory settings.
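One way to make "tweak its memory settings" concrete is to size the table from a budget in megabytes rather than a raw entry count, and to allocate it so that a failed allocation is reported instead of crashing later. This is a sketch under assumptions: the TTEntry layout and the names entries_for_mb and allocate_tt are hypothetical, since Monza's table code isn't shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical 16-byte entry (on common 64-bit ABIs); Monza's real layout may differ.
struct TTEntry {
    std::uint64_t key;       // Zobrist hash of the position
    std::int32_t best_move;  // encoded move
    std::int16_t score;      // search score
    std::uint8_t depth;      // remaining depth when stored
    std::uint8_t flag;       // exact / lower bound / upper bound
};

// Number of entries that fit in a budget of `mb` mebibytes.
std::size_t entries_for_mb(std::size_t mb) {
    return mb * 1024 * 1024 / sizeof(TTEntry);
}

// Allocate the table on the heap. resize() value-initialises (zeroes) every
// entry; if the allocation throws std::bad_alloc, an empty table is returned.
std::vector<TTEntry> allocate_tt(std::size_t mb) {
    std::vector<TTEntry> table;
    try {
        table.resize(entries_for_mb(mb));
    } catch (const std::bad_alloc&) {
        table.clear();  // caller can check table.empty() and fall back
    }
    return table;
}
```

With a 16-byte entry, a 16 MB budget gives 1,048,576 entries. On Linux with memory overcommit, an oversized request may not throw std::bad_alloc at all; the process can instead be killed when the pages are first touched, so a conservative default size is still worth having.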

Another thing you could try is playing with the compiler optimization flags.

I can only look at this properly in a week or two, but I didn't want to leave you without a response.

mourabitiziyad commented 1 year ago

The reason behind my transposition table assumption is that the code does not break right away but only after a few iterations, meaning that it is either trying to store moves beyond the memory allowance, or it's not able to access that amount of memory.
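If the table really is written beyond its allocation, one common way that happens in engines (not confirmed for Monza) is deriving the slot from the hash key with signed or truncated arithmetic, which can yield a negative or oversized index. The sketch below uses hypothetical names; it keeps the index in range by doing the modulo on unsigned 64-bit values and checks the bound with assert in builds without NDEBUG.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal entry; Monza's real layout is not shown in this thread.
struct Entry {
    std::uint64_t key;
    int score;
};

// Map a 64-bit key to a slot. Unsigned 64-bit modulo guarantees
// 0 <= index < table_size for any key (table_size must be > 0).
std::size_t tt_index(std::uint64_t key, std::size_t table_size) {
    return static_cast<std::size_t>(key % table_size);
}

void tt_store(std::vector<Entry>& table, std::uint64_t key, int score) {
    std::size_t i = tt_index(key, table.size());
    assert(i < table.size());  // bounds check, active unless NDEBUG
    table[i] = Entry{key, score};
}

// Contrast: truncating the key to int first can give a negative index.
// The narrowing keeps the low 32 bits (GCC behaviour, guaranteed since C++20).
int signed_index_bug(std::uint64_t key, int table_size) {
    return static_cast<int>(key) % table_size;  // may be < 0
}
```

For example, with 1000 slots the key 0xFFFFFFFF maps to -1 via signed_index_bug, an out-of-bounds slot, while tt_index always stays inside the table.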

tissatussa commented 1 year ago

OK, I will try to solve it. Good luck with the exams!

tissatussa commented 1 year ago

Thanks, that way I managed to compile a working binary! I just set

# define t_size 25600000 // about 16*32 = 512 mb

and there's no more error, with or without a debugger. Even "go infinite" runs fine for half an hour, so now I can let Monza play matches in CuteChess.
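How much memory t_size actually reserves depends on the size of one table entry, which this thread doesn't show, so the comment's "about 16*32 = 512 mb" can't be checked directly. The sketch below just does the multiplication for two assumed entry sizes: with 16-byte entries, 25,600,000 entries take 409,600,000 bytes (about 390 MiB), while 512,000,000 bytes would correspond to 20-byte entries. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Value from the fix above; the entry sizes tried below are assumptions.
constexpr std::uint64_t t_size = 25600000;

// Total bytes reserved by `entries` entries of `entry_bytes` bytes each.
constexpr std::uint64_t table_bytes(std::uint64_t entries, std::uint64_t entry_bytes) {
    return entries * entry_bytes;
}

// The same figure in MiB (2^20 bytes), rounded down.
constexpr std::uint64_t table_mib(std::uint64_t entries, std::uint64_t entry_bytes) {
    return table_bytes(entries, entry_bytes) / (1024 * 1024);
}

static_assert(table_bytes(t_size, 16) == 409600000, "16-byte entries");
static_assert(table_bytes(t_size, 20) == 512000000, "20-byte entries");
```

The original value of t_size isn't quoted in the thread, so it isn't clear how much the reduction saved; multiplying it by sizeof of the real entry type would give the actual figure.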