Closed wonderingabout closed 5 years ago
wow its actually reproductible, happened again in this game
https://online-go.com/game/17708647
had to manually restart to fix
log of game 2 crash
May 03 20:59:36 game.js:490 [Game 17708647] Connecting to game.
May 03 20:59:37 game.js:490 [Game 17708647] gamedata B RoyalZero [19x19]
May 03 20:59:37 game.js:490 [Game 17708647] makeMove 0 is 0 !== 0 ?
May 03 20:59:37 bot.js:214 [25704] Starting /home/amd2019/PhoenixGo/bazel-bin/mcts/mcts_main --gtp --config_path=/home/amd2019/PhoenixGo/etc/mcts_1gpu.conf --v=0
May 03 20:59:37 game.js:490 [Game 17708647] Starting new bot process [25704]
May 03 20:59:37 game.js:490 [Game 17708647] State loading for new bot
May 03 20:59:37 bot.js:214 [25704] >>> boardsize 19
May 03 20:59:37 bot.js:214 [25704] >>> clear_board
May 03 20:59:37 bot.js:214 [25704] >>> komi 7.5
May 03 20:59:37 bot.js:214 [25704] Generating move for game 17708647
May 03 20:59:37 game.js:490 [Game 17708647] genmove black
May 03 20:59:37 bot.js:214 [25704] >>> genmove black
May 03 20:59:37 game.js:490 [Game 17708647] clock: {"game_id":17708647,"current_player":592558,"black_player_id":592558,"white_player_id":484523,"title":"10000 vs 3200 test","last_move":1556909976759,"expiration":1556910276759,"black_time":{"thinking_time":300,"skip_bonus":false},"white_time":{"thinking_time":300,"skip_bonus":false},"start_mode":true,"now":1556909976959}
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:37.075422: I model/zero_model.cc:72] Read checkpoint state succ
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:37.093378: I model/zero_model.cc:80] Read meta graph succ
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:37.212965: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:37.213570: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7335
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: pciBusID: 0000:1c:00.0
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: totalMemory: 5.93GiB freeMemory: 3.68GiB
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:37.213585: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:37.306520: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:37.306557: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:37.306563: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:37.306770: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3035 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:1c:00.0, compute capability: 6.1)
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:37.307806: I model/zero_model.cc:100] Create session succ
May 03 20:59:37 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:37.356809: I model/zero_model.cc:107] Create graph succ
May 03 20:59:37 game.js:490 [Game 17708647] gamedata B RoyalZero [19x19]
May 03 20:59:37 game.js:490 [Game 17708647] makeMove 0 is 0 !== 0 ?
May 03 20:59:37 bot.js:214 [25704] Generating move for game 17708647
May 03 20:59:37 game.js:490 [Game 17708647] genmove black
May 03 20:59:37 bot.js:214 [25704] >>> genmove black
May 03 20:59:37 game.js:490 [Game 17708647] clock: {"game_id":17708647,"current_player":592558,"black_player_id":592558,"white_player_id":484523,"title":"10000 vs 3200 test","last_move":1556909976759,"expiration":1556910276759,"black_time":{"thinking_time":300,"skip_bonus":false},"white_time":{"thinking_time":300,"skip_bonus":false},"start_mode":true,"now":1556909977613}
May 03 20:59:38 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:38.145466: I model/zero_model.cc:119] Load checkpoint succ
May 03 20:59:38 !!!!! bot.js:222 [25704] stderr: 2019-05-03 20:59:38.459308: I external/org_tensorflow/tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
May 03 20:59:39 bot.js:214 [25704] <<< =
May 03 20:59:39 bot.js:214 [25704] <<< =
May 03 20:59:39 bot.js:214 [25704] <<< =
May 03 21:00:21 !!!!! bot.js:222 [25704] stderr: 1th move(b): pd, winrate=44.063831%, N=924, Q=-0.118723, p=0.080226, v=-0.113951, cost 42105.917969ms, sims=10032, height=12, avg_height=6.066105, global_step=639200
May 03 21:00:21 bot.js:214 [25704] <<< = Q4
May 03 21:00:21 game.js:490 [Game 17708647] Playing q4 { x: 15, y: 15, text: 'q4', resign: false, pass: false }
May 03 21:00:21 connection.js:1391 # active_game: {"id":17708647,"phase":"play","name":"10000 vs 3200 test","player_to_move":484523,"width":19,"height":19,"move_number":1,"paused":0,"private":false,"black":{"username":"meta-金毛测试-20b","id":592558,"rank":19,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":95.45099520611241,"rating":2760.424660297423,"volatility":0.060090145513783985,"games_played":23}}},"white":{"username":"RoyalZero","id":484523,"rank":25,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":85.7766431653554,"rating":2818.0212415578553,"volatility":0.06260142409950183,"games_played":6201}}},"time_per_move":63}
May 03 21:00:21 game.js:490 [Game 17708647] clock: {"game_id":17708647,"current_player":484523,"black_player_id":592558,"white_player_id":484523,"title":"10000 vs 3200 test","last_move":1556910021794,"expiration":1556910321794,"black_time":{"thinking_time":300,"skip_bonus":false},"white_time":{"thinking_time":300,"skip_bonus":false},"now":1556910021795}
May 03 21:00:21 game.js:490 [Game 17708647] game/17708647/move: { game_id: 17708647, move_number: 1, move: [ 15, 15, 45794 ] }
May 03 21:00:21 game.js:490 [Game 17708647] Move 1 Ignoring our own move 1
May 03 21:00:26 connection.js:1391 # active_game: {"id":17708647,"phase":"play","name":"10000 vs 3200 test","player_to_move":592558,"width":19,"height":19,"move_number":2,"paused":0,"private":false,"black":{"username":"meta-金毛测试-20b","id":592558,"rank":19,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":95.45099520611241,"rating":2760.424660297423,"volatility":0.060090145513783985,"games_played":23}}},"white":{"username":"RoyalZero","id":484523,"rank":25,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":85.7766431653554,"rating":2818.0212415578553,"volatility":0.06260142409950183,"games_played":6201}}},"time_per_move":63}
May 03 21:00:26 connection.js:1391 # Connected to game 17708647 already
May 03 21:00:26 game.js:490 [Game 17708647] Move 1 clock: {"game_id":17708647,"current_player":592558,"black_player_id":592558,"white_player_id":484523,"title":"10000 vs 3200 test","last_move":1556910026765,"expiration":1556910326765,"black_time":{"thinking_time":300,"skip_bonus":false},"white_time":{"thinking_time":300,"skip_bonus":false},"now":1556910026766}
May 03 21:00:26 game.js:490 [Game 17708647] Move 1 game/17708647/move: { game_id: 17708647, move_number: 2, move: [ 3, 3, 4970 ] }
May 03 21:00:26 game.js:490 [Game 17708647] Move 2 Got d16
May 03 21:00:26 bot.js:214 [25704] Calling sendMove with d16
May 03 21:00:26 bot.js:214 [25704] >>> play white d16
May 03 21:00:26 game.js:490 [Game 17708647] Move 2 makeMove 2 is 2 !== 2 ?
May 03 21:00:26 bot.js:214 [25704] Generating move for game 17708647
May 03 21:00:26 game.js:490 [Game 17708647] Move 2 genmove black
May 03 21:00:26 bot.js:214 [25704] >>> genmove black
May 03 21:01:02 !!!!! bot.js:222 [25704] stderr: 3th move(b): dp, winrate=83.961624%, N=1261, Q=0.679232, p=0.093162, v=0.627675, cost 40225.292969ms, sims=10032, height=11, avg_height=5.029628, global_step=639200
May 03 21:01:02 bot.js:214 [25704] <<< = D16
May 03 21:01:02 game.js:490 [Game 17708647] Move 2 Playing d16 { x: 3, y: 3, text: 'd16', resign: false, pass: false }
May 03 21:01:02 !!!!! bot.js:222 [25704] stderr: F0503 21:01:02.230954 25704 mcts_engine.cc:154] Check failed: ret == 0 (-1 vs. 0) Move: failed, dp, ret-1
May 03 21:01:02 !!!!! bot.js:222 [25704] stderr: *** Check failure stack trace: ***
May 03 21:01:02 !!!!! bot.js:222 [25704] stderr: *** Aborted at 1556910062 (unix time) try "date -d @1556910062" if you are using GNU date ***
May 03 21:01:02 !!!!! bot.js:222 [25704] stderr: PC: @ 0x0 (unknown)
May 03 21:01:02 bot.js:214 [25704] Bot exited
^C
amd2019@amd2019-MS-7B86:~/PhoenixGo/etc$ sudo nvidia-smi -pm 1 && sudo nvidia-smi -pl 75 && nvidia-smi -i 0 -q -d POWER,CLOCK && cd ~/PhoenixGo/etc && sudo nano mcts_1gpu.conf && sudo node /usr/lib/node_modules/gtp2ogs/gtp2ogs.js --username meta-金毛测试-20b --debug --maxconnectedgamesperuser 1 --maxconnectedgames 1 --boardsizes 19 --noclock --persist --nopause --fakerank 7d --maxhandicap 0 --komis 7.5,automatic --minrank 4k --unrankedonly --timecontrols fischer,byoyomi --speeds live --minmaintimelive 60 --maxmaintimelive 600 --minperiodtimelive 30 --maxperiodtimelive 180 --maxperiodslive 5 --greeting "hi, now using enable background search (pondering) for faster play, and the new tensorflow 1.13.1 batch size 16 for faster play, this bot uses the original phoenixgo engine provided by tencent https://github.com/Tencent/PhoenixGo 20 blocks version , not leela zero, have fun :)" --rejectnew --rejectnewmsg "doing some testing for today, see you next time :) " --farewell "thank you for the game, hope you enjoyed and/or learned :) there is victory in defeat :) if you want my bot to play a custom opening position against any bot, send it to me here (forked games) https://online-go.com/review/363869" -- /home/amd2019/PhoenixGo/bazel-bin/mcts/mcts_main --gtp --config_path=/home/amd2019/PhoenixGo/etc/mcts_1gpu.conf --v=0
Persistence mode is already Enabled for GPU 00000000:1C:00.0.
All done.
Power limit for GPU 00000000:1C:00.0 was set to 75.00 W from 75.00 W.
All done.
==============NVSMI LOG==============
Timestamp : Fri May 3 21:03:40 2019
Driver Version : 410.104
CUDA Version : 10.0
Attached GPUs : 1
GPU 00000000:1C:00.0
Power Readings
Power Management : Supported
Power Draw : 28.54 W
Power Limit : 75.00 W
Default Power Limit : 120.00 W
Enforced Power Limit : 75.00 W
Min Power Limit : 60.00 W
Max Power Limit : 140.00 W
Power Samples
Duration : 2.37 sec
Number of Samples : 119
Max : 28.93 W
Min : 28.25 W
Avg : 28.45 W
Clocks
Graphics : 1518 MHz
SM : 1518 MHz
Memory : 4006 MHz
Video : 1366 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 1923 MHz
SM : 1923 MHz
Memory : 4004 MHz
Video : 1708 MHz
Max Customer Boost Clocks
Graphics : N/A
SM Clock Samples
Duration : 127.73 sec
Number of Samples : 100
Max : 1835 MHz
Min : 139 MHz
Avg : 1264 MHz
Memory Clock Samples
Duration : 127.73 sec
Number of Samples : 100
Max : 4006 MHz
Min : 405 MHz
Avg : 3045 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
You are using gtp2ogs version 6.0
- For changelog or latest devel updates, please visit https://github.com/online-go/gtp2ogs/tree/devel
May 03 21:03:42 connection.js:1391 # Connecting to https://online-go.com:443
May 03 21:03:42 connection.js:1391 # Connected
May 03 21:03:42 connection.js:1391 # Bot is username: meta-金毛测试-20b
May 03 21:03:42 connection.js:1391 # Bot is user id: 592558
May 03 21:03:43 connection.js:1391 # active_game: {"id":17708647,"phase":"play","name":"10000 vs 3200 test","player_to_move":592558,"width":19,"height":19,"move_number":2,"paused":0,"private":false,"black":{"username":"meta-金毛测试-20b","id":592558,"rank":19,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":95.45099520611241,"rating":2760.424660297423,"volatility":0.060090145513783985,"games_played":23}}},"white":{"username":"RoyalZero","id":484523,"rank":25,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":85.7766431653554,"rating":2818.0212415578553,"volatility":0.06260142409950183,"games_played":6201}}},"time_per_move":63}
May 03 21:03:43 game.js:490 [Game 17708647] Connecting to game.
May 03 21:03:43 game.js:490 [Game 17708647] Move 2 gamedata B RoyalZero [19x19]
May 03 21:03:43 game.js:490 [Game 17708647] Move 2 makeMove 2 is 2 !== 2 ?
May 03 21:03:43 bot.js:214 [27738] Starting /home/amd2019/PhoenixGo/bazel-bin/mcts/mcts_main --gtp --config_path=/home/amd2019/PhoenixGo/etc/mcts_1gpu.conf --v=0
May 03 21:03:43 game.js:490 [Game 17708647] Move 2 Starting new bot process [27738]
May 03 21:03:43 game.js:490 [Game 17708647] Move 2 State loading for new bot
May 03 21:03:43 bot.js:214 [27738] >>> boardsize 19
May 03 21:03:43 bot.js:214 [27738] >>> clear_board
May 03 21:03:43 bot.js:214 [27738] >>> komi 7.5
May 03 21:03:43 bot.js:214 [27738] >>> play black q4
May 03 21:03:43 bot.js:214 [27738] >>> play white d16
May 03 21:03:43 bot.js:214 [27738] Generating move for game 17708647
May 03 21:03:43 game.js:490 [Game 17708647] Move 2 genmove black
May 03 21:03:43 bot.js:214 [27738] >>> genmove black
May 03 21:03:43 game.js:490 [Game 17708647] Move 2 clock: {"game_id":17708647,"current_player":592558,"black_player_id":592558,"white_player_id":484523,"title":"10000 vs 3200 test","last_move":1556910026765,"expiration":1556910326765,"black_time":{"thinking_time":300,"skip_bonus":false},"white_time":{"thinking_time":300,"skip_bonus":false},"now":1556910223155}
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:43.277033: I model/zero_model.cc:72] Read checkpoint state succ
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:43.297692: I model/zero_model.cc:80] Read meta graph succ
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:43.413360: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:43.413878: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7335
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: pciBusID: 0000:1c:00.0
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: totalMemory: 5.93GiB freeMemory: 3.67GiB
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:43.413893: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:43.515049: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:43.515081: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:43.515087: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:43.515286: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3035 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:1c:00.0, compute capability: 6.1)
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:43.516445: I model/zero_model.cc:100] Create session succ
May 03 21:03:43 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:43.575703: I model/zero_model.cc:107] Create graph succ
May 03 21:03:44 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:44.374243: I model/zero_model.cc:119] Load checkpoint succ
May 03 21:03:44 !!!!! bot.js:222 [27738] stderr: 2019-05-03 21:03:44.677459: I external/org_tensorflow/tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
May 03 21:03:45 bot.js:214 [27738] <<< =
May 03 21:03:45 bot.js:214 [27738] <<< =
May 03 21:03:45 bot.js:214 [27738] <<< =
May 03 21:03:45 !!!!! bot.js:222 [27738] stderr: 1th move(b): pd, winrate=43.994141%, N=1, Q=-0.120117, p=0.077095, v=-0.120119, cost 145.624573ms, sims=32, height=2, avg_height=2.000000, global_step=639200
May 03 21:03:45 bot.js:214 [27738] <<< =
May 03 21:03:46 !!!!! bot.js:222 [27738] stderr: 2th move(w): dp, winrate=55.165863%, N=1, Q=0.103317, p=0.228309, v=0.103328, cost 152.800751ms, sims=29, height=2, avg_height=2.000000, global_step=639200
May 03 21:03:46 bot.js:214 [27738] <<< =
May 03 21:04:26 !!!!! bot.js:222 [27738] stderr: 3th move(b): cd, winrate=44.162975%, N=1939, Q=-0.116740, p=0.162921, v=-0.112885, cost 40846.269531ms, sims=10032, height=31, avg_height=5.446719, global_step=639200
May 03 21:04:26 bot.js:214 [27738] <<< = C4
May 03 21:04:26 game.js:490 [Game 17708647] Move 2 Playing c4 { x: 2, y: 15, text: 'c4', resign: false, pass: false }
May 03 21:04:27 connection.js:1391 # active_game: {"id":17708647,"phase":"play","name":"10000 vs 3200 test","player_to_move":484523,"width":19,"height":19,"move_number":3,"paused":0,"private":false,"black":{"username":"meta-金毛测试-20b","id":592558,"rank":19,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":95.45099520611241,"rating":2760.424660297423,"volatility":0.060090145513783985,"games_played":23}}},"white":{"username":"RoyalZero","id":484523,"rank":25,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":85.7766431653554,"rating":2818.0212415578553,"volatility":0.06260142409950183,"games_played":6201}}},"time_per_move":63}
May 03 21:04:27 game.js:490 [Game 17708647] Move 2 clock: {"game_id":17708647,"current_player":484523,"black_player_id":592558,"white_player_id":484523,"title":"10000 vs 3200 test","last_move":1556910267062,"expiration":1556910567062,"black_time":{"thinking_time":119.703,"skip_bonus":false},"white_time":{"thinking_time":300,"skip_bonus":false},"now":1556910267063}
May 03 21:04:27 game.js:490 [Game 17708647] Move 2 game/17708647/move: { game_id: 17708647, move_number: 3, move: [ 2, 15, 240297 ] }
May 03 21:04:27 game.js:490 [Game 17708647] Move 3 Ignoring our own move 3
May 03 21:04:30 game.js:490 [Game 17708647] Move 3 clock: {"game_id":17708647,"current_player":592558,"black_player_id":592558,"white_player_id":484523,"title":"10000 vs 3200 test","last_move":1556910270097,"expiration":1556910389800,"black_time":{"thinking_time":119.703,"skip_bonus":false},"white_time":{"thinking_time":300,"skip_bonus":false},"now":1556910270097}
May 03 21:04:30 game.js:490 [Game 17708647] Move 3 game/17708647/move: { game_id: 17708647, move_number: 4, move: [ 15, 3, 3035 ] }
May 03 21:04:30 game.js:490 [Game 17708647] Move 4 Got q16
May 03 21:04:30 bot.js:214 [27738] Calling sendMove with q16
May 03 21:04:30 bot.js:214 [27738] >>> play white q16
May 03 21:04:30 game.js:490 [Game 17708647] Move 4 makeMove 4 is 4 !== 4 ?
May 03 21:04:30 bot.js:214 [27738] Generating move for game 17708647
May 03 21:04:30 game.js:490 [Game 17708647] Move 4 genmove black
May 03 21:04:30 bot.js:214 [27738] >>> genmove black
May 03 21:04:30 connection.js:1391 # active_game: {"id":17708647,"phase":"play","name":"10000 vs 3200 test","player_to_move":592558,"width":19,"height":19,"move_number":4,"paused":0,"private":false,"black":{"username":"meta-金毛测试-20b","id":592558,"rank":19,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":95.45099520611241,"rating":2760.424660297423,"volatility":0.060090145513783985,"games_played":23}}},"white":{"username":"RoyalZero","id":484523,"rank":25,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":85.7766431653554,"rating":2818.0212415578553,"volatility":0.06260142409950183,"games_played":6201}}},"time_per_move":63}
May 03 21:04:30 connection.js:1391 # Connected to game 17708647 already
May 03 21:04:30 !!!!! bot.js:222 [27738] stderr: 4th move(w): pp, winrate=56.105717%, N=2127, Q=0.122114, p=0.679589, v=0.115394, cost 3286.168213ms, sims=816, height=31, avg_height=10.517672, global_step=639200
May 03 21:04:30 bot.js:214 [27738] <<< =
Those are PhoenixGo internal errors that I can see. Is there an issue with gtp2ogs you also saw?
this is what i'm wondering as well
if its only a PhoenixGo specific issue, there's no reason to post it here, but since i dont know if its not gtp2ogs somehow triggering this, i'm posting it as a starter :+1:
the issue does not happen anymore now though :thinking:
The problem shown in this log https://github.com/Tencent/PhoenixGo/issues/93 is that the 'play' move was ignored and genmove happened first and tried to move where the other player already moved. I don't know if this is PhoenixGo internal issue (it did crash after moving) or if somehow gtp2ogs can be in a state where genmove is requested first? I'd need to double check code but I didn't think it'd be possible to issue these commands out of order.
Ok so when we receive a move from the server this is the logic we process, when not in handicap placement time. This code is non-async. https://github.com/online-go/gtp2ogs/blob/2a567321a6f28f64f5f7cf296d9e628a0a8be35d/game.js#L207-L222
sendMove() sends the move to the engine. It's a very simple function, non-async. https://github.com/online-go/gtp2ogs/blob/2a567321a6f28f64f5f7cf296d9e628a0a8be35d/bot.js#L485-L488
They use the command() function which I also think is non-async unless I misunderstand. https://github.com/online-go/gtp2ogs/blob/2a567321a6f28f64f5f7cf296d9e628a0a8be35d/bot.js#L388-L426
Which means we can't be calling makeMove() before sendMove() and thus if sendMove() is in the log first, it had to have been sent to the engine first?
just had the issue yesterday again as JBXLeelaBot made me notice :
https://online-go.com/game/17733237
too bad i didnt save command line log
i'll be running with --v=1
from now on to debug this
since it is now the 3rd time, it will definitely happen again :+1:
since i didnt notice this anymore, closing it for now
got an error again (not sure if it's the same) :
game in : https://online-go.com/game/18783873
Jul 21 15:39:45 game.js:490 [Game 18783873] Connecting to game.
Jul 21 15:39:45 game.js:490 [Game 18783873] gamedata B 00bSlowLeela [19x19]
Jul 21 15:39:45 game.js:490 [Game 18783873] makeMove 0 is 0 !== 0 ?
Jul 21 15:39:45 bot.js:214 [4378] Starting /home/amd2019/PhoenixGo/bazel-bin/mcts/mcts_main --gtp --config_path=/home/amd2019/PhoenixGo/etc/mcts_1gpu.conf
Jul 21 15:39:45 game.js:490 [Game 18783873] Starting new bot process [4378]
Jul 21 15:39:45 game.js:490 [Game 18783873] State loading for new bot
Jul 21 15:39:45 bot.js:214 [4378] >>> boardsize 19
Jul 21 15:39:45 bot.js:214 [4378] >>> clear_board
Jul 21 15:39:45 bot.js:214 [4378] >>> komi 7.5
Jul 21 15:39:45 bot.js:214 [4378] Generating move for game 18783873
Jul 21 15:39:45 game.js:490 [Game 18783873] genmove black
Jul 21 15:39:45 bot.js:214 [4378] >>> genmove black
Jul 21 15:39:45 game.js:490 [Game 18783873] clock: {"game_id":18783873,"current_player":592558,"black_player_id":592558,"white_player_id":640435,"title":"lz elf-v1 -b 1 -t 2 -v 6400","last_move":1563716385067,"expiration":1563718185067,"black_time":{"thinking_time":1800,"skip_bonus":false},"white_time":{"thinking_time":1800,"skip_bonus":false},"start_mode":true,"now":1563716385177}
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:45.289472: I model/zero_model.cc:72] Read checkpoint state succ
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:45.308510: I model/zero_model.cc:80] Read meta graph succ
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:45.432449: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:45.433002: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7335
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: pciBusID: 0000:1c:00.0
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: totalMemory: 5.93GiB freeMemory: 3.09GiB
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:45.433015: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:45.529587: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:45.529619: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:45.529624: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:45.529809: I external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3035 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:1c:00.0, compute capability: 6.1)
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:45.531047: I model/zero_model.cc:100] Create session succ
Jul 21 15:39:45 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:45.582119: I model/zero_model.cc:107] Create graph succ
Jul 21 15:39:46 game.js:490 [Game 18783873] gamedata B 00bSlowLeela [19x19]
Jul 21 15:39:46 game.js:490 [Game 18783873] makeMove 0 is 0 !== 0 ?
Jul 21 15:39:46 bot.js:214 [4378] Generating move for game 18783873
Jul 21 15:39:46 game.js:490 [Game 18783873] genmove black
Jul 21 15:39:46 bot.js:214 [4378] >>> genmove black
Jul 21 15:39:46 game.js:490 [Game 18783873] clock: {"game_id":18783873,"current_player":592558,"black_player_id":592558,"white_player_id":640435,"title":"lz elf-v1 -b 1 -t 2 -v 6400","last_move":1563716385067,"expiration":1563718185067,"black_time":{"thinking_time":1800,"skip_bonus":false},"white_time":{"thinking_time":1800,"skip_bonus":false},"start_mode":true,"now":1563716385994}
Jul 21 15:39:46 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:46.425765: I model/zero_model.cc:119] Load checkpoint succ
Jul 21 15:39:46 !!!!! bot.js:222 [4378] stderr: 2019-07-21 15:39:46.756999: I external/org_tensorflow/tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Jul 21 15:39:47 bot.js:214 [4378] <<< =
Jul 21 15:39:47 bot.js:214 [4378] <<< =
Jul 21 15:39:47 bot.js:214 [4378] <<< =
Jul 21 15:40:41 !!!!! bot.js:222 [4378] stderr: 1th move(b): pp, winrate=44.085880%, N=582, Q=-0.118282, p=0.080226, v=-0.116534, cost 53929.507812ms, sims=6408, height=10, avg_height=5.671315, global_step=639200
Jul 21 15:40:41 bot.js:214 [4378] <<< = Q16
Jul 21 15:40:41 game.js:490 [Game 18783873] Playing q16 { x: 15, y: 3, text: 'q16', resign: false, pass: false }
Jul 21 15:40:41 connection.js:1391 # active_game: {"id":18783873,"phase":"play","name":"lz elf-v1 -b 1 -t 2 -v 6400","player_to_move":640435,"width":19,"height":19,"move_number":1,"paused":0,"private":false,"black":{"username":"meta-金毛测试-20b","id":592558,"rank":36.80948311930709,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":95.45099520611241,"rating":2760.424660297423,"games_played":23,"volatility":0.060090145513783985}}},"white":{"username":"00bSlowLeela","id":640435,"rank":34.950459271425665,"professional":false,"accepted":false,"ratings":{"overall":{"rating":2600.9994296562463,"deviation":145.1053194921326,"volatility":0.060348615827214624,"games_played":11}}},"time_per_move":80}
Jul 21 15:40:41 game.js:490 [Game 18783873] game/18783873/move: { game_id: 18783873, move_number: 1, move: [ 15, 3, 56804 ] }
Jul 21 15:40:41 game.js:490 [Game 18783873] Move 1 Ignoring our own move 1
Jul 21 15:40:41 game.js:490 [Game 18783873] Move 1 clock: {"game_id":18783873,"current_player":640435,"black_player_id":592558,"white_player_id":640435,"title":"lz elf-v1 -b 1 -t 2 -v 6400","last_move":1563716441804,"expiration":1563718241804,"black_time":{"thinking_time":1800,"skip_bonus":false},"white_time":{"thinking_time":1800,"skip_bonus":false},"now":1563716441805}
Jul 21 15:40:57 connection.js:1391 # active_game: {"id":18783873,"phase":"play","name":"lz elf-v1 -b 1 -t 2 -v 6400","player_to_move":592558,"width":19,"height":19,"move_number":2,"paused":0,"private":false,"black":{"username":"meta-金毛测试-20b","id":592558,"rank":36.80948311930709,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":95.45099520611241,"rating":2760.424660297423,"games_played":23,"volatility":0.060090145513783985}}},"white":{"username":"00bSlowLeela","id":640435,"rank":34.950459271425665,"professional":false,"accepted":false,"ratings":{"overall":{"rating":2600.9994296562463,"deviation":145.1053194921326,"volatility":0.060348615827214624,"games_played":11}}},"time_per_move":80}
Jul 21 15:40:57 connection.js:1391 # Connected to game 18783873 already
Jul 21 15:40:57 game.js:490 [Game 18783873] Move 1 game/18783873/move: { game_id: 18783873, move_number: 2, move: [ 3, 15, 15253 ] }
Jul 21 15:40:57 game.js:490 [Game 18783873] Move 2 Got d4
Jul 21 15:40:57 bot.js:214 [4378] Calling sendMove with d4
Jul 21 15:40:57 bot.js:214 [4378] >>> play white d4
Jul 21 15:40:57 game.js:490 [Game 18783873] Move 2 makeMove 2 is 2 !== 2 ?
Jul 21 15:40:57 bot.js:214 [4378] Generating move for game 18783873
Jul 21 15:40:57 game.js:490 [Game 18783873] Move 2 genmove black
Jul 21 15:40:57 bot.js:214 [4378] >>> genmove black
Jul 21 15:40:57 game.js:490 [Game 18783873] Move 2 clock: {"game_id":18783873,"current_player":592558,"black_player_id":592558,"white_player_id":640435,"title":"lz elf-v1 -b 1 -t 2 -v 6400","last_move":1563716457057,"expiration":1563718257057,"black_time":{"thinking_time":1800,"skip_bonus":false},"white_time":{"thinking_time":1800,"skip_bonus":false},"now":1563716457060}
Jul 21 15:41:33 !!!!! bot.js:222 [4378] stderr: 3th move(b): dd, winrate=83.996918%, N=837, Q=0.679938, p=0.093162, v=0.627675, cost 52072.574219ms, sims=6408, height=9, avg_height=4.853092, global_step=639200
Jul 21 15:41:33 bot.js:214 [4378] <<< = D4
Jul 21 15:41:33 game.js:490 [Game 18783873] Move 2 Playing d4 { x: 3, y: 15, text: 'd4', resign: false, pass: false }
Jul 21 15:41:33 !!!!! bot.js:222 [4378] stderr: F0721 15:41:33.943889 4378 mcts_engine.cc:154] Check failed: ret == 0 (-1 vs. 0) Move: failed, dd, ret-1
Jul 21 15:41:33 !!!!! bot.js:222 [4378] stderr: *** Check failure stack trace: ***
Jul 21 15:41:33 !!!!! bot.js:222 [4378] stderr: *** Aborted at 1563716493 (unix time) try "date -d @1563716493" if you are using GNU date ***
Jul 21 15:41:33 !!!!! bot.js:222 [4378] stderr: PC: @ 0x0 (unknown)
Jul 21 15:41:34 bot.js:214 [4378] Bot exited
Jul 21 15:47:15 connection.js:1391 # Dumping status of all connected games
Jul 21 15:47:15 connection.js:1391 # game_id=18783873: black=meta-金毛测试-20b white=00bSlowLeela bot_turn idle_time=378.871s bot.proc.pid=4378 bot.dead=true bot.failed=false
Jul 21 15:47:15 connection.js:1391 # Dump complete
Jul 21 16:00:30 connection.js:1391 # Not accepting new games (rejectnew).
Jul 21 16:00:30 connection.js:1391 # Rejecting challenge from hhht (9d) [19x19] id = 18784119
Jul 21 16:00:30 connection.js:1401 POST online-go.com 443 /api/v1/me/challenges/5311122 { delete: true,
message:
'Currently, this bot is not accepting games, try again later ',
apikey: ',
bot_id: 592558,
player_id: 592558,
jwt:
'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJwbGF5ZXJfaWQiOjU5MjU1OCwiYm90Ijp0cnVlLCJpYXQiOjE1NjM2OTA3MzZ9.YwQQOYSYG-e5roNvtWKmuPOzDby2jcJf1JbBFCYIwdM' }
Jul 21 16:02:15 connection.js:1391 # Dumping status of all connected games
Jul 21 16:02:15 connection.js:1391 # game_id=18783873: black=meta-金毛测试-20b white=00bSlowLeela bot_turn idle_time=1278.872s bot.proc.pid=4378 bot.dead=true bot.failed=false
Jul 21 16:02:15 connection.js:1391 # Dump complete
Jul 21 16:10:57 connection.js:1391 # active_game: {"id":18783873,"phase":"finished","name":"lz elf-v1 -b 1 -t 2 -v 6400","player_to_move":592558,"width":19,"height":19,"move_number":2,"paused":0,"private":false,"black":{"username":"meta-金毛测试-20b","id":592558,"rank":36.80948311930709,"professional":false,"accepted":false,"ratings":{"overall":{"deviation":95.45099520611241,"rating":2760.424660297423,"games_played":23,"volatility":0.060090145513783985}}},"white":{"username":"00bSlowLeela","id":640435,"rank":34.950459271425665,"professional":false,"accepted":false,"ratings":{"overall":{"rating":2600.9994296562463,"deviation":145.1053194921326,"volatility":0.060348615827214624,"games_played":11}}},"time_per_move":80}
Jul 21 16:10:57 connection.js:1391 # 18783873 active_game phase === finished
Jul 21 16:10:57 connection.js:174 Starting disconnect Timeout in Connection active_game for 18783873
Jul 21 16:10:57 game.js:490 [Game 18783873] Move 2 gamedata B 00bSlowLeela [19x19]
Jul 21 16:10:57 game.js:490 [Game 18783873] Move 2 Game over. Result: W+Time L
Jul 21 16:10:57 bot.js:214 [4378] Stopping bot
Jul 21 16:10:57 game.js:490 [Game 18783873] Move 2 clock: {"game_id":18783873,"current_player":592558,"black_player_id":592558,"white_player_id":640435,"title":"lz elf-v1 -b 1 -t 2 -v 6400","last_move":1563716457057,"expiration":1563718257057,"black_time":{"thinking_time":1800,"skip_bonus":false},"white_time":{"thinking_time":1800,"skip_bonus":false},"now":1563718257188}
Jul 21 16:10:58 connection.js:1391 # disconnectFromGame 18783873
Jul 21 16:10:58 game.js:490 [Game 18783873] Move 2 Disconnecting from game.
Jul 21 16:11:02 bot.js:214 [4378] Killing process directly with a signal
Jul 21 16:17:15 connection.js:1391 # Dumping status of all connected games
Jul 21 16:17:15 connection.js:1391 # Dump complete
It looks like a the gamedata packet arrived a second time. And Phoenix took two moves in a row (genmove black was called twice). The 2nd genmove was before white had moved, white sent to D4, then Phoenix also wanted to go to D4. The crash maybe happened because 'play white d4' is an illegal move since it thinks black already put a second stone down on D4.
I guess the question is if getting two gamedata packets in a row would cause a second incorrect genmove to trigger? It might. We used to restart the bot any time we got an unexpected genmove which would have prevented this. But that led to restart loops because of other prior changes, and it was never fully debugged so instead we stopped restarting on every gamedata packet.
Perhaps we need to record if a bot is currently thinking about a more or not, and if so, be sure we don't ask for another makeMove().
i understand the general idea :+1: so it would be a gtp2ogs configure issue then
i'll leave the rest to capable people lol thanks :1st_place_medal:
Ok I found it.
https://github.com/online-go/gtp2ogs/blob/devel/game.js#L108-L114
Change the } else {
to } else if (!this.processing) {
.
That should fix it.
no idea why it happened, but i am doing other things, however here is the log and game link
https://online-go.com/game/17708541
using this branch : https://github.com/wonderingabout/gtp2ogs/tree/windo-crashloop-import