Visit count resets to zero when switching windows

isty2e commented 1 year ago

While analyzing, if LizGoban is sent to background and brought back to foreground, the visit count resets to zero and it starts to reconstruct the analysis from the cache. Did something change regarding thread management? This behavior was not present in previous versions.

I have no idea what else information I can provide to address the issue, so tell me if you have something in mind.

kaorahi commented 1 year ago

Thanks for the report. Unfortunately, I can neither reproduce the issue nor recall any intentional change related to it at present...

I'd like to confirm the meaning of "background" first. Would you let me know your OS and the specific operations on the keyboard/mouse for "sent to background"?

Regardless, I doubt that katago process is killed for some reason when you send lizgoban to background. When katago is down, lizgoban automatically restarts katago and uses the cached analysis until the visits of the new katago exceeds the cached one. You may be able to observe the change of the process ID of katago, I guess.

If you changed nothing except for the version of lizgoban, the new version may consume more resources and that tends to make the katago process prone to being forcibly stopped? Or, something is changed in a recent version of Electron?

It may be helpful if you tell me which version (commit) is ok and which is bad. "Bisect" is ideal as I keep each commit small in lizgoban, but I'm afraid it may be too much of a burden for you...

isty2e commented 1 year ago

"Sending it to background" - Sorry for the confusion. To be precise, this happens when I click another application window (like that of a web browser) and "activate" the Electron app window by clicking any of LizGoban UI or component. The operating system is MacOS Ventura 13.4.
No, I don't think the katago process is being killed or terminated here: The PID remains the same, and this happens instantaneously, that is, which must be a much shorter period of time than LizGoban can possibly load KataGo again.
I don't think it is somehow resource related neither; It happens even after all katago processes are terminated and LizGoban is restarted.

I will try to troubleshoot further and at which commit the problem begins to appear. Maybe it's just that my system is screwed...

isty2e commented 1 year ago

First of all, what I recognized while testing is that it does not actually reset everything; The visit count does not start from zero (like 0/10,000), but instead from a certain fraction of the number (like 1,500/10,000).

And I think it is this commit: https://github.com/kaorahi/lizgoban/commit/41fb586f5a4f8c2611f4a6d4d5a5c21883792402 Before that, LizGoban does not clear the board attempt to restore the analysis from the cache, in general.

The issue is present for all types of games, not limited to handicap games (which in fact I did not even analyze)
I can try modifying L234 to const updated_p = update_board_by_undo(history, beg) and test when time permits

kaorahi commented 1 year ago

I feel we are approaching to the key point. That commit is one of major internal changes related to set_board in update_all, that can be related to the switch of the active window. The present issue seems explainable if lizgoban sends clear_board (or undo) and play ... to katago to rebuild the current board position needlessly. Could you find such unnecessary GTP commands by npm start | grep 'engine>' and [Debug] > [Debug log] in the menu?

In engine.js, I expect:

beg_valid_p is truthy for non-handicap games.
update_board_by_undo does nothing for unchanged board positions.

Either may be wrong. :(

isty2e commented 1 year ago

I just tested const updated_p = update_board_by_undo(history, beg) and it works as I intended.

Here is the log:

❯ npm start -- -c config.json | grep 'engine>'
(24.42) [79007] engine> 148 clear_board []
(24.55) [79007] engine> 149 name [*lizgoban_kata-raw-nn PDA=2]
(24.55) [79007] engine> 150 name []
(24.55) [79007] engine> 151 kata-set-param playoutDoublingAdvantage 2 []
(24.55) [79007] engine> 152 kata-raw-nn 0 []
(24.57) [79007] engine> 153 kata-set-param playoutDoublingAdvantage 0 []
(24.57) [79007] engine> 154 name []
(24.57) [79007] engine> 155 kata-set-param playoutDoublingAdvantage -2 []
(24.57) [79007] engine> 156 kata-raw-nn 0 []
(24.58) [79007] engine> 157 kata-set-param playoutDoublingAdvantage 0 []
(24.58) [79007] engine> 158 kata-analyze b interval 10 ownership true ownershipStdev true minmoves 30 pvVisits true pvEdgeVisits true movesOwnership true []
(28.17) [79007] engine> 159 clear_board []
(28.31) [79007] engine> 160 name [*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_set {"move_count":0,"bturn":true},clear_board,*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_kata-raw-nn PDA=2]
(28.31) [79007] engine> 161 name [*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_set {"move_count":0,"bturn":true},clear_board,*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_kata-raw-nn PDA=2]
(28.31) [79007] engine> 162 name [*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_set {"move_count":0,"bturn":true},clear_board,*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_kata-raw-nn PDA=2]
(28.31) [79007] engine> 163 name [*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_set {"move_count":0,"bturn":true},clear_board,*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_kata-raw-nn PDA=2]
(28.31) [79007] engine> 164 name [*lizgoban_set {"move_count":0,"bturn":true},clear_board,*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_kata-raw-nn PDA=2]
(28.31) [79007] engine> 165 name [clear_board,*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_kata-raw-nn PDA=2]
(28.31) [79007] engine> 166 clear_board [*lizgoban_set {"move_count":0,"bturn":true},*lizgoban_kata-raw-nn PDA=2]
(28.31) [79007] engine> 167 name [*lizgoban_kata-raw-nn PDA=2]
(28.31) [79007] engine> 168 name []
(28.31) [79007] engine> 169 kata-set-param playoutDoublingAdvantage 2 []
(28.31) [79007] engine> 170 kata-raw-nn 0 []
(28.33) [79007] engine> 171 kata-set-param playoutDoublingAdvantage 0 []
(28.33) [79007] engine> 172 name []
(28.33) [79007] engine> 173 kata-set-param playoutDoublingAdvantage -2 []
(28.33) [79007] engine> 174 kata-raw-nn 0 []
(28.34) [79007] engine> 175 kata-set-param playoutDoublingAdvantage 0 []
(28.34) [79007] engine> 176 kata-analyze b interval 10 ownership true ownershipStdev true minmoves 30 pvVisits true pvEdgeVisits true movesOwnership true []
**(34.87) [79007] engine> 177 clear_board []**
(34.93) [79007] engine> 178 name [*lizgoban_kata-raw-nn PDA=2]
(34.93) [79007] engine> 179 name []
(34.93) [79007] engine> 180 kata-set-param playoutDoublingAdvantage 2 []
(34.93) [79007] engine> 181 kata-raw-nn 0 []
(34.95) [79007] engine> 182 kata-set-param playoutDoublingAdvantage 0 []
(34.95) [79007] engine> 183 name []
(34.95) [79007] engine> 184 kata-set-param playoutDoublingAdvantage -2 []
(34.95) [79007] engine> 185 kata-raw-nn 0 []
(34.96) [79007] engine> 186 kata-set-param playoutDoublingAdvantage 0 []
(34.96) [79007] engine> 187 kata-analyze b interval 10 ownership true ownershipStdev true minmoves 30 pvVisits true pvEdgeVisits true movesOwnership true []
(35.59) [79007] engine> 188 name []
(35.64) [79007] engine> 189 kata-set-param playoutDoublingAdvantage 2 [name]
(35.64) [79007] engine> 190 kata-raw-nn 0 [name]
(35.66) [79007] engine> 191 name []

Apparently, at 34.87 (which is when I re-activated the window) clear_board is executed. One dubious thing is that it tries to set PDA to nonzero values; Perhaps something has gone wrong with aux.handicaps, it misjudges that this is a handicapped game, and then aux.handicaps != handicaps so beg_valid_p is set to false?

kaorahi commented 1 year ago

No. kata-set-param playoutDoublingAdvantage is intentionally called regardless of handicap stones from 83d6b3ab6 in 2022 (v0.7.0-pre3).

cf. help.html

Small blue up/down triangles on suggestions indicate the most preferred moves by "AIs for handicap games" as the stronger/weaker side in the first impression. "Moves like the stronger side" are also shown by sharper triangles in the winrate bar. (They are disabled on slow machines.)

Is the board empty in your previous comment?
Does the issue occur for non-empty boards? If this is yes, can you confirm which condition makes beg_valid_p falsy, aux.handicaps === handicaps, aux.init_len === init_len, or beg.length >= init_len?
Does the following patch solve the issue for the empty board?

--- a/src/engine.js
+++ b/src/engine.js
@@ -231,7 +231,7 @@ function create_leelaz () {
         update_kata(gorule, aux.gorule, 'kata-set-rules', z => {gorule = z})
         ownership_p = update_kata(ownership_p, aux.ownership_p)
         aggressive = update_kata(aggressive, kata_pda_supported() ? aux.aggressive : '')
-        if (empty(history)) {clear_leelaz_board(); update_move_count([], true); return}
+        if (empty(history)) {!empty(leelaz_previous_history) && clear_leelaz_board(); update_move_count([], true); return}
         const beg = common_header_length(history, leelaz_previous_history)
         const beg_valid_p = aux.handicaps === handicaps && aux.init_len === init_len &&
               beg.length >= init_len

isty2e commented 1 year ago

I don't quite remember, but this time I'm testing with a non-empty board.

Tested with following code inserted:

debug_log(`handicap identity: ${aux.handicaps === handicaps}`)
debug_log(`init_len identity: ${aux.init_len === init_len}`)
debug_log(`beg.length comparison: ${beg.length >= init_len}`)

log:

(38.06) API update_menu []
(38.06) update_all start (keep_board = false)
(38.06) handicap identity: true
(38.06) init_len identity: true
(38.06) beg.length comparison: false
(38.06) [89195] queue> clear_board []
(38.06) [89195] engine> 1109 clear_board []

Yes, but not for a non-empty board.

kaorahi commented 1 year ago

Ah, beg = common_header_length(...), so beg.length is "length of length"...

How about this?

--- a/src/engine.js
+++ b/src/engine.js
@@ -231,10 +231,10 @@ function create_leelaz () {
         update_kata(gorule, aux.gorule, 'kata-set-rules', z => {gorule = z})
         ownership_p = update_kata(ownership_p, aux.ownership_p)
         aggressive = update_kata(aggressive, kata_pda_supported() ? aux.aggressive : '')
-        if (empty(history)) {clear_leelaz_board(); update_move_count([], true); return}
+        if (empty(history)) {!empty(leelaz_previous_history) && clear_leelaz_board(); update_move_count([], true); return}
         const beg = common_header_length(history, leelaz_previous_history)
         const beg_valid_p = aux.handicaps === handicaps && aux.init_len === init_len &&
-              beg.length >= init_len
+              beg >= init_len
         handicaps = aux.handicaps; init_len = aux.init_len
         const updated_p = beg_valid_p ? update_board_by_undo(history, beg) :
               update_board_by_clear(history, handicaps, init_len)

isty2e commented 1 year ago

Problem solved. The issue is gone.

kaorahi commented 1 year ago

Thanks for your valuable help!

kaorahi / lizgoban

Visit count resets to zero when switching windows #88