sanderland / katrain

Improve your Baduk skills by training with KataGo!

Running KataGo engine remotely through SSH #131

Closed siulkilulki closed 3 years ago

siulkilulki commented 4 years ago

Is it possible to run KataGo remotely and interface with katrain? (Similar to how it is done in Lizzie.)

sanderland commented 4 years ago

Not without significant hacking around and using little connector scripts like this does

dinohsu1019 commented 4 years ago

I think it would be better if the GTP client application supported remote engines directly. I have had success with Lizzie, but unfortunately Lizzie doesn't seem to have play-against-AI or AI vs. AI modes yet. I am trying Sabaki, but it doesn't seem to support remote engines directly either.

sanderland commented 4 years ago

katrain uses the json analysis engine of katago. For my bots I put this on a socket so all the bots can run on 1 katago instance.

https://github.com/sanderland/katrain-bots/blob/master/engine_connector.py
https://github.com/sanderland/katrain-bots/blob/master/engine_server.py

However this is not so easy to set up for most people, so I'm hesitant to go in this direction.
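As a rough illustration of the "engine on a socket" idea (not the linked katrain-bots scripts, which were not consulted for this sketch), the following minimal Python relay accepts a single TCP client and pipes its lines to a child process and back. The function name `relay_one_client` and the stand-in `ECHO_COMMAND` are assumptions made for illustration; a real setup would pass the KataGo analysis command instead and would need to handle several clients.

```python
import socket
import subprocess
import sys
import threading

# Stand-in "engine" so the sketch can be tried without KataGo:
# it upper-cases every line it reads. Purely illustrative.
ECHO_COMMAND = [
    sys.executable, "-u", "-c",
    "import sys\nfor l in sys.stdin: sys.stdout.write(l.upper()); sys.stdout.flush()",
]


def relay_one_client(server: socket.socket, command: list) -> None:
    """Accept one client and pipe its lines to a child process and back."""
    conn, _ = server.accept()
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def pump_replies() -> None:
        # Forward every line the engine prints back to the client.
        for line in proc.stdout:
            conn.sendall(line)

    reader = threading.Thread(target=pump_replies, daemon=True)
    reader.start()
    with conn, conn.makefile("rb") as requests:
        for line in requests:  # one query per line
            proc.stdin.write(line)
            proc.stdin.flush()
        proc.stdin.close()  # client hung up: let the engine exit
        reader.join()       # wait until all replies have been sent
    proc.wait()
```

A client would connect to the listening port, send newline-terminated queries, and read newline-terminated replies. This single-client version only shows the plumbing; sharing one engine between many bots additionally requires routing replies by query ID.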

dinohsu1019 commented 4 years ago

However this is not so easy to set up for most people, so I'm hesitant to go in this direction.

Could you add an option to specify the engine as a command line, as Lizzie does? That way, I can connect to my remote computer, which has a GPU installed.

Thanks, Dino

sanderland commented 4 years ago

This is now in v1.3.2 like this: [image] The "quotes" are what make it override. Good luck!

gold16 commented 4 years ago

Using user config file C:\Users\xrgol\.katrain\config.json
Switching language to cn
The server's host key is not cached in the registry. You have no guarantee that the server is the computer you think it is.
The server's ssh-ed25519 key fingerprint is: ssh-ed25519 255 49:7e:7e:3f:7b:a9:13:3c:df:f5:5e:f0:59:6a:51:21
If you trust this host, enter "y" to add the key to PuTTY's cache and carry on connecting.
If you want to carry on connecting just once, without adding the key to the cache, enter "n".
If you do not trust this host, press Return to abandon the connection.
Store key in cache? (y/n)
KataGo v1.4.5
Using TrompTaylor rules initially, unless GTP/GUI overrides this
Loaded config /root/config/gtp.cfg
Loaded model /root/weights/40b.bin.gz
Model name: g170-b40c256x2-s5095420928-d1229425124
GTP ready, beginning main protocol loop
Exception in thread Thread-1:
Traceback (most recent call last):
  File "katrain\core\engine.py", line 166, in _analysis_read_thread
  File "json\__init__.py", line 348, in loads
  File "json\decoder.py", line 337, in decode
  File "json\decoder.py", line 355, in raw_decode
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "threading.py", line 917, in _bootstrap_inner
  File "threading.py", line 865, in run
  File "katrain\core\engine.py", line 195, in _analysis_read_thread
  File "c:\users\sande\anaconda3\lib\traceback.py", line 163, in print_exc
  File "c:\users\sande\anaconda3\lib\traceback.py", line 104, in print_exception
  File "c:\users\sande\anaconda3\lib\traceback.py", line 497, in __init__
  File "c:\users\sande\anaconda3\lib\traceback.py", line 508, in __init__
  File "c:\users\sande\anaconda3\lib\traceback.py", line 337, in extract
TypeError: '>=' not supported between instances of 'JSONDecodeError' and 'int'

sanderland commented 4 years ago

@gold16 yes, using the gtp engine instead of the analysis engine won't work, as posted earlier.

dinohsu1019 commented 4 years ago

This is now in v1.3.2 like this: [image] The "quotes" are what make it override. Good luck!

Thanks for the quick action, but it seems I cannot copy and paste into that field, and my string is 129 characters long! That is too many characters to type by hand.

Also, I am not sure whether the double quotes "...." are required. When I type double quotes, it warns: path does not exist.

sanderland commented 4 years ago

Copy-paste works for me! And yes, it will complain about the path not existing, because it's not a path and I haven't programmed the form validation to pick up on that yet. You can ignore it.

dinohsu1019 commented 4 years ago

Copy-paste works only with Ctrl-C and Ctrl-V, not with Ctrl-Ins, Shift-Ins, or right-click copy/paste.

I tried the following string. It says "engine is ready", but after I play a couple of moves it shows "Analyzing move..." forever, with no response. (Note that pp = password, uu = user ID, ii = IP address.)

"plink.exe -ssh -pw pp uu@ii C:/katago-v1.4.5-cuda10.2-windows-x64/katago.exe gtp -config C:/katago-v1.4.5-cuda10.2-windows-x64/gtp_example.cfg -model C:/katago-v1.4.5-cuda10.2-windows-x64/kg_b20_d1228m.gz"

Thanks again, Dino

sanderland commented 4 years ago

@dinohsu1019 you are all repeatedly using the GTP command. Only the JSON analysis engine is used: https://github.com/lightvector/KataGo/blob/master/docs/Analysis_Engine.md
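For readers unfamiliar with the difference: GTP is a plain-text command protocol, whereas the analysis engine reads one JSON object per line on stdin and writes JSON results on stdout. Below is a minimal sketch of building such a query in Python. The field names follow the linked Analysis_Engine.md and the query visible in a log later in this thread; the function name `make_analysis_query`, the default visit count, and the "japanese" rules value are illustrative assumptions, not katrain's actual code.

```python
import json


def make_analysis_query(query_id, moves, max_visits=100):
    """Return one newline-terminated JSON query for `katago analysis`."""
    query = {
        "id": query_id,                # echoed back so replies can be matched
        "moves": moves,                # e.g. [["B", "Q16"], ["W", "D4"]]
        "initialStones": [],
        "rules": "japanese",           # assumption: any rules name KataGo accepts
        "komi": 7.5,
        "boardXSize": 19,
        "boardYSize": 19,
        "analyzeTurns": [len(moves)],  # analyze the position after the last move
        "maxVisits": max_visits,
    }
    # The engine expects exactly one JSON object per input line.
    return json.dumps(query) + "\n"
```

A `katago gtp` process does not understand input of this form, and its plain-text output is not JSON, which is consistent with the JSONDecodeError in the traceback above.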

dinohsu1019 commented 4 years ago

It seems you are saying that instead of the "gtp" subcommand (the GTP protocol), I should use the "analysis" subcommand (the JSON protocol), which is the only one the katrain application supports. In contrast, most applications support the GTP protocol rather than the JSON protocol (if there is any standard for this?). If so, I will have to figure out how to set up the configuration file and parameters. It seems the JSON protocol is supported by both the CUDA and OpenCL versions of KataGo.
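In practice the change amounts to swapping the `gtp` subcommand for `analysis` and pointing `-config` at an analysis config. The sketch below assembles such an override command as a string. The `analysis`, `-model`, and `-config` arguments are the ones shown in katrain's debug output quoted in this thread; the helper name `build_analysis_command` and the remote `katago.exe` path are assumptions for illustration.

```python
def build_analysis_command(remote_prefix, katago, model, config):
    """Join an SSH-style prefix and a `katago analysis` invocation."""
    parts = list(remote_prefix) + [
        katago,
        "analysis",          # JSON analysis engine, not "gtp"
        "-model", model,
        "-config", config,   # an analysis config, not a GTP config
    ]
    # Note: paths containing spaces would need quoting that suits the
    # remote shell; this sketch assumes there are none.
    return " ".join(parts)


command = build_analysis_command(
    ["plink.exe", "-ssh", "uu@ii"],   # uu = user ID, ii = IP address
    "E:/katago/katago.exe",           # hypothetical remote path
    "E:/katago/40b.bin.gz",
    "E:/katago/analysis_config.cfg",
)
```

The resulting string is what would go into katrain's engine command field in place of a local executable path.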

image

sanderland commented 4 years ago

You can turn on debug_level=1 in options and restart to see the command that is used to start katago.

Starting KataGo with "C:\Users\sande\Desktop\katrain\katrain\KataGo/katago.exe" analysis -model "C:\Users\sande\.katrain\g170e-b20c256x2-s5303129600-d1228401921.bin.gz" -config "C:\Users\sande\Desktop\katrain\katrain\KataGo/analysis_config.cfg" -analysis-threads 12

gold16 commented 4 years ago

Using user config file C:\Users\xrgol\.katrain\config.json
Switching language to cn
The server's host key is not cached in the registry. You have no guarantee that the server is the computer you think it is.
The server's ssh-ed25519 key fingerprint is: ssh-ed25519 255 9d:f7:7f:c0:34:76:e0:f6:e8:56:01:2d:59:f1:85:4a
If you trust this host, enter "y" to add the key to PuTTY's cache and carry on connecting.
If you want to carry on connecting just once, without adding the key to the cache, enter "n".
If you do not trust this host, press Return to abandon the connection.
Store key in cache? (y/n)
2020-07-17 09:46:20+0800: Analysis Engine starting...
2020-07-17 09:46:20+0800: KataGo v1.4.5
2020-07-17 09:46:20+0800: nnRandSeed0 = 4433454028558008541
2020-07-17 09:46:20+0800: After dedups: nnModelFile0 = E:/katago/40b.bin.gz useFP16 auto useNHWC auto
2020-07-17 09:46:22+0800: Loaded config E:/katago/analysis_config.cfg
2020-07-17 09:46:22+0800: Loaded model E:/katago/40b.bin.gz
2020-07-17 09:46:22+0800: Analyzing up to 12 positions at at time in parallel
2020-07-17 09:46:22+0800: Started, ready to begin handling requests
Exception in thread Thread-1:
Traceback (most recent call last):
  File "katrain\core\engine.py", line 167, in _analysis_read_thread
KeyError: 'id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "threading.py", line 917, in _bootstrap_inner
  File "threading.py", line 865, in run
  File "katrain\core\engine.py", line 195, in _analysis_read_thread
  File "c:\users\sande\anaconda3\lib\traceback.py", line 163, in print_exc
  File "c:\users\sande\anaconda3\lib\traceback.py", line 104, in print_exception
  File "c:\users\sande\anaconda3\lib\traceback.py", line 508, in __init__
  File "c:\users\sande\anaconda3\lib\traceback.py", line 337, in extract
TypeError: '>=' not supported between instances of 'KeyError' and 'int'

2020-07-17 09:46:22+0800: Cuda backend: Found GPU Tesla V100-SXM2-32GB memory 34228142080 compute capability major 7 minor 0
2020-07-17 09:46:22+0800: Cuda backend: Model version 8 useFP16 = true useNHWC = true
2020-07-17 09:46:22+0800: Cuda backend: Model name: g170-b40c256x2-s5095420928-d1229425124

sanderland commented 4 years ago

@gold16 hmm, that seems to indicate the program is outputting something unexpected. can you try the v1.3.3 branch?

gold16 commented 4 years ago

v1.3.3 still has errors, but after loading a game, analysis works normally.

(base) C:\AQ\katrain-1.3.3\katrain>katrain
Using user config file C:\Users\xrgol\.katrain\config.json
Switching language to cn
The server's host key is not cached in the registry. You have no guarantee that the server is the computer you think it is.
The server's ssh-ed25519 key fingerprint is: ssh-ed25519 255 f6:d3:db:ba:24:1a:02:0e:dd:6a:cb:83:7a:fa:78:cb
If you trust this host, enter "y" to add the key to PuTTY's cache and carry on connecting.
If you want to carry on connecting just once, without adding the key to the cache, enter "n".
If you do not trust this host, press Return to abandon the connection.
Store key in cache? (y/n)
2020-07-17 16:47:29+0800: Analysis Engine starting...
2020-07-17 16:47:29+0800: KataGo v1.4.5
2020-07-17 16:47:29+0800: nnRandSeed0 = 12360985904118600238
2020-07-17 16:47:29+0800: After dedups: nnModelFile0 = E:/katago/40b.bin.gz useFP16 auto useNHWC auto
2020-07-17 16:47:32+0800: Loaded config E:/katago/analysis_config.cfg
2020-07-17 16:47:32+0800: Loaded model E:/katago/40b.bin.gz
2020-07-17 16:47:32+0800: Analyzing up to 12 positions at at time in parallel
2020-07-17 16:47:32+0800: Started, ready to begin handling requests
ERROR: Unexpected exception 'id' while processing KataGo output b'{"error":"[json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected \':\'; expected \'[\', \'{\', or a literal"}\r\n'
Traceback (most recent call last):
  File "c:\users\xrgol\anaconda3\lib\site-packages\katrain\core\engine.py", line 167, in _analysis_read_thread
    if analysis["id"] not in self.queries:
KeyError: 'id'
2020-07-17 16:47:32+0800: Cuda backend: Found GPU Tesla V100-SXM2-32GB memory 34228142080 compute capability major 7 minor 0
2020-07-17 16:47:32+0800: Cuda backend: Model version 8 useFP16 = true useNHWC = true
2020-07-17 16:47:32+0800: Cuda backend: Model name: g170-b40c256x2-s5095420928-d1229425124

sanderland commented 4 years ago

@gold16 small improvement pushed, but I expect it will still give an error on startup. I think something about the connector program is giving katago some unexpected input at startup.

gold16 commented 4 years ago

Still the same error

Using user config file C:\Users\xrgol\.katrain\config.json
Switching language to cn
The server's host key is not cached in the registry. You have no guarantee that the server is the computer you think it is.
The server's ssh-ed25519 key fingerprint is: ssh-ed25519 255 a6:60:20:43:67:b7:9c:e4:5a:8f:d5:85:3b:e0:ec:ed
If you trust this host, enter "y" to add the key to PuTTY's cache and carry on connecting.
If you want to carry on connecting just once, without adding the key to the cache, enter "n".
If you do not trust this host, press Return to abandon the connection.
Store key in cache? (y/n)
2020-07-17 19:15:57+0800: Analysis Engine starting...
2020-07-17 19:15:57+0800: KataGo v1.4.5
2020-07-17 19:15:57+0800: nnRandSeed0 = 14414390552940434678
2020-07-17 19:15:57+0800: After dedups: nnModelFile0 = E:/katago/40b.bin.gz useFP16 auto useNHWC auto
2020-07-17 19:16:00+0800: Loaded config E:/katago/analysis_config.cfg
2020-07-17 19:16:00+0800: Loaded model E:/katago/40b.bin.gz
2020-07-17 19:16:00+0800: Analyzing up to 12 positions at at time in parallel
2020-07-17 19:16:00+0800: Started, ready to begin handling requests
ERROR: Unexpected exception 'id' while processing KataGo output b'{"error":"[json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected \':\'; expected \'[\', \'{\', or a literal"}\r\n'
Traceback (most recent call last):
  File "c:\users\xrgol\anaconda3\lib\site-packages\katrain\core\engine.py", line 167, in _analysis_read_thread
    if analysis["id"] not in self.queries:
KeyError: 'id'
2020-07-17 19:16:00+0800: Cuda backend: Found GPU Tesla V100-SXM2-32GB memory 34228142080 compute capability major 7 minor 0
2020-07-17 19:16:00+0800: Cuda backend: Model version 8 useFP16 = true useNHWC = true
2020-07-17 19:16:00+0800: Cuda backend: Model name: g170-b40c256x2-s5095420928-d1229425124

sanderland commented 4 years ago

@gold16 strange, are you sure you pulled in the latest commit?

gold16 commented 4 years ago

I made sure it is the latest commit. I tested with three remote servers and got the same error.

sanderland commented 4 years ago

File "c:\users\xrgol\anaconda3\lib\site-packages\katrain\core\engine.py", line 167, in _analysis_read_thread

Compare https://github.com/sanderland/katrain/pull/142/files : line 167 is different, so you definitely have out-of-date files or did not run pip install . after a pull.

gold16 commented 4 years ago

I ran pip3 install . again. The current error is:

ERROR: Error without ID {'error': "[json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected ':'; expected '[', '{', or a literal"} received from KataGo

sanderland commented 4 years ago

@gold16 alright, that's expected given that's what katago is returning. I'm not sure why though, but it seems not critical if everything else is working.
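To make the two failure modes in this thread concrete, here is a small, hedged sketch of how a reader thread might classify each line coming from the engine before touching `analysis["id"]`. It is not katrain's actual implementation; the function name `classify_engine_line` and the returned labels are invented for illustration.

```python
import json


def classify_engine_line(raw: bytes, pending_ids: set) -> str:
    """Label one line of engine output without raising on bad input."""
    try:
        message = json.loads(raw)
    except ValueError:
        # Covers json.JSONDecodeError and undecodable bytes, e.g. the
        # plain-text banner printed by `katago gtp`.
        return "not-json"
    if not isinstance(message, dict):
        return "not-json"
    if "id" not in message:
        # KataGo could not tie the problem to a query, so no ID is present.
        return "error-without-id" if "error" in message else "unknown"
    return "result" if message["id"] in pending_ids else "stale"
```

With this shape, a GTP banner line is reported as "not-json" rather than raising a JSONDecodeError, and an error object that lacks an "id" key is reported rather than raising a KeyError.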

sanderland commented 4 years ago

1.3.5 adds a separate option for overriding the command. This should make things a lot clearer and resolve issues with quotes. @rexl2018 / others, please give it a try. :)

gold16 commented 4 years ago

1.3.5 adds a separate option for overriding the command. This should make things a lot clearer and resolve issues with quotes. @rexl2018 / others, please give it a try. :)

C:\AQ\katrain-1.3.5>katrain
Using user config file C:\Users\xrgol\.katrain\config.json
Switching language to cn
The server's host key is not cached in the registry. You have no guarantee that the server is the computer you think it is.
The server's ssh-ed25519 key fingerprint is: ssh-ed25519 255 6c:97:04:87:13:5a:f5:1c:dd:e4:f2:5d:63:34:42:98
If you trust this host, enter "y" to add the key to PuTTY's cache and carry on connecting.
If you want to carry on connecting just once, without adding the key to the cache, enter "n".
If you do not trust this host, press Return to abandon the connection.
Store key in cache? (y/n)
2020-08-22 12:24:31+0800: Analysis Engine starting...
2020-08-22 12:24:31+0800: KataGo v1.5.0
2020-08-22 12:24:31+0800: nnRandSeed0 = 5483930942631159069
2020-08-22 12:24:31+0800: After dedups: nnModelFile0 = /home/mist/weights/40b.bin.gz useFP16 auto useNHWC auto
2020-08-22 12:24:33+0800: Cuda backend thread 0: Found GPU Tesla V100-SXM2-16GB memory 16945512448 compute capability major 7 minor 0
2020-08-22 12:24:33+0800: Cuda backend thread 0: Model version 8 useFP16 = true useNHWC = true
2020-08-22 12:24:33+0800: Cuda backend thread 0: Model name: g170-b40c256x2-s5095420928-d1229425124
2020-08-22 12:24:37+0800: Loaded config /home/mist/config/analysis_config.cfg
2020-08-22 12:24:37+0800: Loaded model /home/mist/weights/40b.bin.gz
2020-08-22 12:24:37+0800: Analyzing up to 12 positions at at time in parallel
2020-08-22 12:24:37+0800: Started, ready to begin handling requests
ERROR: Error without ID {'error': "[json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected ':'; expected '[', '{', or a literal"} received from KataGo

sanderland commented 4 years ago

@gold16 This is an error katago is sending due to some unexpected input it gets from plink. I can't help that. The program runs without problems otherwise?

gold16 commented 4 years ago

Yes, it can be used normally.

sanderland commented 4 years ago

@gold16 could you try katago v1.6 ? it should give clearer errors

gold16 commented 4 years ago

@gold16 could you try katago v1.6 ? it should give clearer errors

C:\AQ\katrain-1.3.5>katrain
Using user config file C:\Users\xrgol\.katrain\config.json
Switching language to cn
The server's host key is not cached in the registry. You have no guarantee that the server is the computer you think it is.
The server's ssh-ed25519 key fingerprint is: ssh-ed25519 255 6c:97:04:87:13:5a:f5:1c:dd:e4:f2:5d:63:34:42:98
If you trust this host, enter "y" to add the key to PuTTY's cache and carry on connecting.
If you want to carry on connecting just once, without adding the key to the cache, enter "n".
If you do not trust this host, press Return to abandon the connection.
Store key in cache? (y/n)
2020-08-25 10:33:24+0800: Analysis Engine starting...
2020-08-25 10:33:24+0800: KataGo v1.6.0
2020-08-25 10:33:24+0800: nnRandSeed0 = 710620353520443773
2020-08-25 10:33:24+0800: After dedups: nnModelFile0 = /home/mist/weights/40b.bin.gz useFP16 auto useNHWC auto
2020-08-25 10:33:26+0800: Cuda backend thread 0: Found GPU Tesla V100-SXM2-16GB memory 16945512448 compute capability major 7 minor 0
2020-08-25 10:33:26+0800: Cuda backend thread 0: Model version 8 useFP16 = true useNHWC = true
2020-08-25 10:33:26+0800: Cuda backend thread 0: Model name: g170-b40c256x2-s5095420928-d1229425124
2020-08-25 10:33:29+0800: Loaded config /home/mist/config/analysis_config.cfg
2020-08-25 10:33:29+0800: Loaded model /home/mist/weights/40b.bin.gz
2020-08-25 10:33:29+0800: Analyzing up to 12 positions at at time in parallel
ERROR: Error without ID {'error': '[json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected \':\'; expected \'[\', \'{\', or a literal - could not parse input line as json request: : -999999, "analyzeTurns": [0], "maxVisits": 5000, "komi": 7.5, "boardXSize": 19, "boardYSize": 19, "includeOwnership": true, "includePolicy": true, "initialStones": [], "moves": [], "overrideSettings": {"reportAnalysisWinratesAs": "BLACK", "maxTime": 3.0}, "id": "QUERY:1"}'} received from KataGo
2020-08-25 10:33:29+0800: Started, ready to begin handling requests

sanderland commented 4 years ago

thanks! it seems like the first 32 characters of input are discarded somehow, still not clear why.
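The effect of losing a prefix can be reproduced offline. The sketch below builds a hypothetical query, drops its first 32 characters, and checks that what remains starts with the same `: -999999, "analyzeTurns"` fragment KataGo reported and is no longer valid JSON. The leading fields (`"rules": "japanese"`, `"priority"`) and their order are assumptions chosen so that the cut lands in the same place; the thread does not show what the discarded characters actually were.

```python
import json

# Hypothetical query: the first two fields and the "japanese" value are
# assumptions for illustration; the remaining fields appear in the log above.
query = json.dumps({
    "rules": "japanese",
    "priority": -999999,
    "analyzeTurns": [0],
    "maxVisits": 5000,
    "id": "QUERY:1",
})

# What the engine would see if the first 32 characters were lost in transit.
received = query[32:]


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

Here `is_valid_json(query)` holds while `is_valid_json(received)` does not, which matches the symptom: only the first query after startup is rejected, and later queries arrive intact.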

Blueve commented 4 years ago

Just a suggestion,

Is it possible to provide a katrain-server? I notice that katrain not only uses KataGo but also utilizes other Go AI engines for different game modes. If we had a katrain-server, we might be able to use all features instead of only the analysis engine.

The katrain-server will enable us:

sanderland commented 4 years ago

@Blueve are you willing to pay for cloud servers?

Blueve commented 4 years ago

@sanderland Probably not... GPU instances are too expensive. But I am willing to set up the server on my workstation so that I can play katrain anywhere with my MacBook (Intel GPU) :P

dinohsu1019 commented 3 years ago

You can turn on debug_level=1 in options and restart to see the command that is used to start katago.

Starting KataGo with "C:\Users\sande\Desktop\katrain\katrain\KataGo/katago.exe" analysis -model "C:\Users\sande\.katrain\g170e-b20c256x2-s5303129600-d1228401921.bin.gz" -config "C:\Users\sande\Desktop\katrain\katrain\KataGo/analysis_config.cfg" -analysis-threads 12

@sanderland: I cannot find the option -analysis-threads 12. Isn't this configured in the config file?

dinohsu1019 commented 3 years ago

Copy-paste works for me! And yes, it will complain about the path not existing, because it's not a path and I haven't programmed the form validation to pick up on that yet. You can ignore it.

@sanderland How do I save this command? There's no OK button, and when I close Katrain v1.7 and open it again, the "Path to katago executable" field is blank again. Also, how do I know whether Katrain is analysing with the remote engine?

image

sanderland commented 3 years ago

@dinohsu1019 in recent versions I've made this option clearer and you should use this, without quotes: [image] If you have debug on (=1) and are running in a console, you should be able to see the command it uses to create a katago

dinohsu1019 commented 3 years ago

@dinohsu1019 in recent versions I've made this option clearer and you should use this, without quotes: [image] If you have debug on (=1) and are running in a console, you should be able to see the command it uses to create a katago

@sanderland

  1. I cannot find any debug option. Can you elaborate on what "debug on (=1)" means? It seems the remote engine is running, because I do not hear the sound of the PC with the GPU.

  2. The update button is not visible (so I guess there is no way to save the engine settings):

image

sanderland commented 3 years ago

[image] The error message is a mistake and I have fixed it in 5974bfb. You can ignore it. The button is there; maybe your window is too small? I will limit the height to the window height.

dinohsu1019 commented 3 years ago

@sanderland the button (and heading) is hidden even when I set the highest resolution (1366x768), I also tried lower resolutions, but more parts are hidden with those. So I guess your display resolution is higher than 1366x768 and it is displayed in absolute pixels.

dinohsu1019 commented 3 years ago

@sanderland the button (and heading) is hidden even when I set the highest resolution (1366x768), I also tried lower resolutions, but more parts are hidden with those. So I guess your display resolution is higher than 1366x768 and it is displayed in absolute pixels.

@sanderland May I know when the next release will be delivered? Thanks again.

sanderland commented 3 years ago

I'll build 1.7.1 when katago 1.8 is released, probably later this week.

dinohsu1019 commented 3 years ago

I'll build 1.7.1 when katago 1.8 is released, probably later this week.

Thanks a lot, it works now. Fun application! I hope Ah-Q Pro (Android) can adopt some of your features, such as fast analysis.

dinohsu1019 commented 3 years ago

@sanderland I have a question: when I use a remote engine via the override engine command, does the application use the remote engine only for analysis, or also for playing? How about teaching games?

sanderland commented 3 years ago

Yes, everything