lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

compiling on Windows #2

Closed: alreadydone closed this issue 5 years ago

alreadydone commented 5 years ago

Will update Windows build here.

Complete package: https://drive.google.com/file/d/1bdIlVDJ3x6FZtX5fmuG6wNbb57GFU8S0/view (6/27/2019, use new cudnn DLL)


Original content:

I tried to compile the GTP engine on Windows but failed:

>------ Build started: Project: CMakeLists, Configuration: RelWithDebInfo ------
  [1/53] "e:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\bin\HostX64\x64\cl.exe"  /nologo /TP -DUSE_CUDA_BACKEND -IC:\KataGo\cpp\external -IC:\KataGo\cpp\external\tclap-1.2.1\include -IE:\zlib\include -IE:\CUDA\include /DWIN32 /D_WINDOWS /W3 /GR /EHsc /MD /Zi /O2 /Ob1 /DNDEBUG   -std:c++14 /showIncludes /FoCMakeFiles\main.dir\core\elo.cpp.obj /FdCMakeFiles\main.dir\ /FS -c C:\KataGo\cpp\core\elo.cpp
  FAILED: CMakeFiles/main.dir/core/elo.cpp.obj 
  "e:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.14.26428\bin\HostX64\x64\cl.exe"  /nologo /TP -DUSE_CUDA_BACKEND -IC:\KataGo\cpp\external -IC:\KataGo\cpp\external\tclap-1.2.1\include -IE:\zlib\include -IE:\CUDA\include /DWIN32 /D_WINDOWS /W3 /GR /EHsc /MD /Zi /O2 /Ob1 /DNDEBUG   -std:c++14 /showIncludes /FoCMakeFiles\main.dir\core\elo.cpp.obj /FdCMakeFiles\main.dir\ /FS -c C:\KataGo\cpp\core\elo.cpp
c:\katago\cpp\core\global.h(32): error C3646: '__attribute__': unknown override specifier
...

When I ran code analysis, I was told: C:/KataGo/cpp/core/global.cpp(14): fatal error C1083: Cannot open include file: 'dirent.h': No such file or directory. And when I looked into core/global.cpp, I found: #include <dirent.h> //TODO this is not portable to windows, use C++17 filesystem library when C++17 is available

Is this the only obstruction to porting to Windows? (It seems dirent can work under Windows, though: https://github.com/tronkko/dirent)
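
For reference, a minimal sketch of what the C++17 replacement mentioned in that TODO could look like (the function name and usage here are illustrative, not KataGo's actual code):

    // Portable directory listing via C++17 <filesystem>, replacing the
    // POSIX <dirent.h> API that MSVC does not ship.
    #include <filesystem>
    #include <string>
    #include <vector>

    std::vector<std::string> listDirEntries(const std::string& dir) {
      std::vector<std::string> entries;
      for(const auto& entry : std::filesystem::directory_iterator(dir))
        entries.push_back(entry.path().string());
      return entries;
    }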

I experienced some problems with the Git portion of CMakeLists.txt, so I removed it (not sure what it is for), but some source files require program/gitinfo.h; would it work if I renamed gitinfotemplate.h to gitinfo.h?

lightvector commented 5 years ago

Possibly. I don't develop on Windows, so I have not attempted to compile there, but it's plausible that this is the only obstruction. I've attempted to write all my code to generalize, in theory, between Windows and Linux, but this was one of the points where I skipped that. I'd be happy to work on a fix in the next few days, if you like.

Renaming gitinfotemplate.h to gitinfo.h would "work" but isn't ideal. The Git portion of CMakeLists.txt is supposed to regenerate gitinfo.h from gitinfotemplate.h whenever the git revision hash of the repo changes, so that it contains a #define for the current hash. This #define is used in a few places so that the executable can be queried for the git revision it was compiled from, and the revision is also automatically written to various log files, so that when you run self-play or other processes there is a record of which version of the code produced that data. If you hack it to use the template directly, you won't get this behavior.
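
In other words, the generated header boils down to a single define that the rest of the code references; a sketch (the macro name and hash value here are hypothetical):

    // program/gitinfo.h -- regenerated by CMake from gitinfotemplate.h
    // whenever the repo's git revision changes.
    #define GIT_REVISION "9fceeee9c6373a4b2a1b1a4201b5f7a5ba3d6b3a"

Logging code and the version query then just reference GIT_REVISION.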

alreadydone commented 5 years ago

Making progress:

lightvector commented 5 years ago

Oh, that makes things more interesting. Yeah, looking online, this appears to be a feature supported by g++ (including MinGW and/or Cygwin toolchains for Windows); it's also part of the C99 standard, but not part of standard C++, so I'm guessing most compilers other than g++ don't support it.

Unfortunately, there are probably a nontrivial number of locations that do this, as it's incredibly convenient for tiny arrays that need to live only for the duration of a function without paying the cost of dynamic allocation or the mess of passing in buffers as arguments, and I've never really thought about not doing it when that was what the situation needed. Are there any simple alternatives?
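
For concreteness, the non-standard pattern in question and the obvious portable alternatives look roughly like this (a sketch; names and sizes are illustrative):

    #include <vector>

    void withVLA(int size) {
      int buf[size];    // variable-length array: valid C99 and a g++ extension,
                        // but rejected by MSVC (standard C++ requires
                        // compile-time array bounds)
      (void)buf;
    }

    void withFixedMax(int size) {
      const int MAX_SIZE = 512;   // hypothetical compile-time upper bound
      int buf[MAX_SIZE];          // still stack-allocated, no per-call heap cost
      (void)buf; (void)size;
    }

    void withVector(int size) {
      std::vector<int> buf(size); // portable, but pays for a heap allocation
    }

The fixed-maximum variant only works when a reasonable compile-time bound exists, but it keeps the zero-allocation property of the original.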

alreadydone commented 5 years ago

Probably no good alternatives... It looks like variable-sized arrays were proposed for addition to the C++ standard but retracted or rejected for some reason. It's amazing that g++ allocates them on the stack. I replaced them with new and delete, since I think those perform better than std::vector.

I am not familiar with MinGW or Cygwin. Are binaries compiled with their g++ easy to use? I.e., could I just distribute the .exe (not sure about the license), and with the DLLs (CUDA, cuDNN, etc.) it would then work on any Windows machine (with CUDA 10 installed)? Or would the user need to have MinGW/Cygwin installed?

I am doing this because your engine includes many long-desired features and deserves to be widely used.

lightvector commented 5 years ago

For Cygwin, users will either need to have Cygwin installed, or else the executable needs to come bundled with the appropriate Cygwin DLL files that it depends on. MinGW, however, compiles to a native Windows binary.

So it sounds like if you're able to try MinGW, that's a promising option, and could be less work than getting the code to compile with MSVC. I've used MinGW before, as many years ago I did some C++ coding on Windows. Let me know if I can help out. I could also investigate the feasibility of getting rid of variable-length stack allocations in the code, but that may be some work.

alreadydone commented 5 years ago

Thanks! Finally compiled successfully in VS (CUDA can't work without VS, and I am not sure how, or whether, it can be handled separately). When I launch main.exe it displays the possible arguments, but main.exe gtp (and many other arguments) crashes immediately. When I debug main.exe gtp I get a stack-corruption error: [screenshot] Any ideas? I don't think any changes I made could corrupt the stack...

If you'd like to look at it, you can see what was changed at https://github.com/lightvector/KataGo/compare/master...alreadydone:master (in hindsight, I really should have changed all runtime-sized arrays to vectors...).

l1t1 commented 5 years ago

If g++ is OK, try compiling with https://nuwen.net/mingw.html

l1t1 commented 5 years ago

This one is a bit older: https://sourceforge.net/projects/mingw-w64/files/mingw-w64/

lightvector commented 5 years ago

@alreadydone - I looked over your code and didn't see any obvious problems. Do you have the callstack for that error?

Perhaps more illuminating: you could try running main.exe runtests to run a bunch of tests of the low-level components of the code and see if any of them fail, rather than going straight to the full GTP engine. Also, main.exe runoutputtests runs some somewhat higher-level end-to-end stuff and dumps a big pile of output that should exactly equal the contents of tests/results/runOutputTests.txt (the test is that its output equals the contents of this file, and of course that it doesn't hit any asserts or exceptions).

lightvector commented 5 years ago

@alreadydone - I just pushed a branch "pedantic": https://github.com/lightvector/KataGo/tree/pedantic

This makes the code compile under g++ with the "-pedantic" flag, which flags things like variable-length arrays and other non-standard C++. It also does so in a way more likely to preserve good performance (many of the locations in board.cpp that you found are fairly inner-loopish; changing them to use new/delete could be very harmful for performance).

All tests for me pass with this code. You'll have to redo your changes involving __attribute__ ((noreturn)) and the Git revision logging and a few of the defines you had to add, but otherwise, let me know if this is working for you or if you still run into stack corruption.

alreadydone commented 5 years ago

Thanks for the work! I'll try that branch. The crash I got seems to happen at a very early stage: [call stack screenshot] Could it be endianness?

lightvector commented 5 years ago

Yeah, there's a chance it's something like that. The SHA2 implementation is not my own, but an open-source one I found online.

If you still crash around that place, reply back. I'll still be here to chat and help figure things out. :)

l1t1 commented 5 years ago

Can you upload the Windows binary?

intenseG commented 5 years ago

I tried to compile @alreadydone's fork on Windows, but it failed.

  Building Custom Rule C:/Users/inten/Desktop/BSK/KataGo/cpp/CMakeLists.txt
  CMake does not need to re-run because C:/Users/inten/Desktop/BSK/KataGo/cpp/CMakeFiles/generate.stamp is up-to-date.
  Microsoft(R) C/C++ Optimizing Compiler Version 19.16.27027.1 for x64
  Copyright (C) Microsoft Corporation.  All rights reserved.

  cl /c /IC:\Users\inten\Desktop\BSK\KataGo\cpp\external /I"C:\Users\inten\Desktop\BSK\KataGo\cpp\external\tclap-1.2.1\include" /I"C:\Users\inten\Desktop\sai-sai-0.15\msvc\packages\boost.1.68.0.0\lib\native\include" /I"C:\Users\inten\Desktop\sai-sai-0.15\msvc\packages\zlib-msvc14-x64.1.2.11.7795\build\native\include" /I"C:\Users\inten\Desktop\sai-sai-0.15\msvc\packages\libzip.1.1.2.7\build\native\include" /I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include" /Zi /W3 /WX- /diagnostics:classic /Od /Ob0 /D WIN32 /D _WINDOWS /D USE_CUDA_BACKEND /D "CMAKE_INTDIR=\"Debug\"" /D _MBCS /Gm- /EHsc /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /Zc:inline /GR /std:c++14 /Fo"main.dir\Debug\\" /Fd"main.dir\Debug\vc141.pdb" /Gd /TP /FC /errorReport:prompt C:\Users\inten\Desktop\BSK\KataGo\cpp\core\global.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\core\config_parser.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\core\elo.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\core\fancymath.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\core\hash.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\core\logger.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\core\makedir.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\core\md5.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\core\rand.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\core\sha2.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\core\timer.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\game\board.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\game\rules.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\game\boardhistory.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\dataio\sgf.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\dataio\numpywrite.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\dataio\trainingwrite.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\dataio\loadmodel.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\dataio\lzparse.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\neuralnet\nninputs.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\neuralnet\modelversion.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\neuralnet\nneval.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\neuralnet\cudabackend.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\search\timecontrols.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\search\searchparams.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\search\mutexpool.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\search\search.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\search\asyncbot.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\search\distributiontable.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\program\setup.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\program\play.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\tests\testboardarea.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\tests\testboardbasic.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\tests\testrules.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\tests\testscore.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\tests\testnninputs.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\tests\testsearch.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\tests\testtime.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\tests\testtrainingwrite.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\evalsgf.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\gatekeeper.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\gtp.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\match.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\matchauto.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\selfplay.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\misc.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\runtests.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\lzcost.cpp C:\Users\inten\Desktop\BSK\KataGo\cpp\sandbox.cpp 
C:\Users\inten\Desktop\BSK\KataGo\cpp\main.cpp
error LNK1120: 38 unresolved externals   main    C:\Users\inten\Desktop\BSK\KataGo\cpp\Debug\main.exe    1

An unresolved external reference error appears when generating main.exe.

I searched for a long time, but the error was not resolved. Is there a way to solve this error?

alreadydone commented 5 years ago

The readme says CUDA 10.0 is required, so CUDA 9.0 may not work. Also, have you installed cuDNN? https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-windows BTW, I tried @lightvector's pedantic branch but couldn't get rid of the stack corruption. I followed the execution step by step until the error popped up, but couldn't identify any out-of-bounds writes or reads.

intenseG commented 5 years ago

I may be having a CUDA version issue, as @alreadydone says; the unresolved external errors came from cudabackend.obj. I will try to create an instance on GCP (the cloud platform, not the author of the leela-zero project). Thank you!

lightvector commented 5 years ago

@alreadydone - Sounds like I should try at some point to compile on Windows myself to track down where the stack corruption you're seeing might come from. Given that the pedantic branch didn't work for you, it was not related to any of the variable-length-array fixes you had to do for the stack, which means it's... something else? Interesting. I'll see if I can investigate in the next few days.

@intenseG - CUDA 10.0 is the version I've been using, which I know the code works with. There's some chance that CUDA 9.0 works too, but I haven't tested it, so better to use CUDA 10.0 if you can.

lightvector commented 5 years ago

@alreadydone - I found a long-standing bug in sha2.cpp that would indeed cause stack corruption. Presumably, purely by chance, the way g++ lays out the stack meant it never caused any problems, so I hadn't noticed it, but it is now fixed! I also verified that a better address sanitizer than the one I had tested with before does flag the problem, and doesn't flag any other problems as far as I could test. Unfortunately, it was not compatible with CUDA, so I was not able to do a full test up to the point of a real search, but test searches without a neural net now run cleanly under the stricter address checking as well.

I also went ahead and cleaned up a few more things along the lines of the changes you had to make to get it to compile under Visual Studio, although not all of them. In particular, I went ahead and made <algorithm> a global include, and also added some preprocessor checking to screen out "NORETURN" and "PURE" when not on g++.
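
That screening presumably looks something like the following (a sketch assuming the macros wrap the g++ attributes; the real definitions in core/global.h may differ):

    // Only g++ and compatible compilers understand __attribute__, which is
    // what caused the original error C3646 under MSVC.
    #if defined(__GNUC__)
      #define NORETURN __attribute__((noreturn))
      #define PURE __attribute__((pure))
    #else
      #define NORETURN
      #define PURE
    #endif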

I spent some time spinning up a Windows machine with a GPU on AWS (since I don't have one here) and tried to actually compile it myself, but having never used Visual Studio before, I couldn't get past the phase of installing all the libraries in a way that Visual Studio is happy with and can find in order to compile the project. So this is untested on Windows, but let me know if the new fix makes it work for you. I updated both 'master' and 'pedantic'.

alreadydone commented 5 years ago

@lightvector Good catch! Not sure how I missed the stupid *buffer = (char)0;. It is also present in https://github.com/HowardHinnant/hash_append/blob/master/sha2.c, for example, and it looks like buffer = (char*)0; was intended. The stack corruption indeed goes away after the fix.
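
For anyone following along, here is the shape of the bug as reconstructed from this discussion (a simplified sketch with a hypothetical caller; the real sha2.cpp code is more involved):

    #include <cstring>

    void writeHexDigest(char* buffer) {
      std::memset(buffer, 'a', 64);  // stand-in for writing 64 hex digits
      buffer += 64;                  // now points one past the last digit
      *buffer = (char)0;             // BUG: NUL-terminates one byte past the
                                     // caller's array if it holds exactly 64
      // buffer = (char*)0;          // the intended line: nulls the local
                                     // pointer and writes nothing to memory
    }

    int main() {
      char digest[64];               // no room for a terminator
      writeHexDigest(digest);        // corrupts one byte of the stack
    }

Whether this bites then depends entirely on stack layout, which is why g++ builds happened to survive it.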

Thanks for the efforts to make compilation with VS easier. I might look into creating a VS project (to manage packages conveniently through NuGet) instead of using CMake (which I don't know how to direct to find packages on Windows), though I haven't created one from scratch before.

alreadydone commented 5 years ago

I successfully compiled it, ran several genmove commands, and things seem to be working properly. I made an archive at https://userscloud.com/vtfosn1sgqqe with the binary and the released 15b net, along with some notes on the configs, mostly comparisons with similar parameters in LZ. I changed chosenMoveTemperature(Early) to 0 for maximal strength. I was tempted to change numNNServerThreadsPerModel to 2, because I think one server thread won't be able to saturate a GPU, but I abandoned the idea when I saw that memory usage doubled. The up-to-date branch that compiles with Visual Studio (when include/library directories are properly set in CMakeLists.txt) is at https://github.com/alreadydone/KataGo/tree/vs.

Strangely, zlib (zstr) won't read a .gz file properly for me. For the 15b net, "Error parsing model file" is thrown. When I examine the incoming stream, I see it terminates with -0.06479036 -0.02955167 0.02968, the 86th weight in conv1. Reading the decompressed .txt file works without trouble.

alreadydone commented 5 years ago

https://userscloud.com/uug7j4awz0w7 Updated my readme.txt a bit and removed the restriction to board sizes 9-19. The engine plays reasonably on a 4x4 board, but on 21x21 it hangs on genmove. Does it take the engine equal time to evaluate a 9x9 position and a 19x19 position, or is the former faster?

I see that board sizes beyond 19 are impossible:

namespace NNPos {
  //Currently, neural net policy output can handle a max of 19x19 boards.
  const int MAX_BOARD_LEN = 19;
lightvector commented 5 years ago

There's an excellent chance that if you change that number to be larger, along with the number in game/board.h (search for "19" in that file as well), then it will work. Try it!

Although note that the tests in "main.exe runtests" will not pass since it will be generating zobrist hashes for a larger maximum board size, so all the hash values of all the board states will change.
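
Concretely, the change amounts to something like this (both constants must be raised together, as noted above):

    namespace NNPos {
      // was 19; the corresponding "19" in game/board.h must be raised to match
      const int MAX_BOARD_LEN = 23;
    }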

Edit: Tested bumping it up to 23; it looks like at least a short search from the empty position works fine. Also, I'm not sure why zlib isn't working for you; maybe something about how zstr interacts with the Windows version of the library isn't good. Not exactly sure how to go about debugging that.

Edit #2 - By the way, in case you're curious: in the past, with a much older neural net version (trained on LZ data), I tried measuring the effect of temperature on move selection. Recalling from memory without digging deep into my notes, the effect was a loss on the order of 20 to 40 Elo at 400 or 800 visits, for temperatures as high as 0.3 applied to the whole game. So, actually, surprisingly little in the grand scheme of things for such a noticeable temperature.

My intuition is that temperatures much lower than that should have very little effect on strength, particularly very early in the game. For example, at 0.1, it will happily choose randomly between two moves that are very close in visits, but if one of them gets even, say, a 30% advantage in visits over the other, that translates to only a 7% chance of playing the lesser-visited one, and by the time one move has double the visits of the other, there is only about a 1:1000 chance of playing the lesser one. This allows a bit more opening diversity between nearly equal moves, beyond just neural net symmetry randomization, but only when two moves are very close.
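
As a sanity check on those numbers: assuming the selection probability is proportional to visits^(1/T), which reproduces both figures above, a quick computation gives:

    #include <cmath>
    #include <cstdio>

    int main() {
      const double T = 0.1;  // temperature
      // Probability of picking the lesser move when the other move has r
      // times as many visits, with weights proportional to visits^(1/T):
      auto pLesser = [&](double r) { return 1.0 / (1.0 + std::pow(r, 1.0 / T)); };
      std::printf("%f\n", pLesser(1.3));  // ~0.068: about a 7% chance
      std::printf("%f\n", pLesser(2.0));  // ~0.00098: about 1 in 1000
      return 0;
    }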

alreadydone commented 5 years ago

Thanks for the info. On my notebook (940M) with batch size 16 (the default) I got: ~~486 batches in 23.75s on 19x19 (15.29 b/s); 856 batches in 55.98s on 8x8 (20.46 b/s), so it seems inference is faster on small boards, but the gain is small. Should I increase the batch size as board size decreases?~~ Now that I look at more log entries, it seems it's also around 20 b/s on 19x19. Do I need to adjust NNPos::MAX_BOARD_LEN to speed up inference on small boards? That would require compiling one binary for each board size.

BTW, are there ways to query/interact with the engine to get more search/eval info, other than looking into gtp.log?

alreadydone commented 5 years ago

I compiled a board-size-37 version, and the row/column numbering disappears when calling showboard. Moreover, genmove b yields (3,34) (no longer letter+number! Where is the conditional switch?). GTP really needs to be extended to handle large board sizes; in the future, people will watch exhibition matches between AIs on large boards. The speed indeed dropped a lot: now 986 batches in 115.7s, amounting to 8.5 b/s, and the same speed on 9x9. Memory usage also bumps up (from ~1GB to ~2GB).

So how did you take advantage of smaller boards during self-play? From my tests, it seems inference speed is completely determined by the compile-time constant NNPos::MAX_BOARD_LEN. Did you only benefit from the shorter length of games, and not from faster NN evaluations? To take advantage of faster NN evals with the current code, it seems you would need to quit the engine after each game and start another one compiled for a different board size.

BTW I am a bit surprised that 0.3 for the whole game only loses 20-40 Elo even when chosenMoveSubtract = 0 and chosenMovePrune = 1 (the default settings).

lightvector commented 5 years ago

I didn't try to take advantage of smaller boards computationally during self-play. One could do so in theory, but it would be more complicated: you would not be able to batch the neural net queries from different board sizes together, so they would run separately, and the logic for keeping them running in the right proportions, or for feeding the queries to the right GPU server threads, would get a lot more complicated. Taking advantage of the shorter games and the greater effectiveness of search was already good enough. Also, once you start hitting pro level, my intuition is that although 19x19 generalizes downward pretty well, smaller boards will not generalize upward quite so well, which means you want a lot of your training on the largest sizes anyway, where you don't gain anything.

You should be able to do much better than recompiling for a particular size if you want faster evaluations on small boards, although it's not prominently documented. Try adding "maxBoardSizeForNNBuffer=9" into the config for a 9x9 game even when you've compiled for up to 19x19. That should make the tensors on the GPU a lot smaller. I haven't done the work to make this buffer sizing dynamic based on the GTP commands.
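
For example (a config excerpt; the parameter name comes from the comment above, the value is illustrative):

    # gtp config for a 9x9 game: shrink the NN buffers/tensors even though
    # the binary was compiled for boards up to 19x19
    maxBoardSizeForNNBuffer = 9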

Turning locations into strings is handled in game/board.cpp, in Location::toString. I agree GTP is a pretty hacky protocol: it gives no advice on how to do lettering for boards larger than 25x25 and says they're not supported, so I just did something arbitrary in that case. If there's a different arbitrary choice that works better with other programs, I could implement that instead.

lightvector commented 5 years ago

There aren't any more ways to interact with the engine built into GTP right now, since that would require adding nonstandard commands, and until recently I had no idea which nonstandard extensions were becoming de-facto standard simply because people were implementing them in practice. Are there important such de-facto extensions you think would be useful to implement?

You can evaluate individual positions in an existing SGF file using the "evalsgf" subcommand of the program. That will dump to stdout much the same kind of output that goes to the log, but with slightly more options for interacting with it. Of course, still more options could be implemented; I've only been implementing things as I've needed them for getting things to work (e.g. playing on OGS) or for my own debugging, but it's pretty easy for me to add more.

alreadydone commented 5 years ago

OK, thanks for letting me know about the hidden parameter maxBoardSizeForNNBuffer, which I didn't see in gtp_example.cfg. (I guess I just need to dig a bit more into program/setup.cpp to find more.)

Evaluating one move at a time seems a strange idea; software like GoReviewPartner analyzes one game at a time. It would be desirable to implement lz-analyze when KataGo gets stronger, but currently I think people just want to see its winrate estimate (to see if it thinks it has caught up in handicap games, for example). (I told people that KataGo never gives up and has no ladder problems, and people have been testing the 15x192 net's performance in handicap games. The 10x128 and 6x96 nets, along with Zen7 and LZ 10b nets, have been serving as "punching bags" given handicap.)

lightvector commented 5 years ago

Well, ideally maxBoardSizeForNNBuffer wouldn't exist at all for GTP, it would just do it based on the GTP-sent board size. :)

Yeah evalsgf has mostly been a debugging tool, not a user tool. I had only been implementing exactly what I needed myself and nothing more, since doing otherwise would mean more weeks before being able to complete my paper.

I'll look into implementing lz-analyze. Thanks!

alreadydone commented 5 years ago

For Windows users, I want to clarify that with the cudart, cublas, and cudnn CUDA 10 DLLs, you can run the compiled binaries above even with only CUDA 9 installed. (A friend tested this, since he didn't want to break TensorFlow by installing CUDA 10.)

l1t1 commented 5 years ago

Can you add support for CUDA 9?

lightvector commented 5 years ago

I'm not sure. Does it work with CUDA 9 right now or is there something that breaks? I specified CUDA 10 since that's what my cloud setup had and so it's the one I'm using, but there's a chance it already works with CUDA 9.

alreadydone commented 5 years ago

@l1t1 Have you tried to run it with CUDA 9 installed? What error did you get (if any)? Probably it will say you are missing one of cublas64_100.dll, cudart64_100.dll, cudnn64_7.dll, or msvcr110.dll. The first three are bundled in https://userscloud.com/c8llnul1lmrr, and the last one can be installed with the vcredist_x64.exe included in "KataGo GTP engine and networks (up to 37x37, arbitrary komi, handicap play)".

l1t1 commented 5 years ago

thanks @alreadydone

l1t1 commented 5 years ago

Do you know which CUDA version lc0 uses? https://github.com/LeelaChessZero/lc0/releases

l1t1 commented 5 years ago

I see, they use cublas64_92.dll.

Friday9i commented 5 years ago

Seems nice, @alreadydone, but could you ideally share the zip somewhere else? Userscloud seems doubtful: my antivirus stopped a "malicious script" (whether it's really malicious or not I don't know, but I won't try), and to download the file I would also need to install a Chrome extension that changes my preferred search engine by default: no way. But then I have no access to the compiled version :-( So if you can upload it somewhere else, that would be very nice. Thanks a lot!

l1t1 commented 5 years ago

When you use Userscloud, you don't need to install anything in IE/Firefox browsers and shouldn't click on popup windows; just keep clicking the continue button on the first page until the real link shows.

lightvector commented 5 years ago

@alreadydone - I implemented enough of lz-analyze to, I hope, make it work with Lizzie. Changes pushed to master and pedantic. Note that Lizzie attempts to parse the version output from GTP and complains if it is not the version it expects from Leela Zero, so to make it work you will need to pass -override-version 0.16 when running main gtp, which makes KataGo pretend to be version 0.16 and mimic what Leela Zero would say, so that Lizzie does not complain. I did not test other tools that use lz-analyze.
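
So the engine command configured in Lizzie would look something like the line below (only -override-version is confirmed above; the model/config arguments are illustrative of how the gtp subcommand is normally invoked):

    main.exe gtp -model <model file> -config gtp_example.cfg -override-version 0.16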

I was not able to fully test Lizzie either, though, so let me know if there are issues. I don't actually have a local computer with a GPU, only cloud and remote machines, and I could not figure out how to make a local Windows machine connect through an SSH tunnel to run KataGo on Linux remotely for a local Lizzie. I did eventually figure out how to run Lizzie remotely as well and X11-forward the window. Unfortunately, the X11 was so laggy as to be unusable, allowing me to barely verify that KataGo's lz-analyze output is a close enough mimic that Lizzie starts and displays some numbers, but not to do any further testing for other problems.

I also implemented kata-analyze. It is the same as lz-analyze, except that it does not multiply the winrates and such by 10000 and round them; it just leaves them as floats, and it also reports the expected score.

Additionally, with another (unpushed) hack, I got the version running on OGS (https://online-go.com/player/592684/) to report the winrate and mean expected score each move. Yay. The PV isn't displayed as nicely as roy7 did for Leela Zero, though, since that requires quite a bit more work with the OGS API than I had time to dig into: sending a chat message is just a matter of sending a string, while sending a full variation requires constructing a more complex JSON object.

@Friday9i @l1t1 - Let me know if there's anything reasonable I can help with on the Windows side, although it sounds like I don't have much to contribute over @alreadydone, not having actually compiled on Windows myself.

thorsilver commented 5 years ago

@l1t1 the link does not work at all for me. I just get bombarded with popups, and the link loads but leads to an error page. I have tried repeatedly, always with the same result.

Can someone put the compiled version somewhere else please?

alreadydone commented 5 years ago

Engine (exe and config files): https://drive.google.com/file/d/1Cg3VUiJmC6qvyuLPRn5OYfmRGBXQzzcH/view?usp=sharing
NVIDIA DLLs: https://drive.google.com/file/d/10EvCZH2xj6boVES3YA8kwvdt6SZLA3Au/view?usp=sharing
Updated engine that works with Lizzie (exe only): https://mega.nz/#!6vwEiIDC!ovnM-5vvzp0CekW_0iZv-kMOz2y-KYTriDfxgXMqGhg

Friday9i commented 5 years ago

I tried to use the Windows version (provided above) with Sabaki, but no success ;-( Does anyone know the parameters to use? I tried several variations of the 3 lines in "manage engines", around:

But whatever I tried, I got a "connection failed". Thanks a lot

alreadydone commented 5 years ago
Friday9i commented 5 years ago

Thanks a lot for the advice, @alreadydone! I had already read the readme.txt. I tried without the dash before gtp, but I also forgot to use .txt instead of .gz: that may be why it didn't work. I'll try again tonight (European time) and will give an update of the result here, along with the precise parameters used if it works ;-) Thanks a lot!

alreadydone commented 5 years ago

You can run the command in the console for a more visible and faster check. Though KataGo won't print a message when the model finishes loading, you can type showboard or genmove to check whether things initialized correctly (optionally, watch Task Manager until memory usage stabilizes, which is a sign that model loading has completed, and look into gtp.log to see if you get any errors).
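
A quick console session might look like this (responses illustrative; "=" is GTP's success marker):

    genmove b
    = Q16
    showboard
    = ...ASCII board with the move played...

If instead the process exits, or prints nothing indefinitely, check gtp.log for the error.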

I checked that it works fine in Lizzie, but it seemed to have problems with MyLizzie and Sabaki's analysis mode.

Friday9i commented 5 years ago

The problem was the .gz, thx! Hence, in Sabaki's "Manage Engines" I use:

Note: to modify the cfg file, I change its extension to .txt, modify it, then change it back to .cfg.

lightvector commented 5 years ago

@alreadydone pedantic branch changes are all merged into master now, along with various bugfixes, implementation of LCB, and other minor things.

alreadydone commented 5 years ago

Thanks for the work! Updated executable main.zip and my vs branch.

l1t1 commented 5 years ago

Could you post a step-by-step compile guide from scratch?

petgo3 commented 5 years ago

@alreadydone: Since I still can't compile on Windows, can you perhaps try merging lightvector's latest pushes into your fork and compiling? There is at least an interesting addition for handicap play included.

lightvector commented 5 years ago

Within a few days, I expect to also have pre-compiled Windows binaries available for OpenCL, and maybe CUDA. There are also several remaining optimizations to the implementation that I'm working on and have yet to push, as well as some further fixes for things MSVC complains about, including a couple of warnings about things that actually are potential bugs-in-waiting that g++ did not warn about.