lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

Android binary #321

Open acristescu opened 4 years ago

acristescu commented 4 years ago

Now that we have a CPU-only version of the engine, would it be possible to add ARMv7 and/or ARM64 builds to the compiled binaries of the release? I couldn't get it going myself, but then I haven't compiled any C++ since I finished uni many years ago :) It should be possible, though, as there are several such builds floating around for both Leela Zero and SAI (for example https://github.com/Grant-Tao/compiled-leelaz-0.17-for-android-phones and https://github.com/evdwerf/leela-zero/tree/android ).

The second repo above even has the modifications to the Makefile for Android (see this comparison https://github.com/leela-zero/leela-zero/compare/next...evdwerf:android ).

Is that something that could be achieved?

acristescu commented 3 years ago

So he modified the 20b network to work faster on 64-bit Android? Is there any loss in quality of play? Which net did he pick to optimize (there are quite a few 20b ones)?

portkata commented 3 years ago

@acristescu I think there's no loss in quality of play. It's probably the last 20b extended-training net; I can check it with a hex editor next weekend. With 1 thread on my device, it does 12 playouts in the same time it used to do 2! It is twice as fast as Leela Zero.

I'm curious whether you see the same speed increase when you try the new binary with that net, either on the command line or in your app. If so, you might be able to reduce visits by a factor of 4 and get the same strength, but the downside would be a bigger APK.

The new 20b net is much faster at visit parity than the old 15b. I've only tested with 1 thread, though.

It can even play instant moves on ancient devices. My $220 e-ink e-reader plays instantly with the new net. I think it's probably the same for Orange Pi users: https://www.amazon.com/dp/B0824PZ45Y?ref=myi_title_dp

cryptsport commented 3 years ago

The network remains the same; its use is optimized.

acristescu commented 3 years ago

Not quite true. It looks like he converted the model to Google's TensorFlow Lite format and somehow made KataGo work with TensorFlow. This is quite impressive, as the new format should be both faster and (importantly for mobile) quite compact as well. I'm not sure if @lightvector knows about this, but I think this would be something really cool to include back in the main project.

However, I could not get it to run, probably because the command-line parameters are different. @aki65 could you kindly share what the new command-line parameters are (in the spirit of open source)?

cryptsport commented 3 years ago

Unclear. If you unpack the APK and then unpack "private.mp3" like a zip, there is a part of it that matches the official network g170e-b20c256x2-s5303129600-d1228401921.bin.gz exactly. (7ED1600h arm64-v8a-rel-0.21)
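The two-step unpacking described above (the APK is a zip, and "private.mp3" inside it is treated as another zip) can be sketched in Python as follows. This is a minimal sketch, not the tooling anyone in the thread used: the function name `list_nested_archive` is hypothetical, and because the thread does not say where inside the APK the file sits, the code matches on the file name alone.

```python
import io
import zipfile


def list_nested_archive(apk_path, inner_name="private.mp3"):
    """Return (entry name, uncompressed size) pairs from a zip stored inside an APK.

    An APK is itself a zip file. `inner_name` is compared with the last path
    component of each entry, since its location inside the APK is an assumption.
    """
    with zipfile.ZipFile(apk_path) as apk:
        matches = [n for n in apk.namelist() if n.rsplit("/", 1)[-1] == inner_name]
        if not matches:
            raise FileNotFoundError(f"{inner_name} not found in {apk_path}")
        inner_bytes = apk.read(matches[0])

    # Open the extracted bytes as a second zip archive without touching disk.
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        return [(info.filename, info.file_size) for info in inner.infolist()]
```

Calling `list_nested_archive("app.apk")` (a placeholder path) would list the bundled files with their sizes, which is enough to see whether network files such as `.bin.gz` or `.tflite` entries are present.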

acristescu commented 3 years ago

The old network is still there, but now there's a new file called 20b.tflite. If you try to just run the katago binary included in 0.21 you also get an error saying that it's missing libtensorflowlite.so. This did not happen with the old 0.16. From this I have deduced that he somehow made katago work with TensorFlow.

cryptsport commented 3 years ago

this is very interesting, but it doesn't work for me v64. so is there a new network or not? or is it not clear?

acristescu commented 3 years ago

only @aki65 can answer these questions for certain.

"it doesn't work for me v64"

Do you mean LazyBaduk does not work for you or that you cannot run the new katago binary? I can't get it to work either...

cryptsport commented 3 years ago

Only the 32-bit version works on my devices, and for some reason the same is true on the Nox emulator (it probably has no 64-bit support; I will find out).

cryptsport commented 3 years ago

Do you mean "BadukAI"? And which network? Can this network be pulled out of the APK? How do I find it?

cryptsport commented 3 years ago

The unpacked "private" files in v17 and v18 are 198 MB and 225 MB respectively. That leaves little room for another network.

portkata commented 3 years ago

Imagine when, after distributed training, the 40b policy reaches 9 dan and he optimizes that network. The kyu-rank mode will be awesome. With the opening book turned on, the bots play well all the way down to kyu rank 10, and they go up to kyu rank -8. Cryptpark has tested against Crazy Stone Zero; I think it can already beat the 7d setting?

acristescu commented 3 years ago

All of that is great; too bad it's not open source so that other open-source projects could use it... :|

cryptsport commented 3 years ago

Answer from @aki65: https://github.com/aki65/aki65.github.io/issues/8#issuecomment-740121624

l1t1 commented 3 years ago

What is the meaning of these files in the APK?

10b.bin.gz
20b.bin.gz
15b.tflite
20b.tflite
40b.tflite

portkata commented 3 years ago

The 15b and 40b are for Leela Zero; the 20b is for the optimized KataGo, which is now faster than Leela Zero when numSearchThreads is set to 1.

portkata commented 3 years ago

Answer from @aki65: aki65/aki65.github.io#8 (comment)

To summarize for this thread: aki65 said there should be no change in how KataGo is invoked to use the TensorFlow-optimized net. Can someone run the optimized net in android command line and confirm? This net is really great for calibrated AI because it is so much faster. One can now run the kyu-rank bots on very old mobile processors with instant moves; together with his opening book for the first moves, the kyu-rank bot plays well and fast at kyu rank 10 and stronger. aki65's binary with the optimized net, .so, and .cfg files: https://easyupload.io/yza296

acristescu commented 3 years ago

"Can someone run the optimized net in android command line and confirm?"

I just tried by unzipping the files provided, using adb push to upload them to an S9+, then doing adb shell and running the binary with LD_LIBRARY_PATH=. ./katago_binary_android. When providing no parameter it does work (prints the usage info), but as soon as I add any parameter (for example LD_LIBRARY_PATH=. ./katago_binary_android version) it stops working. It just exits without any error.
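Because the binary exits silently, one thing worth capturing is its exit status. The sketch below only builds the `adb` command lines for the push-and-run procedure described above and appends `echo exit=$?` so the status shows up in the output. The helper names are hypothetical, and `/data/local/tmp` is an assumed device directory (the comment does not say where the files were pushed); `katago_binary_android` and `LD_LIBRARY_PATH=.` are taken from the comment.

```python
import shlex

# Assumption: a device directory from which adb shell is allowed to execute files.
REMOTE_DIR = "/data/local/tmp"


def push_command(local_path, remote_dir=REMOTE_DIR):
    """Build the argv for copying one local file to the device."""
    return ["adb", "push", local_path, remote_dir + "/"]


def run_command(binary, args, remote_dir=REMOTE_DIR):
    """Build the argv for running the binary on the device and echoing its exit status."""
    quoted_args = " ".join(shlex.quote(a) for a in args)
    remote = (
        f"cd {shlex.quote(remote_dir)} && "
        f"LD_LIBRARY_PATH=. ./{binary} {quoted_args}; "
        "echo exit=$?"
    )
    return ["adb", "shell", remote]


def parse_exit_code(output):
    """Return the integer after the last 'exit=' line, or None if there is none."""
    for line in reversed(output.splitlines()):
        if line.startswith("exit="):
            return int(line[len("exit="):])
    return None
```

The lists can be passed to `subprocess.run(..., capture_output=True, text=True)` and the resulting `stdout` fed to `parse_exit_code`. In POSIX-style shells a status above 128 conventionally means the process was killed by a signal (for example 139 for a segmentation fault), which would narrow down why nothing is printed.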

I spent some 3 hours over the weekend trying every which way, with different versions of the libraries, different command lines, etc. The same method worked for the old KataGo, Leela Zero and SAI. I must be missing something...

portkata commented 3 years ago

Just saw the new icon in your app, it looks great! I still wish KataGo used half as many playouts, though. Or maybe have easy, medium and hard settings? Thanks for putting in all the time!

acristescu commented 3 years ago

Plan A was to put in a scaling mechanism like the one in KaTrain, but I just can't find the time for such an undertaking.

Plan B was to have the app measure the number of playouts per second it gets and then scale the number of playouts for the next moves so as to keep the time per move at 1s on any device. This requires less development, but it would make the AI inconsistent.

Plan C would be to at least temporarily have a global setting in the settings page where you can tweak this to your heart's desire. Might go with this in the next version.
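Plan B boils down to a small piece of arithmetic: measure the playout rate on the last move and pick the next budget so that the move takes about the target time. A minimal sketch, assuming the app can observe how many playouts a move used and how long it took; the function name, the clamp limits and the defaults are hypothetical and not taken from the app.

```python
def next_playout_budget(playouts_done, seconds_taken, target_seconds=1.0,
                        min_playouts=1, max_playouts=1000):
    """Scale the playout budget so the next move takes roughly target_seconds.

    The measured rate (playouts per second) from the previous move is
    multiplied by the target time, then clamped to [min_playouts, max_playouts].
    """
    if playouts_done <= 0 or seconds_taken <= 0:
        # No usable measurement yet: fall back to the smallest budget.
        return min_playouts
    rate = playouts_done / seconds_taken
    budget = int(rate * target_seconds)
    return max(min_playouts, min(max_playouts, budget))
```

For example, a device that needed 2 seconds for 12 playouts would be given a budget of 6 for the next move. The trade-off named in Plan B is visible here: the budget, and with it the playing strength, depends on how fast the device is.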

portkata commented 3 years ago

oh wow this is going to be great, thanks!

portkata commented 3 years ago

aki65 added support for the new distributed kata1 weights on android. https://github.com/aki65/aki65.github.io/releases

l1t1 commented 3 years ago

"aki65 added support for the new distributed kata1 weights on android. https://github.com/aki65/aki65.github.io/releases"

thanks

portkata commented 3 years ago

aki65 released an optimized s580 40b distributed-training weight. It is extremely, extremely fast. https://github.com/aki65/aki65.github.io/releases/tag/v1.4.1 In a quick test, the optimized s580 policy seems to be at least as strong as the non-optimized s509, the last 40b net of the non-distributed run. In 6 games of policy play (t1 p1 nncache=2) against the 20b at 5 playouts (t1 p5), the optimized net scored 4 wins and 2 losses, while the non-optimized s509 scored 2 wins and 4 losses. The optimized net is almost 4 times faster and almost 1/4 the size of the s509, which is a really incredible accomplishment by lightvector, akigo, sanderland and all the people who contributed GPUs to play 3,000,000 training games. Wow.

Imagine if in 2018, when Leela Zero was just starting, someone had told you that 3 years later there would be an app able to play from 10 kyu to 9d on a $100 smartphone, making its moves almost instantly. And that it could play with variable komi and no ladder weaknesses.
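For a sense of how much a 6-game sample can show, the following sketch computes the probability of scoring at least a given number of wins by chance. It assumes independent games with a fixed win probability, which is a simplification; the function name is hypothetical.

```python
from math import comb


def prob_at_least(wins, games, p=0.5):
    """Probability of at least `wins` successes in `games` independent trials."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )
```

With `prob_at_least(4, 6)` the result is 22/64, about 0.34, so two evenly matched engines would produce a 4-2 score or better roughly a third of the time. A longer match would be needed to separate the two nets with confidence.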

acristescu commented 3 years ago

If we could also have that open-source, that would be a dream...

HackYardo commented 2 years ago

Sorry to bother you all after so long! I'm an amateur at programming, and I'm not sure whether this issue is about a full APK or just something that works. I ask because I can't stand that Leela Zero can only play on a 19x19 board, which is too big for a phone screen. Today I tried compiling KataGo in Ubuntu under Termux on Android, and it works fine; all you need to do is follow the guide here: https://github.com/lightvector/KataGo/blob/master/Compiling.md#linux

jopdorp commented 2 months ago

I was able to compile for Android with OpenCL support using my branch: https://github.com/jopdorp/KataGo/tree/android-support

Now, at runtime, when I try to use it, I get a crash during the tuning process:

No existing tuning parameters found or parseable or valid at: /data/user/0/nl.jopdorp.opengoban/files/.katago/opencltuning/tune11_gpuMaliG710r0p0_x19_y19_c128_mv8.txt
Performing autotuning

......

Tuning hGemmWmma for convolutions error: couldn't allocate output register for constraint 'r'

jopdorp commented 2 months ago

@acristescu do you use OpenCL in the Sente app? If so, how did you do the tuning, and would you have some tune files I could try?