lightvector / KataGo

GTP engine and self-play learning in Go
https://katagotraining.org/

OpenCL tuning fails on Pixel7a MaliG710 GPU #977

Open jopdorp opened 2 months ago

jopdorp commented 2 months ago

I was able to compile KataGo for Android with OpenCL support using my branch: https://github.com/jopdorp/KataGo/tree/android-support

Hopefully this will make it easier for future developers to use KataGo on Android. However, when I try to use it at runtime, I get a crash during the tuning process:

No existing tuning parameters found or parseable or valid at: /data/user/0/nl.jopdorp.opengoban/files/.katago/opencltuning/tune11_gpuMaliG710r0p0_x19_y19_c128_mv8.txt
Performing autotuning

......

Tuning hGemmWmma for convolutions
error: couldn't allocate output register for constraint 'r'
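The tuning file name in the log appears to encode the configuration the tuning applies to, which would explain why autotuning starts on a device that has no matching file yet. A minimal C++ sketch of that naming pattern follows. The field meanings (tuner version, GPU name, board width and height, trunk channels, model version) are assumptions read off the file name, and `tuningFileName` is a hypothetical helper, not KataGo's own code.

```cpp
#include <string>

// Hypothetical helper: rebuilds a tuning-file name following the pattern seen in
// the log (tune<version>_gpu<name>_x<X>_y<Y>_c<channels>_mv<modelVersion>.txt).
// The meaning of each field is an assumption inferred from the name itself.
std::string tuningFileName(int tunerVersion, const std::string& gpuName,
                           int xLen, int yLen, int channels, int modelVersion) {
  return "tune" + std::to_string(tunerVersion) + "_gpu" + gpuName +
         "_x" + std::to_string(xLen) + "_y" + std::to_string(yLen) +
         "_c" + std::to_string(channels) + "_mv" + std::to_string(modelVersion) +
         ".txt";
}
```

Under this reading, a different GPU, board size, or network width yields a different file name, so each combination is tuned separately.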

HackYardo commented 1 month ago

Have you tried Termux, an Android terminal emulator and Linux environment app that works without root? Stock Google Android is a fairly bare environment; to compile anything you will need a more comfortable one.

jopdorp commented 3 weeks ago

I don't want to use Termux, and compilation went fine; I just skipped hGemmWmma for now with a patch. The point is that I'm adding Android support to the codebase, so people can build for Android with a normal workflow instead of having to compile on an Android device.

HackYardo commented 3 weeks ago

It's not Android, it is Linux ARM. The compilation steps are the same as on a Linux AMD64 platform.

jopdorp commented 2 weeks ago

@HackYardo Linux ARM comes in several flavors, and the Android flavor is not the same as regular 32-bit or 64-bit ARM Linux: it also uses different architecture names, which need to be taken into account in the CMake configuration.
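The naming difference can be illustrated with a small sketch. The Android NDK identifies its ABIs as `arm64-v8a`, `armeabi-v7a`, `x86` and `x86_64` rather than by GNU architecture names, and Android toolchains predefine `__ANDROID__`. The helper names below are hypothetical, the mapping covers only those four ABIs, and none of this is taken from the pull request.

```cpp
#include <string>

// Hypothetical helper: maps a GNU-style architecture name (as in target triples
// such as aarch64-linux-android) to the ABI name the Android NDK uses for that
// CPU family. Only the four common NDK ABIs are covered; others return "".
std::string androidAbiFor(const std::string& gnuArch) {
  if (gnuArch == "aarch64") return "arm64-v8a";
  if (gnuArch == "armv7a" || gnuArch == "armv7-a" || gnuArch == "arm") return "armeabi-v7a";
  if (gnuArch == "x86_64") return "x86_64";
  if (gnuArch == "i686" || gnuArch == "x86") return "x86";
  return "";
}

// True when this translation unit is compiled by an Android toolchain, which
// predefines __ANDROID__; false for an ordinary Linux ARM build.
constexpr bool builtForAndroid() {
#if defined(__ANDROID__)
  return true;
#else
  return false;
#endif
}
```

Keeping the OS check separate from the ABI name reflects the point above: the CPU family alone does not tell you whether the target is Android.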

Maybe you could take a look at my pull request and review it with useful feedback. One thing I'm still struggling with is that model files loaded into the Android version come out with reversed endianness, so I had to reverse the byte order of the models beforehand, which is a hassle.

I think there should be a way to fix this by adjusting the CMake file, but I haven't managed it yet.
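Both x86-64 hosts and Android ARM devices are normally little-endian, so a reversal like this may come from how the loader reads the bytes rather than from the device itself. The sketch below contrasts two approaches, assuming the weights are stored as raw float32 values: `swapEndian32` mirrors the manual byte swap described above, while `readFloatLE` decodes a fixed byte order explicitly, so the same file would read identically on any host. Both are hypothetical helpers, not KataGo's model-loading code, and the real model format may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Reverses the byte order of each 32-bit word in place, i.e. the manual
// "reverse the endianness of the model" step, assuming packed float32 data.
// Trailing bytes that do not form a whole word are left untouched.
void swapEndian32(std::vector<unsigned char>& bytes) {
  for (std::size_t i = 0; i + 4 <= bytes.size(); i += 4) {
    std::swap(bytes[i], bytes[i + 3]);
    std::swap(bytes[i + 1], bytes[i + 2]);
  }
}

// Decodes the float32 stored little-endian at bytes[offset]. Assembling the
// bits with shifts makes the result independent of the host's byte order.
float readFloatLE(const std::vector<unsigned char>& bytes, std::size_t offset) {
  std::uint32_t bits = static_cast<std::uint32_t>(bytes[offset]) |
                       (static_cast<std::uint32_t>(bytes[offset + 1]) << 8) |
                       (static_cast<std::uint32_t>(bytes[offset + 2]) << 16) |
                       (static_cast<std::uint32_t>(bytes[offset + 3]) << 24);
  float value;
  std::memcpy(&value, &bits, sizeof value);  // reinterpret the bit pattern as IEEE-754
  return value;
}
```

With an explicit decoder like `readFloatLE`, a byte-order mismatch is fixed once in the loader instead of by preprocessing every model file.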