Open ghost opened 4 years ago
This leaves another question: Will @y-ich eventually release a mac version of the app?
Maybe KataGo can support it in the future. But I can't guarantee that I'll spend all the effort to write yet another GPU backend. It's an enormous amount of work for me to invest. It's sad to see Apple dropping support for an open standard that currently works pretty well across a wide range of devices. From my perspective, all it does is create yet another burden with no benefit.
In fact, KataGo on Core ML works well on macOS. I may release "A Master of Go" on macOS when Apple stops providing OpenCL. The prototype app is just compiled with Catalyst, with no mouse-over support. Currently, Lizzie with KataGo is more useful than it.
@y-ich Asking just out of curiosity, what about the performance? Is the CoreML version faster than the OpenCL one?
@y-ich That is good to hear! But I agree that KataGo with a GTP interface is more useful because it can run under Lizzie or Sabaki.
@lightvector It's unfortunate that Apple is removing support for standards and doing its own thing, but unfortunately that's how it is. Without a CoreML backend, neither KataGo nor Leela Zero will be usable on macOS in the near future. I want to help but am proficient with neither C++ nor CoreML…
@isty2e san,
Maybe my Mac (Late 2012) is too old to serve as a performance reference, but I think that a well-optimized OpenCL backend is basically much faster than Core ML: 7 nnBatches/s with OpenCL KataGo versus 3 evaluations per second with CoreML KataGo on my Mac.
Has anyone tested a Go engine in VMware or a subsystem, such as Ubuntu on Windows 10?
(deleted a comment that was a duplicate of a comment posted in another issue and is off-topic for this issue).
@y-ich is coreml backend opensource ? https://github.com/y-ich/KataGo/blob/browser/cpp/neuralnet/tfjsbackend.cpp
@dappstore123 san,
No, not yet.
Is there any chance that someone could create a generic OpenCL wrapper for CoreML, with at least sufficient functionality for KataGo? (This may well reveal how little I know about macOS, OpenCL & CoreML, but it is the first thing that occurs to me as a retired developer.)
The M1 processor of Apple's new MacBooks includes a Neural Engine, which should be very capable of running KataGo using Core ML.
Unfortunately I'm not familiar with either C++ or neural networks but if there's anything I can do to help… Thank you.
At least KataGo has a CPU backend. I believe Apple will not remove C++ support in the near future.
The M1 GPU is said to be ahead of the field, and its Neural Engine has 16 cores capable of 11 trillion operations per second.
Does anyone have evidence that KataGo plays at pro level on the M1?
tx
It plays... at about 1/10th the performance of a decent NVidia GPU, at least as I write this with the benchmark running on both. But I don't have the most recent M1s; this is the little 13-inch laptop. That said, I don't have high confidence in the larger laptops either, since it seems that it's just not really taking advantage of the 8-core GPU.
I tested on M1: about 400 visits/s with the 40b weights.
I tested katago on Apple M1 Pro:
% cpp/katago version
KataGo v1.11.0
Git revision: 714ad713a829e2fca465ec4113bb1afa4c6ec543
Compile Time: Jul 31 2022 11:35:36
Using OpenCL backend
I got visits/s = 196.37 from the model: g170-b40c256x2-s5095420928-d1229425124:
% cpp/katago benchmark -config cpp/configs/gtp_example.cfg
...
2022-08-01 13:44:57+0800: Initializing neural net buffer to be size 19 * 19 exactly
2022-08-01 13:44:58+0800: Found OpenCL Platform 0: Apple (Apple) (OpenCL 1.2 (Jun 17 2022 18:58:05))
2022-08-01 13:44:58+0800: Found 2 device(s) on platform 0 with type CPU or GPU or Accelerator
2022-08-01 13:44:58+0800: Found OpenCL Device 0: Apple M1 Pro (Intel) (score 102)
2022-08-01 13:44:58+0800: Found OpenCL Device 1: Apple M1 Pro (Apple) (score 1000102)
2022-08-01 13:44:58+0800: Creating context for OpenCL Platform: Apple (Apple) (OpenCL 1.2 (Jun 17 2022 18:58:05))
2022-08-01 13:44:58+0800: Using OpenCL Device 1: Apple M1 Pro (Apple) OpenCL 1.2 (Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images )
2022-08-01 13:44:58+0800: Loaded tuning parameters from: /Users/chinchangyang/.katago/opencltuning/tune8_gpuAppleM1Pro_x19_y19_c256_mv8.txt
2022-08-01 13:44:58+0800: OpenCL backend thread 0: Model version 8
2022-08-01 13:44:58+0800: OpenCL backend thread 0: Model name: g170-b40c256x2-s5095420928-d1229425124
2022-08-01 13:45:01+0800: OpenCL backend thread 0: FP16Storage true FP16Compute false FP16TensorCores false
Possible numbers of threads to test: 1, 2, 3, 4, 5, 6, 8, 10, 12, 16, 20, 24, 32,
numSearchThreads = 5: 10 / 10 positions, visits/s = 127.91 nnEvals/s = 105.35 nnBatches/s = 42.33 avgBatchSize = 2.49 (62.9 secs)
numSearchThreads = 12: 10 / 10 positions, visits/s = 178.48 nnEvals/s = 146.17 nnBatches/s = 24.69 avgBatchSize = 5.92 (45.4 secs)
numSearchThreads = 10: 10 / 10 positions, visits/s = 168.67 nnEvals/s = 138.70 nnBatches/s = 28.07 avgBatchSize = 4.94 (48.0 secs)
numSearchThreads = 20: 10 / 10 positions, visits/s = 196.37 nnEvals/s = 169.76 nnBatches/s = 17.37 avgBatchSize = 9.77 (41.7 secs)
numSearchThreads = 8: 10 / 10 positions, visits/s = 163.03 nnEvals/s = 134.23 nnBatches/s = 33.88 avgBatchSize = 3.96 (49.5 secs)
numSearchThreads = 16: 10 / 10 positions, visits/s = 195.88 nnEvals/s = 163.82 nnBatches/s = 20.83 avgBatchSize = 7.86 (41.6 secs)
Ordered summary of results:
numSearchThreads = 5: 10 / 10 positions, visits/s = 127.91 nnEvals/s = 105.35 nnBatches/s = 42.33 avgBatchSize = 2.49 (62.9 secs) (EloDiff baseline)
numSearchThreads = 8: 10 / 10 positions, visits/s = 163.03 nnEvals/s = 134.23 nnBatches/s = 33.88 avgBatchSize = 3.96 (49.5 secs) (EloDiff +70)
numSearchThreads = 10: 10 / 10 positions, visits/s = 168.67 nnEvals/s = 138.70 nnBatches/s = 28.07 avgBatchSize = 4.94 (48.0 secs) (EloDiff +70)
numSearchThreads = 12: 10 / 10 positions, visits/s = 178.48 nnEvals/s = 146.17 nnBatches/s = 24.69 avgBatchSize = 5.92 (45.4 secs) (EloDiff +78)
numSearchThreads = 16: 10 / 10 positions, visits/s = 195.88 nnEvals/s = 163.82 nnBatches/s = 20.83 avgBatchSize = 7.86 (41.6 secs) (EloDiff +90)
numSearchThreads = 20: 10 / 10 positions, visits/s = 196.37 nnEvals/s = 169.76 nnBatches/s = 17.37 avgBatchSize = 9.77 (41.7 secs) (EloDiff +65)
...
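For anyone comparing benchmark runs like the one above, a small helper can pick out the fastest thread count from the log. This is just a sketch, not part of KataGo's tooling; it only assumes the `numSearchThreads = N: ... visits/s = X` line format shown above:

```python
import re

# Matches lines of the form:
#   numSearchThreads = 20: 10 / 10 positions, visits/s = 196.37 nnEvals/s = 169.76 ...
LINE_RE = re.compile(r"numSearchThreads = (\d+):.*?visits/s = ([\d.]+)")

def best_thread_count(log_text):
    """Return (numSearchThreads, visits/s) for the fastest configuration found."""
    results = {}
    for m in LINE_RE.finditer(log_text):
        threads, vps = int(m.group(1)), float(m.group(2))
        results[threads] = vps  # summary-section duplicates just overwrite
    if not results:
        return None
    best = max(results, key=results.get)
    return best, results[best]

log = """
numSearchThreads = 5: 10 / 10 positions, visits/s = 127.91 nnEvals/s = 105.35
numSearchThreads = 20: 10 / 10 positions, visits/s = 196.37 nnEvals/s = 169.76
numSearchThreads = 16: 10 / 10 positions, visits/s = 195.88 nnEvals/s = 163.82
"""
print(best_thread_count(log))  # (20, 196.37)
```

On the run above it would report `numSearchThreads = 20` at 196.37 visits/s, matching the ordered summary.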
Which Apple M1 Pro box?
score 1000102 seems high, or even out of range?
It is Macbook Pro with M1 Pro chip.
I can confirm, I get the same value on my 14" MacBook Pro M1.
any ideas why the score is around 20x better than other OpenCL scores?
The score doesn't have any numerical meaning; there is no such thing as "20x better", in the same way that if some software had a version 2.2 it would not make sense to say it is "10% more recent" than version 2.0. It's just some hardcoded logic to pick which OpenCL platform to use by default if the user doesn't specify one, based on the name of the platform.
OpenCL scores: https://browser.geekbench.com/opencl-benchmarks
if it is not a score, perhaps a different word could be used?
Heh, it never even crossed my mind that anyone would assume an automatic number logged in KataGo's output had some relation to a specific benchmark run by a major but otherwise arbitrary benchmarking website that I've possibly never looked at myself. Sure, I'll consider it, but "score" is a very common and generic word used by a lot of different things, so I'd be interested to hear whether this confusion is actually common.
"score" means a written piece of music, to mark or scratch, a number of points made in a game, or, the result of a benchmark or test. Given the context, any native speaker of English is going to presume you mean the result of a benchmark or test. It's clearly not a piece of music, it's not marking or scratching anything, and it's not interactive so there's no active game being played nor potential for it to be referring to some other game or sporting event. I'm curious why the word was picked? "Version" is usually the English word used for "arbitrary changing number that is totally not a score."
Hehe, well "version" would be an even more inaccurate word, because this is definitely not a number that labels different varieties or kinds or... well... versions, of a given thing. :)
What word would you use to describe a numerical value, computed by adding up several component values based on the device's name, vendor, accelerator type, and how recent a version of OpenCL it supports (these are the main pieces of metadata that a device will report about itself that we have available), to guess at an overall likely best default device to choose?
Normally I would describe the above by saying that we are heuristically scoring the device based on testing the contents of these metadata fields. For example the vendor being Nvidia contributes a higher score to the device than if it's Intel, because usually if a user has an Nvidia device and an Intel device on the same machine, they probably prefer the former because the former is likely some high-powered GPU actually designed for running large parallel computations, whereas the latter is likely some cheap onboard integrated graphics chip.
The numerical magnitude of the "scores" is of course arbitrary except for the fact that they are a hardcoded guess at the likely preferability of different things, where larger is more likely preferable.
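The scheme described above can be illustrated with a toy version. The field names and weights here are purely hypothetical, not KataGo's actual constants; the point is only that each piece of device metadata contributes points, and the largest total wins by default:

```python
def score_device(name, vendor, device_type, opencl_version):
    """Heuristically score a device; higher = more likely preferable.
    Weights are illustrative only, not KataGo's real values."""
    score = 0
    # A discrete GPU vendor usually beats cheap integrated graphics,
    # so it dominates the total.
    if vendor.lower() in ("nvidia", "amd", "apple"):
        score += 1_000_000
    if device_type == "GPU":
        score += 100
    # Prefer devices reporting a more recent OpenCL version.
    score += int(opencl_version * 10)
    return score

devices = [
    ("Apple M1 Pro", "Intel", "CPU", 1.2),
    ("Apple M1 Pro", "Apple", "GPU", 1.2),
]
best = max(devices, key=lambda d: score_device(*d))
print(best)  # the Apple GPU entry wins by a wide margin
```

This mirrors why the log above shows one device at "score 102" and another at "score 1000102": the magnitudes are arbitrary, and only the ordering matters.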
Anyway, it sounds like I should change the label in the output, just to make it clearer that it doesn't have anything to do with anything external. Maybe it should be called "ordering" or something.
Chin-Chang Yang's guide to building the CoreML version of KataGo: Improve KataGo performance by CoreML backend on MacOS. It's not for the faint-hearted.
Thanks for sharing my post, but it is outdated. I recommend checking the release page instead, as it is kept up to date.
ah thanks for the prompt response. still looks a little hairy, how would I know OpenCL & CoreML were both operational?
With respect to OpenCL operations, you can view GPU utilization on macOS using the built-in Activity Monitor application. As for CoreML operations, you can view Neural Engine activity using the Instruments profiler in Xcode.
Is there no message on initial startup, as there is currently? I would have thought this was a minimal requirement. I.e., I have no idea what profiling with Xcode Instruments implies... tx
According to @horaceho's report, the message has been fixed in v1.11.0-coreml2.
2022-09-05 14:18:03+0800: CoreML backend thread 1: Device 1 Model version 8
2022-09-05 14:18:03+0800: CoreML backend thread 1: Device 1 Model name: g170-b40c256x2-s5095420928-d1229425124
2022-09-05 14:18:03+0800: CoreML backend thread 0: Device 0 Model version 8
2022-09-05 14:18:03+0800: CoreML backend thread 0: Device 0 Model name: g170-b40c256x2-s5095420928-d1229425124
2022-09-05 14:18:07+0800: CoreML backend thread 0: Device 0
2022-09-05 14:18:10+0800: CoreML backend thread 1: Device 1
tx
I posted an enhancement request for information on CoreML under Linux a couple of weeks ago: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8299. Today Alyssa closed it: anec.py is a Python script to convert an Apple CoreML mlmodel to a Linux executable ("with my new driver, of course :)"). https://github.com/eiln/anec.py Whether this is sufficient and will find a maintainer at KataGo remains to be seen....
ChinChangYang, please could you enable issues on your fork?
Two issues of interest to this thread: is there full M3 CoreML support? And could we have a brew, .dmg, or other installation method simpler than building from source?
Obviously I'm highly interested in and delighted by the CoreML progress. Congrats!!!
Thank you for your interest and enthusiasm regarding the KataGo Metal and CoreML backends! I'm delighted to hear that the progress on CoreML is exciting for you.
Regarding your queries:
Issue Management: Currently, I am unable to enable issues on my fork due to time constraints. My commitments limit my ability to effectively manage and respond to issues. However, I'm hopeful that other contributors can step in to actively manage the KataGo Metal and CoreML backends.
M3 CoreML Support: The KataGo with Metal and CoreML backends has indeed been tested on the MacBook Pro M3 Max. Here are the benchmark results for your reference:
| Version (-v2048)       | Model      | Threads | Visits/s | NnEvals/s |
|------------------------|------------|---------|----------|-----------|
| v1.14.0 OpenCL         | b18c384nbt | 20      | 490.61   | 364.17    |
| v1.14.0 Metal          | b18c384nbt | 20      | 549.11   | 401.69    |
| v1.14.0 Metal + CoreML | b18c384nbt | 16      | 694.71   | 536.87    |
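Worked out against the OpenCL baseline in the table above, the relative speedups are roughly:

```python
# Relative speedup of each backend vs. the OpenCL baseline, using the
# visits/s figures from the M3 Max benchmark table above.
baseline = 490.61  # v1.14.0 OpenCL
for name, vps in [("Metal", 549.11), ("Metal + CoreML", 694.71)]:
    print(f"{name}: {vps / baseline:.2f}x")
# Metal: 1.12x
# Metal + CoreML: 1.42x
```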
While full support for M3 CoreML is a goal, please note that my resources are currently limited, and further development might depend on community contributions.
Simpler Installation Methods: The build process for KataGo is currently limited to CMake and Xcode, due to my expertise being focused in these areas. The convenience of a brew package or a ".dmg" file is understood, but I currently lack the knowledge to package KataGo in these formats. This is an area where community contributions could be significantly impactful.
I encourage anyone in the community with the relevant expertise to contribute towards these improvements.
Extraordinarily kind & full response! thank you.
Hello,
OpenCL is deprecated in macOS 10.15, and it will probably be removed in the near future.
Are there plans to create a CoreML backend? That is, does KataGo have a future on macOS?
I'm asking because I'm considering buying a MacBook (have not decided on the model), and it should last a few years.
As far as I know the iOS app "A Master of Go" uses a CoreML backend, so it should be possible, although I'm not sure whether CoreML can be used with C++.
Thank you.