Closed pixelspark closed 1 year ago
I could update the PR for metal support. Could you test it together with the other backends (cuBLAS / CLBlast)?
Definitely! Currently traveling but will have access to my M1 Max machine tomorrow again.
Unfortunately, ggerganov/ggml does not have support for Metal yet, so it cannot be enabled in rustformers/llm yet either. Nevertheless, it would be great if you could test the PR's cuBLAS / CLBlast functionality.
Hm, the Metal bits should be upstreamed to ggerganov/ggml soon, right? (As far as I understand it, it is wholly contained in two files.)
I quickly tested CLBlast on my work laptop with llama.cpp itself, but its AMD Radeon GPU appears a bit too weak for it to work. I will attempt a test with cuBLAS later on a beefier machine with NVIDIA hardware.
Soon is very relative. Some files, like ggml-opencl and ggml-cuda for example, haven't been touched for weeks.
I've updated the BLAS PR to support Metal too. You can use it like so:
cargo run --release --features metal mpt infer --model-path mpt-7b-chat-q5_1.bin -p "Once upon a time"
Could you please test it out?
Nice 👍🏻 I am seeing the following errors at build time:
error: environment variable `CUDA_PATH` not defined at compile time
--> crates/ggml/sys/build.rs:126:39
|
126 | let targets_include = concat!(env!("CUDA_PATH"), r"\include");
| ^^^^^^^^^^^^^^^^^
|
= help: use `std::env::var("CUDA_PATH")` to read the variable at run time
= note: this error originates in the macro `env` (in Nightly builds, run with -Z macro-backtrace for more info)
error: environment variable `CUDA_PATH` not defined at compile time
--> crates/ggml/sys/build.rs:127:35
|
127 | let targets_lib = concat!(env!("CUDA_PATH"), r"\lib\x64");
| ^^^^^^^^^^^^^^^^^
|
= help: use `std::env::var("CUDA_PATH")` to read the variable at run time
= note: this error originates in the macro `env` (in Nightly builds, run with -Z macro-backtrace for more info)
error: environment variable `CUDA_PATH` not defined at compile time
--> crates/ggml/sys/build.rs:171:39
|
171 | let targets_include = concat!(env!("CUDA_PATH"), "/targets/x86_64-linux/include");
| ^^^^^^^^^^^^^^^^^
|
= help: use `std::env::var("CUDA_PATH")` to read the variable at run time
= note: this error originates in the macro `env` (in Nightly builds, run with -Z macro-backtrace for more info)
error: environment variable `CUDA_PATH` not defined at compile time
--> crates/ggml/sys/build.rs:172:35
|
172 | let targets_lib = concat!(env!("CUDA_PATH"), "/targets/x86_64-linux/lib");
| ^^^^^^^^^^^^^^^^^
|
= help: use `std::env::var("CUDA_PATH")` to read the variable at run time
= note: this error originates in the macro `env` (in Nightly builds, run with -Z macro-backtrace for more info)
error: could not compile `ggml-sys` (build script) due to 4 previous errors
warning: build failed, waiting for other jobs to finish...
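The compiler's help note points at the root cause: `env!` expands at compile time and aborts the build when the variable is absent, even for users who never enable CUDA. A minimal sketch of the suggested alternative, reading the variable at build-script run time instead (`cuda_include_dir` is a hypothetical helper, not the actual code in `crates/ggml/sys/build.rs`):

```rust
use std::env;

// Returns the CUDA include directory if CUDA_PATH is set, or None so the
// build script can skip the CUDA search paths gracefully instead of
// failing the whole build.
fn cuda_include_dir() -> Option<String> {
    env::var("CUDA_PATH")
        .ok()
        .map(|path| format!("{}/targets/x86_64-linux/include", path))
}

fn main() {
    match cuda_include_dir() {
        Some(dir) => println!("cargo:include={}", dir),
        None => println!("cargo:warning=CUDA_PATH not set; skipping CUDA paths"),
    }
}
```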
The build finishes when I prepend CUDA_PATH=x to the command. However, when I then run the suggested command for testing, it still uses the CPU. (You may need to invoke GGML in a specific way to run on Metal - see ggml-metal.m. Effectively you should do what the main executable of llama.cpp does when you pass -ngl 1. This parameter determines the number of layers to offload to the GPU; for Metal, when the value is 1 or higher, it apparently runs everything on the GPU.)
The offloading part is not there yet. All the code does so far is expose the API required to leverage that functionality in the near future.
Nevertheless, you should be able to see an increase in VRAM usage if Metal is really enabled. It should also be able to process lengthy prompts much faster.
Could you check that? And could you also check if CLBlast exhibits the same behavior as Metal?
Running the suggested command does not appear to use the GPU (I get ~800% CPU usage, which is expected for a CPU run; the GPU monitor shows very little use).
Note that the generated binary does not appear to link to Metal in any way; it also does not contain the string 'metal'. It appears nothing really changes when --features metal is passed? (I am on 022a075608c5b90c54946ad01a204a63c54657cf.)
As for CLBlast: same issue with the missing CUDA_PATH. With CUDA_PATH=x cargo build --verbose --features clblast --release (and after running brew install clblast beforehand), I get:
The following warnings were emitted during compilation:
warning: llama-cpp/ggml-opencl.cpp:10:10: fatal error: 'clblast.h' file not found
warning: #include <clblast.h>
warning: ^~~~~~~~~~~
warning: 1 error generated.
error: failed to run custom build command for `ggml-sys v0.2.0-dev (/Users/tommy/Repos/llm/crates/ggml/sys)`
Caused by:
process didn't exit successfully: `/Users/tommy/Repos/llm/target/release/build/ggml-sys-2d4c323b8b047b52/build-script-build` (exit status: 1)
--- stdout
cargo:rerun-if-changed=llama-cpp
OPT_LEVEL = Some("3")
TARGET = Some("aarch64-apple-darwin")
HOST = Some("aarch64-apple-darwin")
cargo:rerun-if-env-changed=CC_aarch64-apple-darwin
CC_aarch64-apple-darwin = None
cargo:rerun-if-env-changed=CC_aarch64_apple_darwin
CC_aarch64_apple_darwin = None
cargo:rerun-if-env-changed=HOST_CC
HOST_CC = None
cargo:rerun-if-env-changed=CC
CC = None
cargo:rerun-if-env-changed=CFLAGS_aarch64-apple-darwin
CFLAGS_aarch64-apple-darwin = None
cargo:rerun-if-env-changed=CFLAGS_aarch64_apple_darwin
CFLAGS_aarch64_apple_darwin = None
cargo:rerun-if-env-changed=HOST_CFLAGS
HOST_CFLAGS = None
cargo:rerun-if-env-changed=CFLAGS
CFLAGS = None
cargo:rerun-if-env-changed=CRATE_CC_NO_DEFAULTS
CRATE_CC_NO_DEFAULTS = None
DEBUG = Some("false")
CARGO_CFG_TARGET_FEATURE = Some("aes,crc,dit,dotprod,dpb,dpb2,fcma,fhm,flagm,fp16,frintts,jsconv,lor,lse,neon,paca,pacg,pan,pmuv3,ras,rcpc,rcpc2,rdm,sb,sha2,sha3,ssbs,vh")
cargo:rustc-link-lib=clblast
cargo:rustc-link-lib=OpenCL
cargo:rustc-link-lib=framework=Accelerate
running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-arch" "arm64" "-I" "llama-cpp" "-DGGML_USE_CLBLAST" "-mcpu=native" "-pthread" "-DGGML_USE_ACCELERATE" "-DNDEBUG" "-o" "/Users/tommy/Repos/llm/target/release/build/ggml-sys-ae54095a50ed1651/out/llama-cpp/ggml.o" "-c" "llama-cpp/ggml.c"
running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-arch" "arm64" "-I" "llama-cpp" "-DGGML_USE_CLBLAST" "-mcpu=native" "-pthread" "-DGGML_USE_ACCELERATE" "-DNDEBUG" "-o" "/Users/tommy/Repos/llm/target/release/build/ggml-sys-ae54095a50ed1651/out/llama-cpp/ggml-opencl.o" "-c" "llama-cpp/ggml-opencl.cpp"
cargo:warning=llama-cpp/ggml-opencl.cpp:10:10: fatal error: 'clblast.h' file not found
cargo:warning=#include <clblast.h>
cargo:warning= ^~~~~~~~~~~
cargo:warning=1 error generated.
exit status: 1
exit status: 0
--- stderr
error occurred: Command "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-arch" "arm64" "-I" "llama-cpp" "-DGGML_USE_CLBLAST" "-mcpu=native" "-pthread" "-DGGML_USE_ACCELERATE" "-DNDEBUG" "-o" "/Users/tommy/Repos/llm/target/release/build/ggml-sys-ae54095a50ed1651/out/llama-cpp/ggml-opencl.o" "-c" "llama-cpp/ggml-opencl.cpp" with args "cc" did not execute successfully (status code exit status: 1).
The CUDA_PATH error should be gone with the last commit.
It seems you don't have clblast installed.
Can you check if ggml-metal.o in target was generated?
The CUDA_PATH error should be gone with the last commit.
Yes 👍🏻
It seems you don't have clblast installed.
Well it seems it cannot find it for some reason... this might just be my machine.
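If the header simply is not on cc's default search path, one possible workaround (a sketch, assuming CLBlast was installed via Homebrew, which keeps headers and libraries under its own prefix rather than /usr/local on Apple Silicon) is to point the compiler and linker at that prefix before building:

```shell
# Make Homebrew's CLBlast headers and libraries visible to cc/ld.
export CPATH="$(brew --prefix clblast)/include:$CPATH"
export LIBRARY_PATH="$(brew --prefix clblast)/lib:$LIBRARY_PATH"
cargo build --release --features clblast
```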
Can you check if ggml-metal.o in target was generated?
Unfortunately:
rm -rf target/release
cargo build --release --features=metal
find ./target/release | grep metal
Does not find anything 😢
I disabled a couple of things in the last commit that might have caused issues. Can you please try again?
That builds a ggml-metal.o:
tommy@tymax-2 llm % find ./target/release | grep metal
./target/release/build/ggml-sys-1cf394792f1e4b44/out/llama-cpp/ggml-metal.o
Also the binary now links Metal:
tommy@tymax-2 llm % otool -L target/release/llm
target/release/llm:
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1500.65.0)
/System/Library/Frameworks/Security.framework/Versions/A/Security (compatibility version 1.0.0, current version 60420.101.2)
/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1971.0.0)
/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation (compatibility version 300.0.0, current version 1971.0.0)
/System/Library/Frameworks/Metal.framework/Versions/A/Metal (compatibility version 1.0.0, current version 306.5.16)
/System/Library/Frameworks/MetalKit.framework/Versions/A/MetalKit (compatibility version 1.0.0, current version 157.0.0)
/System/Library/Frameworks/MetalPerformanceShaders.framework/Versions/A/MetalPerformanceShaders (compatibility version 1.0.0, current version 126.3.5)
/usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)
All inference is still happening on the CPU though.
I am still trying to figure out why the Metal code was not compiled earlier. I made some changes to restore the functionality I removed previously. Can you please let me know if ggml-metal is still being successfully compiled?
I also now know why you have noticed no improvements at all: when llm calls ggml_init, cuBLAS and CLBlast are initialized if enabled. That is not yet the case for Metal.
The first step is to enable Metal in ggml so that it can be used by llm.
Still builds fine!
So what would be needed now in order to add Metal support? I suspect most calls will be similar; we just need some if cfg!(feature = "metal") && use_metal { .. } else { .. } around these calls (ggml_init -> ggml_metal_init, ggml_graph_compute -> ggml_metal_graph_compute)?
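The dispatch suggested above could be sketched as follows, under the assumption that a `metal` Cargo feature gates the Metal bindings. `use_metal` is a hypothetical runtime flag, and the comments name the real ggml entry points that would be selected:

```rust
// Pick a backend based on compile-time feature and a runtime opt-in flag.
fn select_backend(use_metal: bool) -> &'static str {
    if cfg!(feature = "metal") && use_metal {
        // would call ggml_metal_init / ggml_metal_graph_compute
        "metal"
    } else {
        // would call ggml_init / ggml_graph_compute
        "cpu"
    }
}
```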
Nice! Thank you.
I only took a brief look at llama.cpp, and while I am not very familiar with either llama.cpp or llm, I would say the context in llm needs to be expanded to hold the structure returned by ggml_metal_init, and some of the ggml_X APIs need to be replaced with the Metal ones, as you mentioned earlier.
OK, I might have a go at that if I can find the time. Probably not so easy to do for you without access to a machine to test on. Having the build sorted out is a great first step, thanks!
FYI, the latest commit from the PR stops linking the Metal library properly. Resetting to 8666654c0ff641badbdb9cdfe1c11462abb0d171 links it again (but as @pixelspark mentioned, inferencing happens on the CPU still).
@pixelspark Can you confirm that it does not work anymore?
Will test tonight. (Not sure if related, but the latest master branch does not build on Linux either with GCC < 8. That, however, seems to be an issue in GGML which is already being fixed: https://github.com/ggerganov/llama.cpp/issues/1279.)
Which distribution is it?
Which distribution is it?
Just some old Ubuntu. I should be able to work around this using Docker.
@pixelspark Can you confirm that it does not work anymore?
I checked out 0d8810058f51e1f9a6e575e0976d5fd00799f124
and that builds and works just fine on macOS...
@darxkies Wait a minute, I am actually getting this (after adding some basic code to construct a Metal context):
= note: Undefined symbols for architecture arm64:
"_ggml_metal_free", referenced from:
_$LT$ggml..metal..MetalContext$u20$as$u20$core..ops..drop..Drop$GT$::drop::hcbede1209645a6ad in libggml-f2423c67974c335d.rlib(ggml-f2423c67974c335d.ggml.301d3698-cgu.8.rcgu.o)
"_ggml_metal_init", referenced from:
ggml::context::Context::init::h91b873fed7f42fb9 in libggml-f2423c67974c335d.rlib(ggml-f2423c67974c335d.ggml.301d3698-cgu.0.rcgu.o)
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
My commit on top of yours: https://github.com/pixelspark/llm/tree/pedal-to-the-metal
Edit: the fix is trivial, see https://github.com/pixelspark/llm/commit/b647e5c16da2381e561516d56e81785cf4bb2d23
@LLukas22 Does it make sense to include that fix in my PR?
@darxkies Yeah include it, we probably first want to merge the build stuff and then add the actual implementations after that.
@pixelspark Thank you for the fix. I added it to my PR.
Would you say this is done with your PR, @pixelspark ?
Yes, obviously :-)
Though we still need to keep tracking ggml, as GPU support in general and the Metal implementation in particular are still in flux. @LLukas22 keeps an eye on this, I presume.
Support for Metal GPU acceleration on macOS (and I assume iOS) just merged in llama.cpp master: https://github.com/ggerganov/llama.cpp/pull/1642
It would be great if this could also be used from llm. I assume all it needs is something similar to https://github.com/rustformers/llm/pull/282/files and perhaps setting a flag at runtime (the -ngl 1 parameter has to be passed to llama.cpp's ./main to enable Metal).
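For reference, this is roughly how the Metal offload is exercised in llama.cpp itself (model path and prompt are placeholders):

```shell
# -ngl sets the number of layers to offload to the GPU; for the Metal
# backend a value of 1 or higher reportedly runs everything on the GPU.
./main -m ./models/7B/ggml-model-q4_0.bin -ngl 1 -p "Once upon a time"
```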