titogrima opened this issue 2 months ago (status: Open)
Hey @titogrima - LocalAI doesn't set any thread count when running in p2p mode; we just run the vanilla rpc-server from the llama.cpp project, so this sounds more like a bug in llama.cpp. Did you check whether there is a related bug report upstream?
Hi!
I checked the llama.cpp repo (https://github.com/ggerganov/llama.cpp) but I don't see any issue describing this problem. If LocalAI doesn't set any thread count in p2p mode, it's probably better to open an issue in the llama.cpp repo. I'm going to investigate this further, but it's helpful to know that LocalAI doesn't set threads in p2p mode; maybe I can set the threads directly in llama.cpp.
Thanks, and sorry for my English XD!!
It might also be worth noting that you can pass any llama.cpp command-line options from LocalAI with --llama-cpp-args or the LOCALAI_EXTRA_LLAMA_CPP_ARGS environment variable. From the --help output:
./local-ai worker p2p-llama-cpp-rpc --help
Usage: local-ai worker p2p-llama-cpp-rpc [flags]
Starts a LocalAI llama.cpp worker in P2P mode (requires a token)
Flags:
-h, --help Show context-sensitive help.
--log-level=LOG-LEVEL Set the level of logs to output [error,warn,info,debug,trace]
($LOCALAI_LOG_LEVEL)
--token=STRING P2P token to use ($LOCALAI_TOKEN, $LOCALAI_P2P_TOKEN, $TOKEN)
--no-runner Do not start the llama-cpp-rpc-server ($LOCALAI_NO_RUNNER, $NO_RUNNER)
--runner-address=STRING Address of the llama-cpp-rpc-server ($LOCALAI_RUNNER_ADDRESS,
$RUNNER_ADDRESS)
--runner-port=STRING Port of the llama-cpp-rpc-server ($LOCALAI_RUNNER_PORT, $RUNNER_PORT)
--llama-cpp-args=STRING Extra arguments to pass to llama-cpp-rpc-server
($LOCALAI_EXTRA_LLAMA_CPP_ARGS, $EXTRA_LLAMA_CPP_ARGS)
Hi!
I tried LOCALAI_EXTRA_LLAMA_CPP_ARGS="--threads 7" (see https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md#number-of-threads), but llama-cpp-rpc-server only supports these arguments:

11:12AM INF Starting llama-cpp-rpc-server on '127.0.0.1:35291'
error: unknown argument: --threads 7
Usage: /tmp/localai/backend_data/backend-assets/util/llama-cpp-rpc-server [options]

options:
  -h, --help            show this help message and exit
  -H HOST, --host HOST  host to bind to (default: 127.0.0.1)
  -p PORT, --port PORT  port to bind to (default: 35291)
  -m MEM, --mem MEM     backend memory size (in MB)

So it fails to boot with the threads option.
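For context, that error comes straight from the argument loop in rpc-server.cpp: it only recognizes -H/--host, -p/--port and -m/--mem, and anything else falls into the unknown-argument branch. Below is a condensed, self-contained sketch of that loop, paraphrased from the linked source; names and details are approximate, not a verbatim copy.

#include <cstdio>
#include <string>

// Paraphrased flag handling from examples/rpc/rpc-server.cpp in llama.cpp
// (structure only, illustrative). Only host, port and memory are understood,
// which is why "--threads 7" is rejected as an unknown argument.
struct rpc_server_params {
    std::string host        = "127.0.0.1";
    int         port        = 50052;
    size_t      backend_mem = 0; // bytes
};

int main(int argc, char ** argv) {
    rpc_server_params params;
    for (int i = 1; i < argc; i++) {
        std::string arg = argv[i];
        if (arg == "-H" || arg == "--host") {
            params.host = argv[++i];
        } else if (arg == "-p" || arg == "--port") {
            params.port = std::stoi(argv[++i]);
        } else if (arg == "-m" || arg == "--mem") {
            params.backend_mem = std::stoull(argv[++i]) * 1024 * 1024;
        } else {
            // No --threads branch exists, hence the
            // "error: unknown argument: --threads 7" seen in the worker log.
            fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());
            return 1;
        }
    }
    printf("host=%s port=%d mem=%zu bytes\n", params.host.c_str(), params.port, params.backend_mem);
    return 0;
}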
Well, reading the llama.cpp code, the "problem" is in the rpc-server code (https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/rpc-server.cpp): when the CPU backend is initialized (line 87), it calls the ggml_backend_cpu_init() function in the ggml code (https://github.com/ggerganov/llama.cpp/blob/20f1789dfb4e535d64ba2f523c64929e7891f428/ggml/src/ggml-backend.c#L869, line 869), and that function uses the GGML_DEFAULT_N_THREADS constant, which is 4 in the ggml header (https://github.com/ggerganov/llama.cpp/blob/20f1789dfb4e535d64ba2f523c64929e7891f428/ggml/include/ggml.h#L236, line 236). Maybe I can recompile it with GGML_DEFAULT_N_THREADS changed, or something similar.
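To make that chain concrete, here is a condensed paraphrase of the pieces involved (not verbatim from the linked files, and it assumes llama.cpp's ggml headers are on the include path):

#include <stdio.h>
#include "ggml-backend.h"   // assumption: ggml headers from llama.cpp are available

// Paraphrase of create_backend() in examples/rpc/rpc-server.cpp:
// with no GPU backend compiled in, it falls back to the CPU backend
// and never passes a thread count.
static ggml_backend_t create_backend(void) {
    ggml_backend_t backend = NULL;
    // ... GPU backends (CUDA/Metal) would be tried here when compiled in ...
    if (!backend) {
        fprintf(stderr, "%s: using CPU backend\n", __func__);
        backend = ggml_backend_cpu_init();
    }
    return backend;
}

// Inside ggml_backend_cpu_init() (ggml/src/ggml-backend.c, line 869 in the
// revision linked above) the context takes the compile-time default rather
// than anything configurable at runtime:
//
//     ctx->n_threads = GGML_DEFAULT_N_THREADS;
//
// and ggml/include/ggml.h (line 236) defines:
//
//     #define GGML_DEFAULT_N_THREADS 4

If I read ggml-backend.h right, the CPU backend also exposes ggml_backend_cpu_set_n_threads(), so an upstream fix could presumably be a --threads flag on rpc-server that calls it after ggml_backend_cpu_init(); right now nothing in this path reads LOCALAI_THREADS.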
Thanks for your help!!
I recompiled ggml with the GGML_DEFAULT_N_THREADS value changed in /build/backend/cpp/llama/llama.cpp/ggml/include/ggml.h, and it works. Obviously it's not the best solution, but it works...
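For reference, the workaround boils down to a one-line edit before rebuilding; 12 is just an illustrative value matching my 12-CPU node, not a general recommendation:

// /build/backend/cpp/llama/llama.cpp/ggml/include/ggml.h (around line 236)
// original:
//     #define GGML_DEFAULT_N_THREADS 4
// changed to (illustrative value for a 12-CPU node):
#define GGML_DEFAULT_N_THREADS 12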
Regards!
LocalAI version: v2.20.1-ffmpeg-core Docker image for the two workers and latest-aio-cpu for the master
Environment, CPU architecture, OS, and Version: P2P lab cluster in Docker machines with heterogeneous CPUs (AMD64 and ARM).
Linux clusteria1 6.6.45-0-virt #1-Alpine SMP PREEMPT_DYNAMIC 2024-08-13 08:10:32 aarch64 Linux - 8 CPUs, 7 GB RAM, runs one worker.
Linux ia 6.1.0-23-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15) x86_64 GNU/Linux - 12 CPUs, 10 GB RAM, runs the master and one worker.
Describe the bug: When using P2P workers mode, inference works fine but each node always uses only 4 CPUs. I tried LOCALAI_THREADS=12 in the env file and --threads 12 on the 12-CPU node, and LOCALAI_THREADS=7 / --threads 7 on the 8-CPU node; I also tried the THREADS variable in the env file. If I run only a master without workers, the THREADS variable works without any problem.
To Reproduce: Launch a P2P worker cluster and set a thread count other than 4.
Expected behavior: Each node uses the configured number of threads.
Logs: Logs from one worker:
create_backend: using CPU backend
Starting RPC server on 127.0.0.1:37885, backend memory: 9936 MB
^C
@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DGGML_F16C=OFF -DGGML_AVX512=OFF -DGGML_AVX2=OFF -DGGML_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
model name : AMD Ryzen 9 5900X 12-Core Processor
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid fsrm arch_capabilities
CPU: AVX found OK
CPU: AVX2 found OK
CPU: no AVX512 found
@@@@@
9:22PM INF env file found, loading environment variables from file envFile=.env
9:22PM DBG Setting logging to debug
9:22PM DBG Extracting backend assets files to /tmp/localai/backend_data
{"level":"INFO","time":"2024-08-26T21:22:10.977Z","caller":"config/config.go:288","message":"connmanager disabled\n"}
{"level":"INFO","time":"2024-08-26T21:22:10.977Z","caller":"config/config.go:292","message":" go-libp2p resource manager protection disabled"}
9:22PM INF Starting llama-cpp-rpc-server on '127.0.0.1:34015'
{"level":"INFO","time":"2024-08-26T21:22:10.978Z","caller":"node/node.go:118","message":" Starting EdgeVPN network"}
create_backend: using CPU backend
Starting RPC server on 127.0.0.1:34015, backend memory: 9936 MB
2024/08/26 21:22:10 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 7168 kiB, got: 416 kiB). See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details.
{"level":"INFO","time":"2024-08-26T21:22:10.987Z","caller":"node/node.go:172","message":" Node ID: 12D3KooWFvq7aNHpre5tyQDZN9Gn2tZh84E3Vf9tfBuCmB5ULJSB"} {"level":"INFO","time":"2024-08-26T21:22:10.987Z","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/41065 /ip4/127.0.0.1/udp/43346/quic-v1/webtransport/certhash/uEiA46crpiIhxfL7skSKai7WxlGHkv8mZNXzAYoogm_qhow/certhash/uEiCt91_kaygLCTKWpqX6PEOTzb617BIH7KHDTRrw_eyurw /ip4/127.0.0.1/udp/47629/webrtc-direct/certhash/uEiDbmMPnLfeQJBvFRcfp-zDNXx-_CjljBg0ia3Nr20Xs7g /ip4/127.0.0.1/udp/59911/quic-v1 /ip4/192.168.XX.XX/tcp/41065 /ip4/192.168.XX.XX/udp/43346/quic-v1/webtransport/certhash/uEiA46crpiIhxfL7skSKai7WxlGHkv8mZNXzAYoogm_qhow/certhash/uEiCt91_kaygLCTKWpqX6PEOTzb617BIH7KHDTRrw_eyurw /ip4/192.168.XX.XX/udp/47629/webrtc-direct/certhash/uEiDbmMPnLfeQJBvFRcfp-zDNXx-_CjljBg0ia3Nr20Xs7g /ip4/192.168.XX.XX/udp/59911/quic-v1 /ip6/::1/tcp/33785 /ip6/::1/udp/46892/webrtc-direct/certhash/uEiDbmMPnLfeQJBvFRcfp-zDNXx-_CjljBg0ia3Nr20Xs7g /ip6/::1/udp/49565/quic-v1/webtransport/certhash/uEiA46crpiIhxfL7skSKai7WxlGHkv8mZNXzAYoogm_qhow/certhash/uEiCt91_kaygLCTKWpqX6PEOTzb617BIH7KHDTRrw_eyurw /ip6/::1/udp/59078/quic-v1 /ip6/fda7:761c:127e:4::26/tcp/33785 /ip6/fda7:761c:127e:4::26/udp/46892/webrtc-direct/certhash/uEiDbmMPnLfeQJBvFRcfp-zDNXx-_CjljBg0ia3Nr20Xs7g /ip6/fda7:761c:XXXX:XX::XX/udp/49565/quic-v1/webtransport/certhash/uEiA46crpiIhxfL7skSKai7WxlGHkv8mZNXzAYoogm_qhow/certhash/uEiCt91_kaygLCTKWpqX6PEOTzb617BIH7KHDTRrw_eyurw /ip6/fda7:761c:XXX:XX::XX/udp/59078/quic-v1]"} {"level":"INFO","time":"2024-08-26T21:22:10.987Z","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"} Accepted client connection, free_mem=10418868224, total_mem=10418868224 Client connection closed Accepted client connection, free_mem=10418868224, total_mem=10418868224 Client connection closed Accepted client connection, free_mem=10418868224, total_mem=10418868224 Client connection closed
Additional context