mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

[p2p, cuda, compatibility] Compatibility issue between GPU and CPU instances when building a p2p network #2735

Closed. JackBekket closed this issue 1 month ago.

JackBekket commented 2 months ago

LocalAI version:

local-aio-gpu-nvidia-cuda-12, local-aio-cpu

Environment, CPU architecture, OS, and Version:

Describe the bug

I am trying to build a p2p network, and it's actually working: peers can discover each other and exchange tasks (https://localai.io/features/distribute/).

There is a problem if you try to connect aio-cpu images to aio-gpu images and vice versa. It looks like we can only have CPU-only networks and GPU-only networks, because if you launch local-ai-gpu as the host and aio-cpu as a worker, the host will try to assemble a CUDA backend on the worker side and hit CUDA errors, because the worker device has no GPU.

The same thing possibly happens in reverse: if you have aio-cpu as the host and aio-gpu as a worker, the GPU instance will get a backend with CPU-only output.

4:10PM DBG GRPC(code-13b.Q5_K_M.gguf-127.0.0.1:39375): stderr GGML_ASSERT: /build/backend/cpp/llama-avx2/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
4:10PM ERR Server error error="rpc error: code = Unavailable desc = error reading from server: EOF" ip=172.20.0.2 latency=17.191049654s method=POST status=500 url=/chat/completions

To Reproduce

Start local-ai-gpu-nvidia-cuda-12 as the host node and local-ai-cpu as a worker node.

Expected behavior

The host side should recognize that the worker is a CPU-only instance and should not try to build a GPU backend for it.
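
A minimal Go sketch of what such a capability check could look like on the host side, assuming the worker advertises whether it has a GPU. The `WorkerInfo` struct, `HasGPU` field, `pickBackendFor` function and the backend names are hypothetical, for illustration only; they are not part of the LocalAI codebase.

```go
package main

import "fmt"

// WorkerInfo is a hypothetical capability record a worker could advertise
// to the host. Neither the type nor the field names exist in LocalAI today.
type WorkerInfo struct {
	ID     string
	HasGPU bool
}

// pickBackendFor is an illustrative stand-in for the host's backend
// selection: choose a CUDA build only when the worker reports a GPU.
func pickBackendFor(w WorkerInfo) string {
	if w.HasGPU {
		return "llama-cpp-cuda" // placeholder backend name
	}
	return "llama-cpp-avx2" // placeholder backend name
}

func main() {
	cpuWorker := WorkerInfo{ID: "worker-1", HasGPU: false}
	// Prints the CPU backend, so no CUDA assert is triggered on the worker.
	fmt.Println(pickBackendFor(cpuWorker))
}
```

With something like this, the host in the reproduction above would fall back to a CPU build for the local-ai-cpu worker instead of hitting the CUDA assert.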

Logs

Host side:
4:10PM DBG GRPC(code-13b.Q5_K_M.gguf-127.0.0.1:39375): stderr ggml_cuda_compute_forward: RMS_NORM failed
4:10PM DBG GRPC(code-13b.Q5_K_M.gguf-127.0.0.1:39375): stderr CUDA error: the provided PTX was compiled with an unsupported toolchain.
4:10PM DBG GRPC(code-13b.Q5_K_M.gguf-127.0.0.1:39375): stderr   current device: 0, in function ggml_cuda_compute_forward at /build/backend/cpp/llama-avx2/llama.cpp/ggml/src/ggml-cuda.cu:2283
4:10PM DBG GRPC(code-13b.Q5_K_M.gguf-127.0.0.1:39375): stderr   err
4:10PM DBG GRPC(code-13b.Q5_K_M.gguf-127.0.0.1:39375): stderr GGML_ASSERT: /build/backend/cpp/llama-avx2/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
4:10PM ERR Server error error="rpc error: code = Unavailable desc = error reading from server: EOF" ip=172.20.0.2 latency=17.191049654s method=POST status=500 url=/chat/completions
4:10PM DBG Searching for workers
Worker side:
Client connection closed

Additional context

JackBekket commented 2 months ago

If we are using libp2p, maybe we should consider adding PubSub? That way we could exchange info messages between nodes over p2p.
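
A minimal sketch of that idea using go-libp2p-pubsub (gossipsub), assuming each node publishes a small capability message on a shared topic and the host subscribes to it before scheduling work. The topic name `localai-node-info` and the `NodeInfo` struct are made up for illustration; this is not how LocalAI currently wires its p2p layer.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/libp2p/go-libp2p"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

// NodeInfo is a hypothetical capability message; field names are illustrative.
type NodeInfo struct {
	NodeID string `json:"node_id"`
	HasGPU bool   `json:"has_gpu"`
}

func main() {
	ctx := context.Background()

	// Create a libp2p host (peer discovery/bootstrapping omitted for brevity).
	h, err := libp2p.New()
	if err != nil {
		log.Fatal(err)
	}

	// Join a gossipsub topic shared by all nodes in the network.
	ps, err := pubsub.NewGossipSub(ctx, h)
	if err != nil {
		log.Fatal(err)
	}
	topic, err := ps.Join("localai-node-info") // hypothetical topic name
	if err != nil {
		log.Fatal(err)
	}
	sub, err := topic.Subscribe()
	if err != nil {
		log.Fatal(err)
	}

	// Each node announces its capabilities once (could also be periodic).
	self := NodeInfo{NodeID: h.ID().String(), HasGPU: false}
	data, _ := json.Marshal(self)
	if err := topic.Publish(ctx, data); err != nil {
		log.Fatal(err)
	}

	// The host side reads announcements and can avoid GPU backends
	// for peers that report has_gpu: false.
	for {
		msg, err := sub.Next(ctx)
		if err != nil {
			log.Fatal(err)
		}
		var info NodeInfo
		if err := json.Unmarshal(msg.Data, &info); err != nil {
			continue
		}
		fmt.Printf("peer %s has GPU: %v\n", info.NodeID, info.HasGPU)
	}
}
```

On top of a mesh like this, the host could simply skip CUDA backends for any peer that reports has_gpu: false, which is essentially the expected behavior described above.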