Open chris-sanders opened 7 months ago
It seems there is an issue connecting to the gRPC service after loading the Mixtral model. The error message indicates the connection was refused when trying to reach the gRPC service at 127.0.0.1:46605.
To troubleshoot this issue, you can try the following steps:
Verify that the backend process is running properly. You can do this by checking the process list for the backend:
ps aux | grep exllama2
ps aux | grep exllama
Ensure that the firewall is not blocking the gRPC port (46605 in this case). You may need to open the port in the firewall settings or add an exception. A quick connectivity check is sketched after this list.
Check if there are any other instances of the backend process running, as this could cause a conflict. You can do this by checking the process list using the ps command and looking for any duplicate processes.
Make sure that there is no network issue preventing the connection to the gRPC service. Check the network connectivity between the host and 127.0.0.1:46605 (see the sketch after this list).
Try restarting the backend process and see if the issue persists. You can do this by stopping the current backend process and then reloading the model so LocalAI starts a new one, for example:
kill -9 <process_id>
If the issue still persists after trying these steps, you may need to look into specific configuration settings or seek further assistance from the support channels for the Mixtral model or the backend you are using (exllama2, exllama, etc.).
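A minimal way to run the firewall and connectivity checks above from the host is to try opening a TCP connection to the backend's gRPC port. This is just a sketch, not part of LocalAI; the port (46605 here) comes from this particular error and will generally differ between runs, so take it from the actual error message or logs.

```python
# Quick TCP connectivity check for the backend's gRPC port (illustrative sketch).
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        print(f"Connection to {host}:{port} failed: {exc}")
        return False

if __name__ == "__main__":
    print(can_connect("127.0.0.1", 46605))
```

If the port accepts connections but inference still fails, the problem is more likely inside the backend process than in the network path, and the backend's startup output in the LocalAI logs is the next place to look.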
I also wanted to give exl2 a shot. The model loads and the gRPC server seems fine, but I get this error on inference:
Error rpc error: code = Unknown desc = Exception iterating responses: 'Result' object is not an iterator
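That message is what grpc's Python server reports when a server-streaming handler hands back a single object instead of a generator or iterator: the server calls next() on whatever the handler returned, and the resulting TypeError is wrapped as "Exception iterating responses: ...". The snippet below is a hypothetical illustration of the pattern, not LocalAI's actual backend code; the Result class and handler names are made up.

```python
# Hypothetical sketch of the failure pattern behind
# "Exception iterating responses: 'Result' object is not an iterator".

class Result:                                   # stand-in for a single response message
    def __init__(self, text: str):
        self.text = text

def predict_stream_broken(request):
    # grpc repeatedly calls next() on whatever a streaming handler returns;
    # next(Result(...)) raises "TypeError: 'Result' object is not an iterator".
    return Result("hello")

def predict_stream_fixed(request):
    # Yielding makes the handler a generator, which next() can consume.
    for piece in ("hel", "lo"):
        yield Result(piece)

# Roughly what the server does with the handler's return value:
stream = predict_stream_fixed(None)
print(next(stream).text)  # "hel"
print(next(stream).text)  # "lo"
```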
LocalAI version: v2.12.1
Environment, CPU architecture, OS, and Version: Linux chrispc 5.15.133.1-microsoft-standard-WSL2 #1 SMP Thu Oct 5 21:02:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
This is ubuntu 22.04 on WSL2 with Nvidia drivers available in the VM.
Describe the bug
Using exllama2 directly, just by cloning the repository and installing as per its GitHub instructions, I'm able to run an exl2 model, for example:
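A standalone run looks roughly like the repo's own basic inference example; treat the following as a sketch rather than my exact script, since the model directory is a placeholder and the sampling settings are arbitrary.

```python
# Sketch of standalone exllamav2 inference, following the repo's examples.
# The model directory is a placeholder; point it at a downloaded exl2 quant.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Mixtral-8x7B-instruct-exl2-3.0bpw"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # lazy cache so load_autosplit can size it
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("[INST] Say hello. [/INST]", settings, 64))
```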
Using the same model with exllama2 through LocalAI, I get an error; see the logs from this period in the Logs section below.
To Reproduce
I adjusted the file ./backend/python/exllama2/install.sh to use the master branch of exllama2, just in case. I'm building with:
sudo docker build --build-arg="BUILD_TYPE=cublas" --build-arg="CUDA_MAJOR_VERSION=12" --build-arg="CUDA_MINOR_VERSION=4" -t localai .
Then running this docker-compose:
The Mixtral is from:
https://huggingface.co/turboderp/Mixtral-8x7B-instruct-exl2
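If it helps with reproducing, the quants can be pulled with huggingface_hub. This is a sketch that assumes the bpw variants are published as branches named after the bitrate (e.g. 3.0bpw), which appears to be how that repo is organized; the local directory is a placeholder.

```python
# Sketch: download one exl2 quant from the Hub (requires `pip install huggingface_hub`).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="turboderp/Mixtral-8x7B-instruct-exl2",
    revision="3.0bpw",   # or "3.5bpw"
    local_dir="models/Mixtral-8x7B-instruct-exl2-3.0bpw",
)
```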
I've tried 3.5bpw and 3.0bpw; this particular run is 3.0bpw. Both work fine when using the built-in example from exllama2, and both fail in this same way when using LocalAI. The file mixtral.yaml in the /models folder is:
Logs
Additional context