xingyaoww / code-act

Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji.
MIT License

Installation and Setup on macOS M1 (ARM) fails #2

Closed faddy19 closed 4 months ago

faddy19 commented 6 months ago

Hi, I downloaded the model and followed your instructions, but I get the following error:

```
7b-v0.1 PORT=8080 MODEL_PATH=models--xingyaoww--CodeActAgent-Mistral-7b-v0.1 MODEL_DIR=. CUDA_VISIBLE_DEVICES=
docker: Error response from daemon: unknown or invalid runtime name: nvidia. See 'docker run --help'.
```

I figured out that the NVIDIA Container Toolkit does not support macOS. Is there a solution for that specific problem?

Thanks a lot

faddy19 commented 6 months ago

I have Docker Desktop installed on my Mac. What am I missing here?

sweetcard commented 6 months ago

vLLM is not supported on macOS.

faddy19 commented 6 months ago

So the OS has to be Linux, right? Thank you

sweetcard commented 6 months ago

> So the OS has to be Linux, right? Thank you

Linux with an NVIDIA card and CUDA is better.

faddy19 commented 6 months ago

Appreciate your fast reply. Last question: do you run the demo on AWS EC2? Can you let me know the configuration so I can deploy it? Thank you

sweetcard commented 6 months ago

> Appreciate your fast reply. Last question: do you run the demo on AWS EC2? Can you let me know the configuration so I can deploy it? Thank you

Try an AWS g5.2xlarge instance.

sweetcard commented 6 months ago

You can also try running the model with llama.cpp, which has OpenAI API support, on macOS.

faddy19 commented 6 months ago

I need it to run locally on macOS, but the model seems very heavy for that. Do you know a way to do so without hitting hardware limits? What would you suggest? Going through AWS [g5.2xlarge] just to test the capabilities and replicate the results is painful and expensive. It would be great to discuss options.

sweetcard commented 6 months ago

> I need it to run locally on macOS, but the model seems very heavy for that. Do you know a way to do so without hitting hardware limits? What would you suggest? Going through AWS [g5.2xlarge] just to test the capabilities and replicate the results is painful and expensive. It would be great to discuss options.

You can try the llama.cpp Python binding, which has OpenAI API support. It's not difficult.

I'll try to submit a PR for macOS later.
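
For concreteness, here is a minimal sketch of that route, assuming you have already converted the CodeActAgent weights to GGUF (the model filename below is a hypothetical placeholder):

```bash
# Install the Python binding together with its OpenAI-compatible server extra
pip install 'llama-cpp-python[server]'

# Serve a local GGUF model; the path is a placeholder for wherever your
# converted CodeActAgent weights live
python -m llama_cpp.server \
  --model ./models/codeactagent-mistral-7b-v0.1.Q4_K_M.gguf \
  --port 8080
```

The server then exposes OpenAI-compatible endpoints such as /v1/chat/completions on localhost:8080, so the CodeAct scripts should be able to point at it in place of vLLM.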

faddy19 commented 6 months ago

You are a great help, man. Developer power. If you could drop some macOS documentation, that would be awesome, since I and a lot of other people are not familiar with this. I can give you instant feedback on it today. I'd love to test it out and loop back.

faddy19 commented 6 months ago

Also, do you know a way to deploy it without burning cash as fast as an AWS g5.2xlarge? Can I use something else?

So what exactly do I have to modify in the CodeAct scripts and bash commands to use llama-cpp-python?

https://github.com/abetlen/llama-cpp-python

Thanks a lot

faddy19 commented 6 months ago

> > I need it to run locally on macOS, but the model seems very heavy for that. Do you know a way to do so without hitting hardware limits? What would you suggest? Going through AWS [g5.2xlarge] just to test the capabilities and replicate the results is painful and expensive. It would be great to discuss options.
>
> You can try the llama.cpp Python binding, which has OpenAI API support. It's not difficult.
>
> I'll try to submit a PR for macOS later.

Can you provide the documentation for macOS today? That would be awesome. The idea seems to be very effective. You guys did a great job.

sweetcard commented 6 months ago

> Also, do you know a way to deploy it without burning cash as fast as an AWS g5.2xlarge? Can I use something else?
>
> So what exactly do I have to modify in the CodeAct scripts and bash commands to use llama-cpp-python?
>
> https://github.com/abetlen/llama-cpp-python
>
> Thanks a lot

  1. Install llama.cpp.
  2. Prepare the model in GGUF format with llama.cpp: check this link.
  3. Start the web server with the GGUF model: check this link.
  4. Any other steps.
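
Roughly, and only as a sketch (the paths, quantization type, and output filenames below are illustrative; follow the linked docs for the authoritative commands):

```bash
# 1. Build llama.cpp from source
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# 2. Convert the downloaded Hugging Face weights to GGUF and optionally
#    quantize them (input/output paths are placeholders)
python convert.py /path/to/CodeActAgent-Mistral-7b-v0.1 --outfile codeactagent-7b.gguf
./quantize codeactagent-7b.gguf codeactagent-7b.Q4_K_M.gguf Q4_K_M

# 3. Serve the GGUF model over HTTP with an OpenAI-compatible API
./server -m codeactagent-7b.Q4_K_M.gguf --port 8080
```
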
faddy19 commented 6 months ago

Will try it right now and let you know if it works.

Thank you

faddy19 commented 6 months ago

I tried it, but it is too complex for me. I need step-by-step, command-by-command instructions. It would be great to get some help from you here; I think most people will never be able to do this on their own. Let's share it so the community can test and try it faster. Thanks a lot for your help.

faddy19 commented 6 months ago

Are you able to give a step-by-step guide, or even a short screen recording, and paste it into the GitHub README?

sweetcard commented 6 months ago

> Are you able to give a step-by-step guide, or even a short screen recording, and paste it into the GitHub README?

I will try to add a GGUF model and scripts later.

faddy19 commented 6 months ago

I have issues installing the NVIDIA CUDA part. Can you tell me exactly what you need to set up on Linux to get it running?

sweetcard commented 6 months ago

> I have issues installing the NVIDIA CUDA part. Can you tell me exactly what you need to set up on Linux to get it running?

Please share more details about the issues.

faddy19 commented 6 months ago

The issue is that CUDA and the NVIDIA toolkit are not preinstalled on the EC2 instance. It is very painful to install them manually.

What is the easiest way to get the model running? On which cloud or instance, and with which specific configuration? It would be very helpful to have documentation about that; otherwise, not many people can reproduce the results, optimize, or contribute to the project.

Appreciate your help.

faddy19 commented 6 months ago

If you can leave nice, reproducible, step-by-step documentation that folks can go through, you will receive a lot of feedback on the model and the work.

Did you try deploying via Hugging Face? Would that be an easy alternative?

xingyaoww commented 5 months ago

Hey @faddy19, thanks for your interest in our work! Sorry for the late reply; for some reason, I did not receive an email notification for this issue.

The goal of using vLLM is to serve an OpenAI-compatible API on your localhost. If that's not feasible on macOS, you can consider using ollama, which recently added support for local hosting (check this). This will allow you to run the model directly on your Mac without relying on an AWS EC2 instance.
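
For illustration, an ollama-based setup could look roughly like this (the Modelfile and the `codeactagent` model name are hypothetical; ollama needs GGUF weights to build from):

```bash
# Register a local GGUF model with ollama via a Modelfile
# (the weights path and model name are placeholders)
echo 'FROM ./codeactagent-7b.Q4_K_M.gguf' > Modelfile
ollama create codeactagent -f Modelfile

# ollama listens on port 11434 and exposes an OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "codeactagent", "messages": [{"role": "user", "content": "Hello"}]}'
```
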

If you still want to get EC2 working, you can refer to this guide to install the NVIDIA driver and CUDA, and then this guide to install nvidia-docker so you can run vLLM in Docker.
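
For reference, on an Ubuntu EC2 instance the nvidia-docker part typically reduces to something like the following sketch (assuming the driver, CUDA, and the NVIDIA apt repository are already set up per the linked guides):

```bash
# Install the NVIDIA Container Toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the nvidia runtime with Docker and restart the daemon;
# this is what fixes "unknown or invalid runtime name: nvidia"
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Sanity check: the container should now see the GPU
docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```
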

Once you set up vLLM, you can proceed with the rest of the README using Docker.

faddy19 commented 5 months ago

That's great. I will try a local setup today and provide feedback. This is very helpful. Thanks

xingyaoww commented 5 months ago

Hi @sweetcard, thanks for suggesting using llama.cpp.

Hi @faddy19, I got llama.cpp working on my macOS laptop (M2 Max) and wrote instructions here. We actually don't need ollama, since llama.cpp can already serve models through an OpenAI-compatible API.

Please let me know if it resolves your issue!

xingyaoww commented 4 months ago

Closing due to inactivity. Feel free to re-open if the issue arises again!