mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Question] Can I run mlc_llm on an arm64 CPU without any GPU devices? #2927

Open AIarong opened 1 month ago

AIarong commented 1 month ago

❓ General Questions

I have installed TVM on my device, which has an arm64 CPU, and I want to run mlc_llm on it for model inference. I installed mlc_llm as described at https://llm.mlc.ai/docs/install/mlc_llm.html, but when I run the example from that page:

```python
from mlc_llm import MLCEngine

# Create engine
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# Run chat completion in OpenAI API.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print("\n")

engine.terminate()
```

it reports that the devices (CUDA, Vulkan, Metal, etc.) are not found. So I want to know: can I run mlc_llm on my device, which only has a CPU and no GPU?
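
For reference, here is a small check (a sketch using TVM's runtime device API; the list of backend names is just the ones I tried) showing that none of the GPU device types are found on my machine:

```python
# Sketch: ask TVM's runtime which device types actually exist on this
# machine. On my arm64 server, every GPU backend reports "not found".
import tvm

for name in ["cpu", "cuda", "vulkan", "metal", "opencl", "rocm"]:
    dev = tvm.device(name, 0)
    print(f"{name}: {'found' if dev.exist else 'not found'}")
```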

smpurkis commented 1 month ago

I would also like to know if this is possible, as I have a 4-core arm64 CPU server I would like to run this on. Adding this target to the prebuilts would potentially be very helpful.

MasterJH5574 commented 1 month ago

Hey folks, thanks for the question. As of now, the MLC optimization pipeline is specialized for GPU backends, and we require a GPU to run models. So right now, unfortunately, we are not able to run models with only the CPU.

AIarong commented 1 month ago

> Hey folks, thanks for the question. As of now, the MLC optimization pipeline is specialized for GPU backends, and we require a GPU to run models. So right now, unfortunately, we are not able to run models with only the CPU.

Thanks for your answer. I would also like to know how to run an LLM on TVM with CPU devices. I have run some GGUF model files from Hugging Face; can I convert the GGUF files to another format so that I can run the LLM on TVM?
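
For instance, is something along these lines the right direction? (A sketch assuming a transformers version with GGUF loading support; the repo id and file name are illustrative.)

```python
# Sketch: dequantize a GGUF checkpoint back into a standard Hugging Face
# model directory. Assumes a transformers version with GGUF support; the
# repo id and file name below are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TheBloke/Llama-2-7B-GGUF"      # illustrative repo
gguf_file = "llama-2-7b.Q4_K_M.gguf"      # illustrative file name

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)

# Save in the standard HF layout that most conversion tooling expects.
model.save_pretrained("./llama-2-7b-hf")
tokenizer.save_pretrained("./llama-2-7b-hf")
```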

sunzj commented 2 weeks ago

> > Hey folks, thanks for the question. As of now, the MLC optimization pipeline is specialized for GPU backends, and we require a GPU to run models. So right now, unfortunately, we are not able to run models with only the CPU.

> Thanks for your answer. I would also like to know how to run an LLM on TVM with CPU devices. I have run some GGUF model files from Hugging Face; can I convert the GGUF files to another format so that I can run the LLM on TVM?

You may check https://github.com/microsoft/T-MAC; its backend is TVM on CPU, and it exports the generated operators to llama.cpp.
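
For example, once you have a GGUF file, a minimal sketch of CPU-only inference via the llama-cpp-python bindings looks like this (the model path is illustrative):

```python
# Minimal CPU-only inference sketch using the llama-cpp-python bindings.
# The model path below is illustrative; point it at any local GGUF file.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b.Q4_K_M.gguf", n_threads=4)

output = llm(
    "What is the meaning of life?",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```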