jafioti / luminal

Deep learning at the speed of light.
https://luminalai.com
Apache License 2.0

Add llama server #52

Closed: TheSeamau5 closed this 4 months ago

TheSeamau5 commented 5 months ago

Initial version of a Llama-based server. It currently supports only Llama 3 8B-Instruct and a restricted subset of the OpenAI API.

POST localhost:3000/chat/completions

{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
}

The server responds with:

{
    "id": "chatcmpl-48aaeb15-86a7-41fd-a3aa-547603b63293",
    "object": "chat.completion",
    "created": 1714411725,
    "model": "meta-llama/Meta-Llama-3-70B-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?<|eot_id|>"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 23,
        "completion_tokens": 25,
        "total_tokens": 48
    }
}
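
The sample response above ends its `content` with a literal `<|eot_id|>` marker, so a client may want to strip it before displaying the reply. The following is a minimal sketch of that, using a trimmed copy of the sample response; the helper names `extract_reply` and `usage_is_consistent` are hypothetical and not part of this PR, and whether the server itself will strip the marker in a later version is not stated here.

```python
import json

# Trimmed copy of the sample response from the PR description.
SAMPLE_RESPONSE = r"""
{
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?<|eot_id|>"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {"prompt_tokens": 23, "completion_tokens": 25, "total_tokens": 48}
}
"""

# End-of-turn marker as it appears in the sample output.
EOT_TOKEN = "<|eot_id|>"


def extract_reply(response: dict) -> str:
    """Return the first choice's text, without a trailing end-of-turn marker."""
    content = response["choices"][0]["message"]["content"]
    if content.endswith(EOT_TOKEN):
        content = content[: -len(EOT_TOKEN)]
    return content


def usage_is_consistent(response: dict) -> bool:
    """Check that prompt and completion token counts add up to the total."""
    usage = response["usage"]
    return usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]


response = json.loads(SAMPLE_RESPONSE)
print(extract_reply(response))
print(usage_is_consistent(response))
```

Stripping only a trailing marker leaves replies untouched if the server stops emitting it.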

To run (on macOS):

cd examples/llama_server
bash ./setup/setup.sh
cargo run --release --features metal
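
Once the server is running, the request from the description could be sent from a small client. This is a sketch using only the Python standard library; the `http://` scheme, the `Content-Type` header, and the function names `build_payload`, `build_request`, and `send_chat` are assumptions for illustration, not something this PR defines.

```python
import json
import urllib.request

# Endpoint from the PR description; the http:// scheme is an assumption.
BASE_URL = "http://localhost:3000"


def build_payload(user_message: str,
                  system_message: str = "You are a helpful assistant.") -> dict:
    """Build the request body with the same shape as the example request."""
    return {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ]
    }


def build_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Wrap the payload in a POST request to /chat/completions."""
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be expected
        method="POST",
    )


def send_chat(user_message: str) -> dict:
    """Send one chat request and return the decoded JSON response.

    Requires the server to be listening on BASE_URL.
    """
    request = build_request(build_payload(user_message))
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the server up, `send_chat("Hello!")` should return a dictionary shaped like the sample response in the description.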