Add llama server - Githubissues

Initial version of a llama-based server. Currently only supports llama 3 8B-Instruct, currently restricted set of OpenAI API

POST localhost:3000/chat/completions

{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
}

Responds with

{
    "id": "chatcmpl-48aaeb15-86a7-41fd-a3aa-547603b63293",
    "object": "chat.completion",
    "created": 1714411725,
    "model": "meta-llama/Meta-Llama-3-70B-Instruct",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?<|eot_id|>"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 23,
        "completion_tokens": 25,
        "total_tokens": 48
    }
}

To run (on mac):

cd examples/llama_server
bash ./setup/setup.sh
cargo run --release --features metal

jafioti / luminal

Add llama server #52