huggingface / llm-vscode

LLM powered development for VSCode
Apache License 2.0

OpenAI backend still creates HuggingFace-formatted request #131

Closed: rggs closed this issue 7 months ago

rggs commented 9 months ago

I have llama.cpp running locally. Here's the relevant part of my settings.json:

    "llm.configTemplate": "Custom",
    "llm.fillInTheMiddle.enabled": true,
    "llm.fillInTheMiddle.prefix": "<PRE> ",
    "llm.fillInTheMiddle.middle": " <MID>",
    "llm.fillInTheMiddle.suffix": " <SUF>",
    "llm.contextWindow": 4096,
    "llm.tokensToClear": [
        "<EOT>"
    ],
    "llm.tokenizer": {
        "repository": "codellama/CodeLlama-13b-hf"
    },
    "llm.lsp.logLevel": "warn",
    "llm.backend": "openai",
    "llm.modelId": "CodeLlama70b",
    "llm.url": "http://localhost:8080/v1/chat/completions",

However, looking at the request, it's still formatted as a HuggingFace request:

{"timestamp":1707926680,"level":"INFO","function":"log_server_request","line":2603,"message":"request","remote_addr":"127.0.0.1","remote_port":54364,"status":500,"method":"POST","path":"/v1/chat/completions","params":{}}
{"timestamp":1707926680,"level":"VERBOSE","function":"log_server_request","line":2608,"message":"request","request":"{\"model\":\"CodeLlama70b\",\"parameters\":{\"max_new_tokens\":60,\"temperature\":0.2,\"top_p\":0.95},\"prompt\":\"<PRE> import math\\n\\n# Here's a function that adds two numbers: <SUF> <MID>\",\"stream\":false}","response":"500 Internal Server Error\n[json.exception.type_error.302] type must be array, but is number"}
McPatate commented 9 months ago

It works with the /v1/completions API; I'm not sure it does with other endpoints.
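
A sketch of that adjustment, assuming the openai backend targets the completions-style API as suggested: the llm.url in the config above would point at /v1/completions instead, with the rest of settings.json unchanged.

    "llm.backend": "openai",
    "llm.modelId": "CodeLlama70b",
    "llm.url": "http://localhost:8080/v1/completions",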

github-actions[bot] commented 8 months ago

This issue is stale because it has been open for 30 days with no activity.

McPatate commented 7 months ago

Closing for now; feel free to open another issue if you're still having difficulties making it work.