huggingface / llm-vscode

LLM powered development for VSCode

Running llama-cpp-python OpenAI compatible server #140

Open abasu0713 opened 4 months ago

abasu0713 commented 4 months ago

Requesting a little help here. I'm trying to test out Copilot-style functionality with llama-cpp-python and this extension. Below are my configuration settings.

{
    "[python]": {
        "editor.formatOnType": true
    },
    "cmake.configureOnOpen": true,
    "llm.backend": "openai",
    "llm.configTemplate": "Custom",
    "llm.url": "http://192.X.X.X:12080/v1/chat/completions",
    "llm.fillInTheMiddle.enabled": false,
    "llm.fillInTheMiddle.prefix": "<PRE> ",
    "llm.fillInTheMiddle.middle": " <MID>",
    "llm.fillInTheMiddle.suffix": " <SUF>",
    "llm.requestBody": {
        "parameters": {
            "max_tokens": 60,
            "temperature": 0.2,
            "top_p": 0.95
        }
    },
    "llm.contextWindow": 4096,
    "llm.tokensToClear": [
        "<EOS>"
    ],
    "llm.tokenizer": null,
    "llm.tlsSkipVerifyInsecure": true,
    "llm.modelId": "",
}

I can see that inference is happening on the server:

(screenshot of the server log showing inference requests, 2024-04-23)

So I am not entirely sure what I am missing. Additionally, I have been trying to see the extension logs for the worker calls, but I don't see anything. Would you be able to give any guidance or a step-by-step explanation of how this can be done?

Thank you so much

zikeji commented 3 months ago

Not sure if it's the same, but I'm using koboldcpp - perhaps try using /v1/completions, not /v1/chat/completions?
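
For example, a minimal request body you can POST to /v1/completions to check that the server answers on that route (field names follow the standard OpenAI completions schema; the model name is just a placeholder for whatever you have loaded):

{
    "model": "your-model-name",
    "prompt": "def fibonacci(n):",
    "max_tokens": 60,
    "temperature": 0.2,
    "top_p": 0.95
}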

McPatate commented 3 months ago

Hi, it's indeed the /v1/completions endpoint and not /v1/chat/completions. Also, you shouldn't need to add the path anymore; you can set llm.url to http[s]://{hostname}.
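
For example, the relevant settings would then look roughly like this (a sketch assuming the server from your screenshot is still listening on 192.X.X.X:12080; the extension adds the completions path itself):

{
    "llm.backend": "openai",
    "llm.url": "http://192.X.X.X:12080",
    "llm.requestBody": {
        "parameters": {
            "max_tokens": 60,
            "temperature": 0.2,
            "top_p": 0.95
        }
    }
}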

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 30 days with no activity.

abasu0713 commented 2 months ago

> Hi, it's indeed the /v1/completions endpoint and not /v1/chat/completions. Also, you shouldn't need to add the path anymore; you can set llm.url to http[s]://{hostname}.

I am going to give this a try tomorrow and report back. Sorry I didn't get back sooner; I only just saw the GitHub notification. Thank you for the reply. I was using /v1/chat/completions since I was using the Llama instruct models. Which require that endpoint, no?

McPatate commented 2 months ago

> Which require that endpoint, no?

They might, yes. The extension doesn't support chat models at the moment. The model you use must be compatible with code completion, with or without fill-in-the-middle (but I strongly advise using FIM, as it generates more relevant completions).
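
For example, with a FIM-capable model (a sketch assuming a CodeLlama-style model, which uses the <PRE>/<SUF>/<MID> tokens already present in your config above):

{
    "llm.fillInTheMiddle.enabled": true,
    "llm.fillInTheMiddle.prefix": "<PRE> ",
    "llm.fillInTheMiddle.middle": " <MID>",
    "llm.fillInTheMiddle.suffix": " <SUF>"
}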

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 30 days with no activity.