llm-vscode is an extension for all things LLM. It uses llm-ls as its backend.
We also have extensions for:
- neovim
- jupyter
- intellij

Previously known as huggingface-vscode.
> [!NOTE]
> When using the Inference API, you will probably encounter some limitations. Subscribe to the PRO plan to avoid getting rate limited in the free tier.
This plugin supports "ghost-text" code completion, à la Copilot.
Code generation requests are made via HTTP.
You can use the Hugging Face Inference API or your own HTTP endpoint, provided it adheres to the APIs listed in backend.
The list of officially supported models is located in the config template section.
The prompt sent to the model will always be sized to fit within the context window, with the number of tokens determined using tokenizers.
Hit `Cmd+Shift+A` to check if the generated code is in The Stack.
This is a rapid first-pass attribution check using stack.dataportraits.org.
We check for sequences of at least 50 characters that match a Bloom filter.
This means false positives are possible and long enough surrounding context is necessary (see the paper for details on n-gram striding and sequence length).
The dedicated Stack search tool is a full dataset index and can be used for a complete second pass.
Install like any other vscode extension.
By default, this extension uses bigcode/starcoder and the Hugging Face Inference API for inference.
You can supply your HF API token (hf.co/settings/token) with this command:
1. `Cmd/Ctrl+Shift+P` to open the VSCode command palette
2. Type: `Llm: Login`

If you previously logged in with `huggingface-cli login` on your system, the extension will read the token from disk.
You can check the full list of configuration settings by opening your settings page (`cmd+,`) and typing `Llm`.
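For example, a minimal `settings.json` sketch that makes the default model and backend explicit might look like the following. The `llm.modelId` and `llm.backend` key names are assumptions inferred from the `configuration.modelId` and `configuration.backend` fields used in the request example below; verify the exact names in your settings page.

```json
{
  // assumed setting names; confirm them by typing "Llm" in the settings page
  "llm.modelId": "bigcode/starcoder",
  "llm.backend": "huggingface"
}
```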
You can configure the backend to which requests will be sent. llm-vscode supports the following backends:
- `huggingface`: The Hugging Face Inference API (default)
- `ollama`: Ollama
- `openai`: any OpenAI compatible API (e.g. llama-cpp-python)
- `tgi`: Text Generation Inference

Let's say your current code is this:
```py
import numpy as np
import scipy as sp
{YOUR_CURSOR_POSITION}
def hello_world():
    print("Hello world")
```
The request body will then look like:
```ts
const inputs = `{start token}import numpy as np\nimport scipy as sp\n{end token}def hello_world():\n    print("Hello world"){middle token}`;
const data = { inputs, ...configuration.requestBody };

const model = configuration.modelId;
// the endpoint depends on the configured backend and URL, cf. URL construction below
const endpoint = build_url(configuration);

const res = await fetch(endpoint, {
  body: JSON.stringify(data),
  headers,
  method: "POST"
});

const json = await res.json() as { generated_text: string };
```
Note that the example above is a simplified version to explain what is happening under the hood.
The endpoint URL that is queried to fetch suggestions is built the following way:
- a backend-specific path is appended to the URL you configured (e.g. `{url}/v1/completions` for the `openai` backend)
- for the `huggingface` backend, it will automatically use the default URL of the Inference API

If your endpoint does not expect the appended path, you can set `llm.disableUrlPathCompletion` to disable this behavior.
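As a sketch, pointing the extension at a self-hosted Text Generation Inference server could look like this in `settings.json`. The URL is a placeholder, and the `llm.backend`/`llm.url`/`llm.disableUrlPathCompletion` key names are assumed from the backend and URL construction described above:

```json
{
  // placeholder URL for a self-hosted TGI server; a backend-specific path is appended automatically
  "llm.backend": "tgi",
  "llm.url": "http://localhost:8080",
  // assumed boolean setting; set to true if your server does not expect the appended path
  "llm.disableUrlPathCompletion": false
}
```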
You can tune the way the suggestions behave:
- `llm.enableAutoSuggest` lets you choose to enable or disable "suggest-as-you-type" suggestions.
- `llm.documentFilter` lets you enable suggestions only on specific files that match the pattern matching syntax you will provide (see the settings.json sketch after this list). The object must be of type `DocumentFilter | DocumentFilter[]`:
  - to match on all types of buffers: `llm.documentFilter: { pattern: "**" }`
  - to match on all files in `my_project/`: `llm.documentFilter: { pattern: "/path/to/my_project/**" }`
  - to match on all Python and Rust files: `llm.documentFilter: { pattern: "**/*.{py,rs}" }`
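As a sketch, the last pattern above written out in `settings.json` form (standard JSON keys rather than the shorthand used in the list):

```json
{
  // restrict suggestions to Python and Rust files
  "llm.documentFilter": { "pattern": "**/*.{py,rs}" },
  // set to false to turn off "suggest-as-you-type" suggestions
  "llm.enableAutoSuggest": true
}
```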
llm-vscode sets two keybindings:
- you can trigger suggestions with `Cmd+shift+l` by default, which corresponds to the `editor.action.inlineSuggest.trigger` command
- code attribution is set to `Cmd+shift+a` by default, which corresponds to the `llm.attribution` command

By default, llm-ls is bundled with the extension. When developing locally or if you built your own binary because your platform is not supported, you can set the `llm.lsp.binaryPath` setting to the path of the binary.
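If you built the binary yourself, a minimal `settings.json` sketch (using the debug build path from the developing steps below as a placeholder):

```json
{
  // placeholder path; point this at your locally built llm-ls binary
  "llm.lsp.binaryPath": "/path/to/llm-ls/target/debug/llm-ls"
}
```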
llm-ls uses tokenizers to make sure the prompt fits the `context_window`.
To configure it, you have a few options:
No tokenization: llm-ls will count the number of characters instead:
```json
{
  "llm.tokenizer": null
}
```
From a local file on your disk:
```json
{
  "llm.tokenizer": {
    "path": "/path/to/my/tokenizer.json"
  }
}
```
From a Hugging Face repository, llm-ls will attempt to download `tokenizer.json` at the root of the repository:
```json
{
  "llm.tokenizer": {
    "repository": "myusername/myrepo",
    "api_token": null
  }
}
```
Note: when `api_token` is set to null, it will use the token you set with the `Llm: Login` command. If you want to use a different token, you can set it here.
From an HTTP endpoint, llm-ls will attempt to download a file via an HTTP GET request:
```json
{
  "llm.tokenizer": {
    "url": "https://my-endpoint.example.com/mytokenizer.json",
    "to": "/download/path/of/mytokenizer.json"
  }
}
```
To test the Code Llama 13B model:
1. Open your VS Code settings (`cmd+,`) and type: `Llm: Config Template`
2. Select `hf/codellama/CodeLlama-13b-hf`

Read more here about Code Llama.
To test Phind/Phind-CodeLlama-34B-v2 and/or WizardLM/WizardCoder-Python-34B-V1.0:
1. Open your VS Code settings (`cmd+,`) and type: `Llm: Config Template`
2. Select `hf/Phind/Phind-CodeLlama-34B-v2` or `hf/WizardLM/WizardCoder-Python-34B-V1.0`

Read more about Phind-CodeLlama-34B-v2 here and WizardCoder-15B-V1.0 here.
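In both cases, choosing a template is just a settings change. A sketch of the equivalent `settings.json` entry, assuming the underlying key for the "Llm: Config Template" dropdown is `llm.configTemplate` (an assumption; check the settings page):

```json
{
  // assumed key name for the "Llm: Config Template" setting
  "llm.configTemplate": "hf/codellama/CodeLlama-13b-hf"
}
```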
To develop the extension locally:
1. Clone `llm-ls`: `git clone https://github.com/huggingface/llm-ls`
2. Build `llm-ls`: `cd llm-ls && cargo build` (you can also use `cargo build --release` for a release build)
3. Clone this repository: `git clone https://github.com/huggingface/llm-vscode`
4. Install the dependencies: `cd llm-vscode && npm ci`
5. Open the `Run and Debug` side bar and click `Launch Extension`
6. Set the `llm.lsp.binaryPath` setting to the path of the `llm-ls` binary you built in step 2 (e.g. `/path/to/llm-ls/target/debug/llm-ls`)
7. Restart the extension with `F5` or as in step 5
Community projects:

| Repository | Description |
|---|---|
| huggingface-vscode-endpoint-server | Custom code generation endpoint for this repository |
| llm-vscode-inference-server | An endpoint server for efficiently serving quantized open-source LLMs for code. |