nvms / wingman

Your pair programming wingman. Supports OpenAI, Anthropic, or any LLM on your local inference server.
https://marketplace.visualstudio.com/items?itemName=nvms.ai-wingman
ISC License

Context window and max_tokens management #17

Open synw opened 1 year ago

synw commented 1 year ago

Running the "Write unit tests" command with a local Llama 2 model, I get an error message because of the default max_tokens param of 1000:

llama_predict: error: prompt is too long (1133 tokens, max 1020)

I would like to be able to set the context window size of the model (4096 tokens for Llama 2). That way the max_tokens param value could be calculated automatically using the llama-tokenizer-js lib:

import llamaTokenizer from 'llama-tokenizer-js';

// Tokens left for the completion = context window minus the prompt's token count.
const promptNTokens = llamaTokenizer.encode(prompt).length;
const maxTokens = model_context_window_size_param - promptNTokens;

[Edit]: a different tokenizer would be needed for OpenAI; this one only works for local Llama models.
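
A rough sketch of how both backends could be handled, assuming js-tiktoken (or any GPT-compatible tokenizer) for the OpenAI side; maxTokensFor and contextWindowSize are hypothetical names standing in for the proposed setting:

import llamaTokenizer from 'llama-tokenizer-js';
import { encodingForModel } from 'js-tiktoken'; // assumed tokenizer for OpenAI models

// Hypothetical helper: contextWindowSize would come from the new extension setting.
function maxTokensFor(prompt: string, contextWindowSize: number, provider: 'local' | 'openai'): number {
  const promptNTokens =
    provider === 'local'
      ? llamaTokenizer.encode(prompt).length
      : encodingForModel('gpt-3.5-turbo').encode(prompt).length;

  // Whatever is left in the window goes to the completion; clamp at zero.
  return Math.max(contextWindowSize - promptNTokens, 0);
}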