I pushed a version (1.3.17) just now that should support this, if you want to give it a try. Prompt templates now recognize a `completionParams` object, which will be passed to the endpoint, e.g.:
```js
  {
    label: "Explain",
    description: "Explains the selected code.",
    userMessageTemplate:
      "Explain the following {{language}} code:\n```{{filetype}}\n{{text_selection}}\n```\n\nExplain as if you were explaining to another developer.\n\n{{input:What specifically do you need explained? Leave blank for general explanation.}}",
    completionParams: {
      temperature: 0.1,
      frequency_penalty: 1.1,
    },
  },
  {
    label: "Fix known bug",
    description: "Prompts for a description of the bug, and attempts to resolve it.",
    userMessageTemplate:
      "I have the following {{language}} code:\n```{{filetype}}\n{{text_selection}}\n```\n\nYour task is to find and fix a bug. Apart from fixing the bug, do not change the code.\n\nDescription of bug: {{input:Briefly describe the bug.}}\n\nIMPORTANT: Only return the code inside a code fence and nothing else. Do not explain your fix or changes in any way.",
    callbackType: "replace",
    completionParams: {
      temperature: 0.9,
    },
  },
```
Keep in mind, of course, that providing unknown params to the official ChatGPT API will result in a 400.

For reference on how the request is ultimately formed (in transitive-bullshit/chatgpt-api): https://github.com/transitive-bullshit/chatgpt-api/blob/main/src/chatgpt-api.ts#L184-L195
Yes it works: the parameters are correctly sent :rocket:
> Keep in mind, of course, that providing unknown params to the official ChatGPT API will result in a 400
How about a Llama.cpp-compatible API? For example, tail free sampling is not supported in the ChatGPT API. I have an example of such an API here, or see the demo server in Llama.cpp for more params.
Maybe I'm misunderstanding, but you should be able to just put `tfs_z` in your `completionParams` and it will be sent. In fact, anything you put in there will be spread into the body of the JSON payload.
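To make that concrete, here is a minimal sketch of the idea, loosely modeled on the chatgpt-api code linked above; the names (`PromptTemplate`, `sendCompletion`, the default model) are illustrative assumptions, not the extension's actual internals:

```ts
// Illustrative sketch: whatever is in a template's completionParams is
// spread into the JSON payload sent to the chat completions endpoint.
interface PromptTemplate {
  userMessageTemplate: string;
  completionParams?: Record<string, unknown>;
}

async function sendCompletion(template: PromptTemplate, messages: unknown[]) {
  const body = {
    model: "gpt-3.5-turbo", // assumed default; not necessarily what the extension uses
    messages,
    ...template.completionParams, // e.g. temperature, frequency_penalty, tfs_z, ...
  };
  // The official ChatGPT API rejects unknown keys (such as tfs_z) with a 400,
  // but a custom endpoint is free to accept them.
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Completion request failed: ${res.status}`);
  return res.json();
}
```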
I use two different APIs on my server: the Llama.cpp one and the OpenAI one that I made recently to use Wingman. They run on different endpoints (`/v1/chat/completions` for OpenAI and `/completion` for the Llama.cpp one). I would like to stick to these official API specs to avoid confusion for the users. If I start adding things to the OpenAI API it would introduce confusion, I think, so it would be better to have a way to support the Llama.cpp API.
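For illustration, the two payload shapes differ roughly like this (a sketch based on the public specs; exact field support depends on the server version):

```ts
// OpenAI-style chat endpoint: POST /v1/chat/completions
const openAiBody = {
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Explain the following code..." }],
  temperature: 0.1,
  // tfs_z is not part of the OpenAI spec and would be rejected with a 400
};

// llama.cpp server endpoint: POST /completion
const llamaCppBody = {
  prompt: "Explain the following code...",
  temperature: 0.1,
  tfs_z: 0.95,    // tail free sampling, supported by the llama.cpp server
  n_predict: 512, // llama.cpp's name for the max number of tokens to generate
};
```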
[Edit]: maybe I can help with the code, as I already have this API implemented on the frontend, if you wish to go this way.
Oh I see. Yeah, if you can think of a good way to handle this you are welcome to open up a PR - you may have a better idea of how to implement this than I do.
I'll wait until you have replaced chatgpt-api with your own code: then it should be easy to adapt the params and the endpoint name, maybe with a setting to select the Llama.cpp API.
Closing this, as with the new providers + `completionParams` we have the feature.
It would be nice to support per-prompt params. For now only the temperature param is supported, and it cannot be adjusted for different prompts because the setting is global.
Hypothetical example use case: I may want a more creative setup for prompts like the Explain one, and a more deterministic one for code gen. For example, I could set a higher temperature and some tail free sampling for Explain, and a lower temperature and lower `top_p` for the Analyze one.
Some params from the Llama.cpp server API that I would like to have support for:
Ref: the Llama.cpp completion endpoint doc
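As a purely hypothetical illustration of the shape I have in mind (param names follow the Llama.cpp completion endpoint; this is not an existing config format):

```ts
// Hypothetical per-prompt sampling params
const perPromptParams = {
  explain: { temperature: 0.9, tfs_z: 0.95 }, // more creative
  analyze: { temperature: 0.1, top_p: 0.5 },  // more deterministic
};
```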
It would also be nice to be able to have the `model` param per prompt for server tools that support multiple models at runtime. My use case: I have very small 3B and 7B models and I want to use one or the other depending on the prompt: I have very specific tasks tailored for a particular model with predefined inference params (example of the concept).
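A hypothetical shape for that (the model names and the `model` field here are just illustrative; the server would need to support switching models at runtime):

```ts
// Hypothetical: pick a small, task-specific model per prompt
const fixBugParams = { model: "some-3b-code-model", temperature: 0.2 };
const explainParams = { model: "some-7b-instruct-model", temperature: 0.8 };
```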