nvms / wingman

Your pair programming wingman. Supports OpenAI, Anthropic, or any LLM on your local inference server.
https://marketplace.visualstudio.com/items?itemName=nvms.ai-wingman
ISC License

Support configurable inference params per prompt #16

Closed synw closed 1 year ago

synw commented 1 year ago

It would be nice to support per-prompt params. For now only the temperature param is supported, and it cannot be adjusted for different prompts because the setting is global.

Hypothetical example use case: I may want a more creative setup for prompts like Explain, and a more deterministic one for code generation. For example, I could set a higher temperature and some tail free sampling for Explain, and a lower temperature and lower top_p for Analyze.

Some params from the Llama.cpp server API that I would like supported:

interface InferParams {
  n_predict: number;
  top_k: number;
  top_p: number;
  temperature: number;
  frequency_penalty: number;
  presence_penalty: number;
  repeat_penalty: number;
  tfs_z: number;
  stop: Array<string>;
}

Ref: the Llama.cpp completion endpoint doc
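
For illustration, a request carrying these params to a local Llama.cpp server could look roughly like this (the URL, prompt, and values here are just placeholders, not part of this issue):

```ts
// Rough sketch of a non-streaming request to a local Llama.cpp server's
// /completion endpoint; values are placeholders, see the server docs for
// the authoritative parameter list.
const res = await fetch("http://localhost:8080/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "Explain what a closure is.",
    n_predict: 256,
    temperature: 0.8,
    top_k: 40,
    top_p: 0.95,
    tfs_z: 0.95,
    repeat_penalty: 1.1,
    frequency_penalty: 0,
    presence_penalty: 0,
    stop: ["</s>"],
  }),
});
const data = await res.json();
console.log(data.content); // generated text
```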

It would also be nice to have the model params per prompt for server tools that support multiple models at runtime. My use case: I have very small 3B and 7B models and I want to use one or the other depending on the prompt, since I have very specific tasks tailored to a particular model with predefined inference params (example of the concept).
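
To make that concrete, a purely hypothetical per-prompt config could look something like this (the field and model names are invented, not existing Wingman settings):

```ts
// Hypothetical sketch only: each prompt names the model it should run on
// and carries its own inference params. None of these fields exist today.
const prompts = [
  {
    label: "Explain",
    model: "open-llama-3b", // placeholder model name
    inferParams: { temperature: 0.9, tfs_z: 0.95, n_predict: 512 },
  },
  {
    label: "Analyze",
    model: "codellama-7b", // placeholder model name
    inferParams: { temperature: 0.2, top_p: 0.7, n_predict: 256 },
  },
];
```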

nvms commented 1 year ago

I pushed a version (1.3.17) just now that should support this, if you want to give it a try. Prompt templates now recognize a completionParams object, which will be passed to the endpoint, e.g.:

  {
    label: "Explain",
    description: "Explains the selected code.",
    userMessageTemplate:
      "Explain the following {{language}} code:\n```{{filetype}}\n{{text_selection}}\n```\n\nExplain as if you were explaining to another developer.\n\n{{input:What specifically do you need explained? Leave blank for general explaination.}}",
    completionParams: {
      temperature: 0.1,
      frequency_penalty: 1.1,
    },
  },

  {
    label: "Fix known bug",
    description: "Prompts for a description of the bug, and attempts to resolve it.",
    userMessageTemplate:
      "I have the following {{language}} code:\n```{{filetype}}\n{{text_selection}}\n```\n\nYour task is to find and fix a bug. Apart from fixing the bug, do not change the code.\n\nDescription of bug: {{input:Briefly describe the bug.}}\n\nIMPORTANT: Only return the code inside a code fence and nothing else. Do not explain your fix or changes in any way.",
    callbackType: "replace",
    completionParams: {
      temperature: 0.9,
    },
  },

Keeping in mind of course that providing unknown params to the official ChatGPT API will result in a 400:

(Screenshot: 400 error response from the OpenAI API.)

nvms commented 1 year ago

For reference on how the request is ultimately formed (in transitive-bullshit/chatgpt-api):

https://github.com/transitive-bullshit/chatgpt-api/blob/main/src/chatgpt-api.ts#L184-L195

synw commented 1 year ago

Yes it works: the parameters are correctly sent :rocket:

> Keeping in mind of course that providing unknown params to the official ChatGPT API will result in a 400

How about a Llama.cpp-compatible API? For example, tail free sampling is not supported in the ChatGPT API. I have an example of such an API here, or see the demo server in Llama.cpp for more params.

nvms commented 1 year ago

Maybe I'm misunderstanding, but you should be able to just put tfs_z in your completionParams and it will be sent. In fact, anything you put in there will be spread into the body of the JSON payload.
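
Conceptually (this is just an illustration, not the extension's actual request code), the merge amounts to an object spread:

```ts
// Illustration only, not Wingman's actual code: completionParams from the
// prompt template are merged into the request body via an object spread,
// so keys like tfs_z pass through to the endpoint untouched.
const completionParams = { temperature: 0.1, tfs_z: 0.95 };

const body = {
  model: "gpt-3.5-turbo", // placeholder model name
  messages: [{ role: "user", content: "Explain this code: ..." }],
  ...completionParams,
};

console.log(JSON.stringify(body, null, 2));
```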

synw commented 1 year ago

I use two different APIs on my server: the Llama.cpp one and the OpenAI one that I made recently to use Wingman. They run on different endpoints (/v1/chat/completions for OpenAI and /completion for Llama.cpp). I would like to stick to these official API specs to avoid confusion for users. If I start adding things to the OpenAI API it would introduce confusion, I think, so it would be better to have a way to support the Llama.cpp API.

[Edit]: maybe I can help with the code, as I already have this API implemented on the frontend, if you wish to go this way.
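
Roughly what I have in mind, as a purely hypothetical sketch (the names are made up, nothing here is existing Wingman code):

```ts
// Hypothetical sketch only: choose the endpoint and payload shape based on
// a configured API flavor. Field and function names are invented.
type ApiFlavor = "openai" | "llamacpp";

function buildRequest(
  flavor: ApiFlavor,
  prompt: string,
  completionParams: Record<string, unknown>,
) {
  if (flavor === "llamacpp") {
    // Llama.cpp server: raw prompt + sampling params on /completion
    return {
      url: "http://localhost:8080/completion",
      body: { prompt, ...completionParams },
    };
  }
  // OpenAI-style chat API: messages array on /v1/chat/completions
  return {
    url: "https://api.openai.com/v1/chat/completions",
    body: {
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: prompt }],
      ...completionParams,
    },
  };
}
```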

nvms commented 1 year ago

Oh I see. Yeah, if you can think of a good way to handle this you are welcome to open a PR - you may have a better idea of how to implement this than I do.

synw commented 1 year ago

I'll wait until you have replaced chatgpt-api with your own code: then it should be easy to adapt the params and the endpoint name, maybe with a setting to select the Llama.cpp API.

synw commented 1 year ago

Closing this: with the new providers + completionParams we have the feature.