mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License

`temperature` parameter is not respected #676

Open samm81 opened 1 year ago

samm81 commented 1 year ago

LocalAI version: cc31c58235496ad7b703b55096172efb1e37feb8

Environment, CPU architecture, OS, and Version: Linux 6.1.31_1 #1 SMP PREEMPT_DYNAMIC Wed May 31 05:53:37 UTC 2023 x86_64 GNU/Linux

Describe the bug
The temperature parameter seems to be ignored, as if it were always set to 0: the same output is always returned, and when I run the model with llama.cpp directly and a temp of 0 I get the same text as what the /completions endpoint returns. I'm using a custom model which wasn't trained for chatting, so I'm not sure if this is an issue only for the /completions endpoint or for all endpoints.
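
For reference, a rough sketch of the direct llama.cpp comparison mentioned above. The ./main binary name, flag spellings, and model path reflect a mid-2023 llama.cpp checkout and a guessed local layout, so treat them as assumptions rather than the exact commands used:

# compare LocalAI's /completions output against llama.cpp run directly;
# at --temp 0 the text should match if the parameter is effectively stuck at zero
❯ ./main -m models/pesterbot_5mins-2023_05_19/ggml-model-q4_0.bin -p "" -n 16 --temp 0
# at a non-zero temperature the output should vary across runs
❯ ./main -m models/pesterbot_5mins-2023_05_19/ggml-model-q4_0.bin -p "" -n 16 --temp 0.7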

To Reproduce

❯ curl http://localhost:57541/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-model-q4_0.bin",
     "prompt": "",
     "max_tokens": 16,
     "temperature": 0.7
   }'
{"object":"text_completion","model":"pesterbot_5mins-2023_05_19/ggml-model-q4_0.bin","choices":[{"text":" 😂 | I'm so tired of this | I've"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}%
❯ curl http://localhost:57541/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-model-q4_0.bin",
     "prompt": "",
     "max_tokens": 16,
     "temperature": 0.7
   }'
{"object":"text_completion","model":"pesterbot_5mins-2023_05_19/ggml-model-q4_0.bin","choices":[{"text":" 😂 | I'm so tired of this | I've"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}%
❯ curl http://localhost:57541/v1/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-model-q4_0.bin",
     "prompt": "",
     "max_tokens": 16,
     "temperature": 0.7
   }'
{"object":"text_completion","model":"pesterbot_5mins-2023_05_19/ggml-model-q4_0.bin","choices":[{"text":" 😂 | I'm so tired of this | I've"}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}%

Expected behavior
Receive different outputs when the temperature is non-zero.

Logs

Additional context

samm81 commented 1 year ago

Looking at the logs, the temperature parameter is parsed into the config correctly:

11:20PM DBG Parameter Config: &{OpenAIRequest:{Model:pesterbot_5mins-2023_05_19/ggml-model-q4_0.bin File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:1.2 Maxtokens:32 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}

Temperature:1.2
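
Since the value is parsed correctly, the next place to look is where this config is mapped onto the llama backend's prediction options. A quick way to find that mapping is to grep the tree (a minimal sketch; the source layout at commit cc31c58 is an assumption):

# list every Go source line that reads or copies the Temperature field
❯ grep -rn "Temperature" --include="*.go" .

If one of those hits builds the backend options from a zero-valued default instead of the parsed config, that would explain the behaviour described above.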

bnusunny commented 1 year ago

I have the same experience.

localai-bot commented 1 year ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but.... I can also be funny or helpful :smilecat: and I can provide, generally speaking, good tips or places to look in the documentation or in the code based on what you wrote in the issue.

Don't engage in conversation with me, I don't support (yet) replying!

:warning::warning::warning::warning::warning:

Based on the information provided, it seems like the issue is with the /completions endpoint not respecting the temperature parameter in the request. This could be a bug in the LocalAI code or a misconfiguration of the model. To investigate further, you can try the following steps:

  1. Check if the temperature parameter is being correctly passed to the model by printing it out in the model's code. You can do this by adding a print statement in the predict function of your model.
  2. Check if the model is correctly trained for text completion by verifying that the expected output is returned when you run the model with a different set of parameters, such as a different prompt or max tokens.
  3. Check if the issue is specific to the /completions endpoint by testing other endpoints in LocalAI, such as /text_classification or /image_classification (a hedged chat-endpoint example follows this list).
  
  4. Check if the issue is specific to your custom model by training a pre-existing model from the LocalAI examples and testing it with the same parameters. If none of these steps resolve the issue, you can open an issue on the LocalAI GitHub repository with the details of the problem and any logs or code snippets that may be helpful in debugging the issue further.
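
On point 3: LocalAI's OpenAI-compatible API also exposes /v1/chat/completions, so a comparable check against the same model would look roughly like this (whether this particular model has a usable chat template is an assumption):

❯ curl http://localhost:57541/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "ggml-model-q4_0.bin",
     "messages": [{"role": "user", "content": "write one short sentence"}],
     "max_tokens": 16,
     "temperature": 0.7
   }'
# if this endpoint also returns identical text across repeated calls,
# the problem is not specific to /completions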

Sources: