Closed · Noooste closed this issue 1 year ago
Hi, @Noooste. Thanks for your feedback. I believe this is not a bug in LocalAI; I have had the same experience while using Copilot. Here are some potential reasons:
It always occurred with short prompts. For example, given "Prefix sum technique is", it would show me lots of repeated sentences starting with "Prefix sum."
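One concrete thing worth ruling out: the debug log below shows RepeatPenalty:0 and FrequencyPenalty:0, i.e. no anti-repetition penalty at all. Here is a minimal sketch of raising them per request; it assumes the server maps these JSON fields onto the RepeatPenalty/FrequencyPenalty options visible in its PredictionOptions, and the exact values are illustrative:

```python
import requests

# Hypothetical sketch: the penalty values are illustrative, and passing them
# per request assumes the server maps these JSON fields onto the
# RepeatPenalty/FrequencyPenalty options visible in the debug log.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "ggml-mpt-7b-chat.bin",
        "messages": [{"role": "user", "content": "Prefix sum technique is"}],
        "repeat_penalty": 1.18,    # values > 1.0 discourage repeated tokens
        "frequency_penalty": 0.5,  # OpenAI-style penalty on frequent tokens
        "max_tokens": 256,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```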
I will close this issue. If you still hit this kind of issue, please reopen it anytime.
LocalAI version: latest (v1.24.1-38-g9e5fb29, per the logs below)
Environment, CPU architecture, OS, and Version:
OS: Debian GNU/Linux 11 (bullseye)
CPU architecture: x86_64
Linux euw1 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
Describe the bug
The answer is repeated.
To Reproduce
First apply the model, then query it from a Python script.
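A minimal sketch of such a script (a reconstruction, not the original snippet: the endpoint is assumed to be the OpenAI-compatible chat completions route, and the model name, prompt, and sampling parameters are taken from the debug log below):

```python
import requests

# Reconstructed reproduction script: the prompt "how are you ?" and the
# sampling parameters (temperature 0.1, top_p 0.7, max_tokens 512) match
# the values visible in the debug log below.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "ggml-mpt-7b-chat.bin",
        "messages": [{"role": "user", "content": "how are you ?"}],
        "temperature": 0.1,
        "top_p": 0.7,
        "max_tokens": 512,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```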
Expected behavior
No repetitions
Logs
~/LocalAI# ./local-ai --threads 8 --address localhost:8080 --debug
10:02PM DBG no galleries to load
10:02PM INF Starting LocalAI using 8 threads, with models path: /root/LocalAI/models
10:02PM INF LocalAI version: v1.24.1-38-g9e5fb29 (9e5fb2996582bc45e5a9cbe6f8668e7f1557c15a)
10:02PM DBG Model: mpt-7b-chat (config: {PredictionOptions:{Model:ggml-mpt-7b-chat.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:mpt-7b-chat F16:true Threads:0 Debug:false Roles:map[assistant:Assistant: system:System: user:User:] Embeddings:false Backend:gpt4all-mpt TemplateConfig:{Chat:mpt-chat ChatMessage: Completion:mpt-completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0}})
10:02PM DBG Extracting backend assets files to /tmp/localai/backend_data
┌───────────────────────────────────────────────────┐
│                   Fiber v2.48.0                   │
│               http://127.0.0.1:8080               │
│                                                   │
│ Handlers ............ 59  Processes ........... 1 │
│ Prefork ....... Disabled  PID .......... 2925529 │
└───────────────────────────────────────────────────┘
10:02PM DBG Request received:
10:02PM DBG Configuration read: &{PredictionOptions:{Model:ggml-mpt-7b-chat.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.1 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0}}
10:02PM DBG Parameters: &{PredictionOptions:{Model:ggml-mpt-7b-chat.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.1 Maxtokens:512 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name: F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0}}
10:02PM DBG Prompt (before templating): how are you ?
10:02PM DBG Template failed loading: failed loading a template for ggml-mpt-7b-chat.bin
10:02PM DBG Prompt (after templating): how are you ?
10:02PM DBG Loading model 'ggml-mpt-7b-chat.bin' greedly from all the available backends: llama, llama-stable, gpt4all, falcon, gptneox, bert-embeddings, falcon-ggml, gptj, gpt2, dolly, mpt, replit, starcoder, bloomz, rwkv, whisper, stablediffusion, piper
10:02PM DBG [llama] Attempting to load
10:02PM DBG Loading model llama from ggml-mpt-7b-chat.bin
10:02PM DBG Loading model in memory from file: /root/LocalAI/models/ggml-mpt-7b-chat.bin
10:02PM DBG Loading GRPC Model llama: {backendString:llama model:ggml-mpt-7b-chat.bin threads:8 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000382000 externalBackends:map[] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false}
10:02PM DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/llama
10:02PM DBG GRPC Service for ggml-mpt-7b-chat.bin will be running at: '127.0.0.1:37345'
10:02PM DBG GRPC Service state dir: /tmp/go-processmanager2291379687
10:02PM DBG GRPC Service Started
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:37345: connect: connection refused"
10:02PM DBG GRPC(ggml-mpt-7b-chat.bin-127.0.0.1:37345): stderr 2023/08/27 22:02:50 gRPC Server listening at 127.0.0.1:37345
10:02PM DBG GRPC Service Ready
10:02PM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo: