mudler / LocalAI

The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License

model qwen2-7b-instruct #2553

Open cesinsingapore opened 5 months ago

cesinsingapore commented 5 months ago

[image] The AI reply does not make sense.

cesinsingapore commented 5 months ago

It's working fine with another model.

AlexM4H commented 5 months ago

Same behaviour for me.

Temporary workaround: GPU_LAYERS: 0

Further info:

https://huggingface.co/bartowski/Qwen2-7B-Instruct-GGUF/discussions/1

"You can also enable flash attention for llamacpp which should be able to work around the issue"

Is flash attention already enabled in the current Docker images?
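
For anyone who wants to try both suggestions from this comment on their own model file, here is a minimal sketch of a helper that sets a top-level key in the YAML text. `set_top_level_key` is a hypothetical name, and the lowercase `gpu_layers` spelling is an assumption (the comment writes `GPU_LAYERS`); `flash_attention` is the key named later in this thread. It works line by line and only handles simple scalar values.

```python
import re


def set_top_level_key(yaml_text: str, key: str, value: str) -> str:
    """Set `key: value` at the top level of a YAML document given as text.

    Only unindented `key:` lines are matched, so nested blocks such as
    prompt templates are left untouched. Intended for one-line scalar
    values only (numbers, booleans, short strings).
    """
    pattern = re.compile(rf"^{re.escape(key)}[ \t]*:.*$", re.MULTILINE)
    new_line = f"{key}: {value}"
    if pattern.search(yaml_text):
        # Replace the first existing top-level occurrence.
        return pattern.sub(lambda _: new_line, yaml_text, count=1)
    # Key not present: append it, making sure it starts on its own line.
    sep = "" if yaml_text.endswith("\n") or not yaml_text else "\n"
    return f"{yaml_text}{sep}{new_line}\n"


# Example: the two settings discussed in this thread.
model_yaml = "name: qwen2-7b-instruct\nf16: true\n"
cpu_only = set_top_level_key(model_yaml, "gpu_layers", "0")  # assumed key name
with_flash = set_top_level_key(model_yaml, "flash_attention", "true")
```

The result is plain text, so it can be written back to the model file and the container restarted to pick it up.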

cesinsingapore commented 5 months ago

I'm using docker-compose with the latest image directly from LocalAI.

Docker-compose.yml

version: "3.9"
services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:

AlexM4H commented 5 months ago

Have you entered flash_attention: true in your model yaml file?

cesinsingapore commented 5 months ago

Do you mean like this? It still generates the same kind of output after I restarted.

models/qwen.yaml

root@a4681b4b3146:/build/models# cat qwen2-7b-instruct.yaml

context_size: 4096
f16: true
mmap: true
name: qwen2-7b-instruct
flash_attention: true
parameters:
  model: Qwen2-7B-Instruct-Q4_K_M.gguf
stopwords:
- <|im_end|>
template:
  chat: |
    {{.Input -}}
    <|im_start|>assistant
  chat_message: |
    <|im_start|>{{ .RoleName }}
    {{ if .FunctionCall -}}
    Function call:
    {{ else if eq .RoleName "tool" -}}
    Function response:
    {{ end -}}
    {{ if .Content -}}
    {{.Content }}
    {{ end -}}
    {{ if .FunctionCall -}}
    {{toJson .FunctionCall}}
    {{ end -}}<|im_end|>
  completion: |
    {{.Input}}
  function: |
    <|im_start|>system
    You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
    {{range .Functions}}
    {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
    {{end}}
    For each function call return a json object with function name and arguments
    <|im_end|>
    {{.Input -}}
    <|im_start|>assistant
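
To check whether a config change like this actually affects the output after a restart, a fixed test prompt helps. Below is a minimal sketch using only the Python standard library. It assumes the API is reachable on port 8080 (taken from the healthcheck in the compose file) and that the OpenAI-style `/v1/chat/completions` route is used; `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: port 8080 as in the healthcheck URL of the compose file.
BASE_URL = "http://localhost:8080"


def build_chat_request(model: str, prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # keep runs comparable between config changes
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (needs the container to be running):
# print(ask("qwen2-7b-instruct", "Reply with the single word: hello"))
```

Running the same prompt before and after each change (for example with and without `flash_attention: true`) makes it easier to tell whether the setting was picked up at all.
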
AlexM4H commented 5 months ago

Yes, it works for me that way.

AlexM4H commented 5 months ago

@cesinsingapore did you solve your problem?

cesinsingapore commented 4 months ago

Nope, it's not solved.