cesinsingapore opened 5 months ago
It's working fine with another model.
Same behaviour for me.
Temporary workaround: `GPU_LAYERS: 0`
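A minimal sketch of where that goes, assuming `GPU_LAYERS` corresponds to LocalAI's per-model `gpu_layers` option (zero layers offloaded, i.e. CPU-only inference):

```yaml
# sketch: keep all layers on the CPU as a temporary workaround
# (assumes GPU_LAYERS maps to the per-model gpu_layers option)
name: qwen2-7b-instruct
gpu_layers: 0
parameters:
  model: Qwen2-7B-Instruct-Q4_K_M.gguf
```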
Further info:
https://huggingface.co/bartowski/Qwen2-7B-Instruct-GGUF/discussions/1
"You can also enable flash attention for llamacpp which should be able to work around the issue"
Is flash attention already enabled in the current Docker images?
I'm using docker-compose with the latest LocalAI image directly:
version: "3.9" services: api: image: localai/localai:latest-aio-gpu-nvidia-cuda-12 healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"] interval: 1m timeout: 20m retries: 5 ports:
volumes:
deploy: resources: reservations: devices:
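(Side note: the healthcheck above polls `/readyz`; you can run the same check by hand to confirm the API is up before testing the model:)

```sh
# manual version of the compose healthcheck above
curl -f http://localhost:8080/readyz
```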
Have you set `flash_attention: true` in your model YAML file?
Do you mean like this? It still generates broken output after I restarted.
```
root@a4681b4b3146:/build/models# cat qwen2-7b-instruct.yaml
context_size: 4096
f16: true
mmap: true
name: qwen2-7b-instruct
flash_attention: true
parameters:
  model: Qwen2-7B-Instruct-Q4_K_M.gguf
stopwords:
- <|im_end|>
template:
  chat: |
    {{.Input -}}
    <|im_start|>assistant
  chat_message: |
    <|im_start|>{{ .RoleName }}
    {{ if .FunctionCall -}}
    Function call:
    {{ else if eq .RoleName "tool" -}}
    Function response:
    {{ end -}}
    {{ if .Content -}}
    {{.Content }}
    {{ end -}}
    {{ if .FunctionCall -}}
    {{toJson .FunctionCall}}
    {{ end -}}<|im_end|>
  completion: |
    {{.Input}}
  function: |
    <|im_start|>system
    You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
    {{range .Functions}}
    {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
    {{end}}
    For each function call return a json object with function name and arguments
    <|im_end|>
    {{.Input -}}
    <|im_start|>assistant
```
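(For testing, a minimal sketch of a smoke-test request against LocalAI's OpenAI-compatible endpoint, assuming the port 8080 mapping from the compose file above:)

```sh
# smoke test: ask the model for a short reply and inspect the output
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2-7b-instruct",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```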
Yes, like that, and it works for me.
@cesinsingapore did you solve your problem?
Nope, it's not solved. The AI replies still don't make sense.