Same here. I don't think the segmentation fault is related to memory size: I have 64 GB of RAM and the Docker container can consume half of it. The segmentation fault happens with the falcon-7b model.
What exact file are you using?
I tried all the files from TheBloke's repositories and the gpt4all-falcon file. I think the MPT and Falcon models do not work; GPT4All does. I think this GitHub repository is not maintained properly.
Obviously, we can only use MPT or Falcon, but cannot use llama or gpt4all due to license issues. So talking about llama and gpt4all under K8s is meaningless: since those models are only for personal work or research, there is no use for K8s there. :p
> I think the MPT and Falcon models do not work; GPT4All does. I think this GitHub repository is not maintained properly.
Please file issues for the problems you find - that is how this works. If you keep what works and what doesn't to yourself, things will never get fixed. This is a community, open-source project, so everyone here is trying to help each other!
> Obviously, we can only use MPT or Falcon, but cannot use llama or gpt4all due to license issues.
You are wrong here: there are OpenLLaMA-based models that can be used freely, and gpt4all models based on GPT-J. MPT with gpt4all should work.
I haven't tried Falcon or MPT recently, as I'm busy with #726, but I think the model you are trying is not the one I tested - that one looks somewhat newer.
@mudler Thanks for building this great project. Could you share the Falcon 7B model file you tested with (#516)? That would unblock us so we can use Falcon with this nice tool.
I had a quick look at the current state, and it seems most of the work to support Falcon went into ggllm.cpp. I gave creating bindings a quick shot and it seems to work with wizardlm-uncensored: https://github.com/mudler/go-ggllm.cpp - I will integrate it into LocalAI soon, which should give support for at least 7B and 40B, plus GPU support.
I'm having a closer look at it this weekend; a spare-time attempt already seems to work here with falcon-7b. I'm looking into refactoring the backends first to get rid of some hacks, but that shouldn't take long.
Master should now have falcon working. I've been testing locally with https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-7B-GGML/tree/main. I've also kept the old ggml implementation as a fallback in the falcon-ggml backend.
Note: you need to be extra careful to use a matching prompt template. Without it the model hallucinates pretty quickly.
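For anyone wiring this up, the prompt can be pinned through a model config plus a prompt template in the models directory. The snippet below is only a minimal sketch, not something taken verbatim from this thread: the `wizard-falcon` alias, the exact model file name, and the template text are assumptions (the template follows the `### Response:` format from TheBloke's model card).

```
# models/wizard-falcon.yaml - minimal sketch, names are placeholders
name: wizard-falcon
backend: falcon                  # new ggllm.cpp-based backend; falcon-ggml is the old fallback
context_size: 2048
parameters:
  # placeholder file name - use whichever quantization you downloaded from TheBloke
  model: wizardlm-uncensored-falcon-7b.q4_0.bin
  temperature: 0.2
template:
  completion: wizard-falcon      # resolves to models/wizard-falcon.tmpl
```

with `models/wizard-falcon.tmpl` containing something along the lines of `{{.Input}}` followed by a `### Response:` line, so every request is wrapped in the prompt format the model was fine-tuned on.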
**LocalAI version:** v1.20.1-dirty (92614b91d7b2e5ceb4db28c640314df7fec3d96f)
**Environment, CPU architecture, OS, and Version:** Linux t14s 6.4.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 01 Jul 2023 16:17:21 +0000 x86_64 GNU/Linux
**Describe the bug**
Running LocalAI with `falcon7b-instruct.ggmlv3.fp16.bin` from TheBloke runs me out of memory with 16 GB of RAM, so I tried `falcon7b-instruct.ggmlv3.q8_0.bin`, which works with a little less RAM but seg faults the backend.

**To Reproduce**
1) Download this version of falcon-7b.
2) Run any prompt (a minimal request sketch follows below).

**Expected behavior**
To not seg fault.
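For step 2, any OpenAI-style request against the local endpoint triggers it. A minimal sketch: the endpoint, port, and fields mirror the request visible in the debug log below, while the prompt text itself is just an example.

```
# send a single chat completion to the local LocalAI instance (default port 8080)
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "falcon7b-instruct.ggmlv3.q8_0.bin",
        "messages": [{"role": "user", "content": "Say hello"}],
        "temperature": 0.1
      }'
```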
**Logs**
```
❯ local-ai --debug
Starting LocalAI using 4 threads, with models path: /home/pablo/.local/share/local-ai/models
unexpected end of JSON input

┌───────────────────────────────────────────────────┐
│                   Fiber v2.47.0                   │
│               http://127.0.0.1:8080               │
│       (bound on host 0.0.0.0 and port 8080)       │
│                                                   │
│ Handlers ............ 32  Processes ........... 1 │
│ Prefork ....... Disabled  PID .............. 8181 │
└───────────────────────────────────────────────────┘

12:55PM DBG Request received: {"model":"falcon7b-instruct.ggmlv3.q8_0.bin","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"###\nRole name: shell\nProvide only zsh commands for Linux/Arch Linux without any description.\nIf there is a lack of details, provide most logical solution.\nEnsure the output is a valid shell command.\nIf multiple steps required try to combine them together.\n\nRequest: concat two .bin files into one\n###\nCommand:"}],"stream":true,"echo":false,"top_p":1,"top_k":0,"temperature":0.1,"max_tokens":0,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"seed":0,"mode":0,"step":0,"typical_p":0}
12:55PM DBG Parameter Config: &{OpenAIRequest:{Model:falcon7b-instruct.ggmlv3.q8_0.bin File: Language: ResponseFormat: Size: Prompt:
```

**Additional context**