mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License

Crash upon calling `/completions` #89

Closed chris-hatton closed 1 year ago

chris-hatton commented 1 year ago

I have LocalAI hosted in a Docker container. Calling the models endpoint returns the expected output:

{"object":"list","data":[{"id":"ggml-gpt4all-j.bin","object":"model"},{"id":"ggml-model-f16.bin","object":"model"}]}

But sending the example prompt to either of those models crashes the server with an opaque-looking error:

internal/poll.(*FD).Read(0xc000110000, {0xc00032e000, 0x1000, 0x1000})
    /usr/local/go/src/internal/poll/fd_unix.go:167 +0x299 fp=0xc000287b40 sp=0xc000287aa8 pc=0x4ba939
net.(*netFD).Read(0xc000110000, {0xc00032e000?, 0xc000116088?, 0xc000116000?})
    /usr/local/go/src/net/fd_posix.go:55 +0x29 fp=0xc000287b88 sp=0xc000287b40 pc=0x583d09
net.(*conn).Read(0xc000114000, {0xc00032e000?, 0xc000114000?, 0xc00032f000?})
    /usr/local/go/src/net/net.go:183 +0x45 fp=0xc000287bd0 sp=0xc000287b88 pc=0x5930c5
net.(*TCPConn).Read(0xc0003061e0?, {0xc00032e000?, 0x7b8c8f?, 0x7bbca5?})
    <autogenerated>:1 +0x29 fp=0xc000287c00 sp=0xc000287bd0 pc=0x5a5a69
bufio.(*Reader).fill(0xc00008a720)
    /usr/local/go/src/bufio/bufio.go:106 +0xff fp=0xc000287c38 sp=0xc000287c00 pc=0x5c42df
bufio.(*Reader).Peek(0xc00008a720, 0x1)
    /usr/local/go/src/bufio/bufio.go:144 +0x5d fp=0xc000287c58 sp=0xc000287c38 pc=0x5c443d
github.com/valyala/fasthttp.(*Server).serveConn(0xc000306000, {0xab69c0?, 0xc000114000})
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/server.go:2183 +0x58e fp=0xc000287ec8 sp=0xc000287c58 pc=0x7c874e
github.com/valyala/fasthttp.(*Server).serveConn-fm({0xab69c0?, 0xc000114000?})
    <autogenerated>:1 +0x39 fp=0xc000287ef0 sp=0xc000287ec8 pc=0x7d8a59
github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc0000b7860, 0xc000118020)
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/workerpool.go:224 +0xa9 fp=0xc000287fa0 sp=0xc000287ef0 pc=0x7d4d29
github.com/valyala/fasthttp.(*workerPool).getCh.func1()
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/workerpool.go:196 +0x38 fp=0xc000287fe0 sp=0xc000287fa0 pc=0x7d4a98
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000287fe8 sp=0xc000287fe0 pc=0x482821
created by github.com/valyala/fasthttp.(*workerPool).getCh
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/workerpool.go:195 +0x1b0
goroutine 19 [IO wait]:
runtime.gopark(0x0?, 0xb?, 0x0?, 0x0?, 0x8?)
    /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00006e900 sp=0xc00006e8e0 pc=0x453eb6
runtime.netpollblock(0x495f05?, 0x41fb4f?, 0x0?)
    /usr/local/go/src/runtime/netpoll.go:527 +0xf7 fp=0xc00006e938 sp=0xc00006e900 pc=0x44c8b7
internal/poll.runtime_pollWait(0x7fad85b947a0, 0x72)
    /usr/local/go/src/runtime/netpoll.go:306 +0x89 fp=0xc00006e958 sp=0xc00006e938 pc=0x47d529
internal/poll.(*pollDesc).wait(0xc000110080?, 0xc000122000?, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32 fp=0xc00006e980 sp=0xc00006e958 pc=0x4b9552
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000110080, {0xc000122000, 0x1000, 0x1000})
    /usr/local/go/src/internal/poll/fd_unix.go:167 +0x299 fp=0xc00006ea18 sp=0xc00006e980 pc=0x4ba939
net.(*netFD).Read(0xc000110080, {0xc000122000?, 0x7fad86bc4228?, 0x7c540d?})
    /usr/local/go/src/net/fd_posix.go:55 +0x29 fp=0xc00006ea60 sp=0xc00006ea18 pc=0x583d09
net.(*conn).Read(0xc000114008, {0xc000122000?, 0x41f605?, 0x59?})
    /usr/local/go/src/net/net.go:183 +0x45 fp=0xc00006eaa8 sp=0xc00006ea60 pc=0x5930c5
net.(*TCPConn).Read(0x1010000000000?, {0xc000122000?, 0x7fadaf75b5b8?, 0x1000?})
    <autogenerated>:1 +0x29 fp=0xc00006ead8 sp=0xc00006eaa8 pc=0x5a5a69
bufio.(*Reader).fill(0xc00011c0c0)
    /usr/local/go/src/bufio/bufio.go:106 +0xff fp=0xc00006eb10 sp=0xc00006ead8 pc=0x5c42df
bufio.(*Reader).Peek(0xc00011c0c0, 0x1)
    /usr/local/go/src/bufio/bufio.go:144 +0x5d fp=0xc00006eb30 sp=0xc00006eb10 pc=0x5c443d
github.com/valyala/fasthttp.(*RequestHeader).tryRead(0xc000120000, 0xc00011c0c0, 0x1)
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/header.go:2184 +0x5a fp=0xc00006ec18 sp=0xc00006eb30 pc=0x7ac59a
github.com/valyala/fasthttp.(*RequestHeader).readLoop(0xc000306000?, 0xc00011c0c0, 0x1)
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/header.go:2115 +0x4d fp=0xc00006ec58 sp=0xc00006ec18 pc=0x7abf8d
github.com/valyala/fasthttp.(*RequestHeader).Read(...)
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/header.go:2106
github.com/valyala/fasthttp.(*Server).serveConn(0xc000306000, {0xab69c0?, 0xc000114008})
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/server.go:2244 +0x918 fp=0xc00006eec8 sp=0xc00006ec58 pc=0x7c8ad8
github.com/valyala/fasthttp.(*Server).serveConn-fm({0xab69c0?, 0xc000114008?})
    <autogenerated>:1 +0x39 fp=0xc00006eef0 sp=0xc00006eec8 pc=0x7d8a59
github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc0000b7860, 0xc000118040)
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/workerpool.go:224 +0xa9 fp=0xc00006efa0 sp=0xc00006eef0 pc=0x7d4d29
github.com/valyala/fasthttp.(*workerPool).getCh.func1()
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/workerpool.go:196 +0x38 fp=0xc00006efe0 sp=0xc00006efa0 pc=0x7d4a98
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x482821
created by github.com/valyala/fasthttp.(*workerPool).getCh
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/workerpool.go:195 +0x1b0
goroutine 7 [sleep]:
runtime.gopark(0x219ceb706396b?, 0x965de0?, 0xf8?, 0xc1?, 0x1?)
    /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000384f88 sp=0xc000384f68 pc=0x453eb6
time.Sleep(0x3b9aca00)
    /usr/local/go/src/runtime/time.go:195 +0x135 fp=0xc000384fc8 sp=0xc000384f88 pc=0x47f695
github.com/valyala/fasthttp.updateServerDate.func1()
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/header.go:2246 +0x1e fp=0xc000384fe0 sp=0xc000384fc8 pc=0x7d517e
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000384fe8 sp=0xc000384fe0 pc=0x482821
created by github.com/valyala/fasthttp.updateServerDate
    /go/pkg/mod/github.com/valyala/fasthttp@v1.44.0/header.go:2244 +0x25
rax    0x478a693f04c46b1d
rbx    0x0
rcx    0x270
rdx    0x4c46d8c
rdi    0x7fad78000cd0
rsi    0x16e0
rbp    0x7fad87c03d60
rsp    0x7fad87c02890
r8     0x7fad78000cd0
r9     0x7fad78000080
r10    0x6f
r11    0x0
r12    0x7fad78000ca0
r13    0x0
r14    0x7fad78000cd0
r15    0x200
rip    0x904e04
rflags 0x10246
cs     0x33
fs     0x0
gs     0x0

Is there anything more I can try to help diagnose the reason?

I am running this on an HP Z800 workstation, a fairly old machine with dual Xeon X5570 CPUs. These lack the AVX instruction set, in case that's a hard requirement. The output of /proc/cpuinfo is:

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 26
model name  : Intel(R) Xeon(R) CPU           X5570  @ 2.93GHz
stepping    : 5
microcode   : 0x1d
cpu MHz     : 1596.000
cache size  : 8192 KB
physical id : 0
siblings    : 8
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 11
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida flush_l1d
vmx flags   : vnmi preemption_timer invvpid ept_x_only flexpriority tsc_offset vtpr mtf vapic ept vpid
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips    : 5860.84
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Standalone llama.cpp works, albeit slowly.
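
The flags line posted above contains sse4_2 but no avx entry. For anyone wanting to check their own host the same way, here is a minimal Go sketch that parses /proc/cpuinfo-style text and reports a few SIMD flags. The helper name cpuFlags and the particular list of flags printed are illustrative choices only; nothing here is taken from LocalAI, and a missing flag does not by itself confirm the cause of the crash.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// cpuFlags returns the feature flags found on the first "flags" line of
// /proc/cpuinfo-formatted text. Lines such as "vmx flags" are ignored.
// (Hypothetical helper, written for this example.)
func cpuFlags(cpuinfo string) map[string]bool {
	flags := make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(cpuinfo))
	// Allow long lines; the flags line can be several hundred bytes or more.
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for sc.Scan() {
		key, value, found := strings.Cut(sc.Text(), ":")
		if !found || strings.TrimSpace(key) != "flags" {
			continue
		}
		for _, f := range strings.Fields(value) {
			flags[f] = true
		}
		break // every logical CPU repeats the same line; one is enough
	}
	return flags
}

func main() {
	data, err := os.ReadFile("/proc/cpuinfo")
	if err != nil {
		fmt.Println("cannot read /proc/cpuinfo:", err)
		return
	}
	flags := cpuFlags(string(data))
	// Flags chosen for illustration; adjust to whatever the build requires.
	for _, name := range []string{"sse4_2", "avx", "avx2", "fma", "f16c"} {
		fmt.Printf("%-7s %v\n", name, flags[name])
	}
}
```

Run inside the container as well as on the host, since that is where the binary actually executes.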

shengkaixuan commented 1 year ago

I got the same error when using the curl command below, with LocalAI deployed in a k8s cluster:

curl http://xxxx/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "koala-7B-4bit-128g.bin", "messages": [{"role": "user", "content": "Say this is a test!"}], "temperature": 0.7 }'

localai-local-ai.log
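
The curl call above can also be reproduced from Go, which makes it easier to script repeated attempts while watching the server log. This is only a sketch: the type and function names (chatMessage, chatRequest, newChatRequest) are hypothetical, and the base URL http://localhost:8080 is a placeholder assumption, since the real host is elided in the report.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest mirror the JSON fields used in the curl body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model       string        `json:"model"`
	Messages    []chatMessage `json:"messages"`
	Temperature float64       `json:"temperature"`
}

// newChatRequest builds a POST to <baseURL>/v1/chat/completions carrying
// a single user message, matching the shape of the curl command.
func newChatRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:       model,
		Messages:    []chatMessage{{Role: "user", Content: prompt}},
		Temperature: 0.7,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Placeholder address (assumption); substitute the real service URL.
	req, err := newChatRequest("http://localhost:8080", "koala-7B-4bit-128g.bin", "Say this is a test!")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		// A dropped connection here usually means the request never completed.
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```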

mudler commented 1 year ago

Closing, dup of #88.