mudler / LocalAI

The free, open-source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License

Support for --n-gpu-layers #586

Closed FireMasterK closed 1 year ago

FireMasterK commented 1 year ago

Is your feature request related to a problem? Please describe.
Despite being built with cuBLAS, LocalAI still appears to use only my CPU.

Describe the solution you'd like
GPU usage for inference.

Describe alternatives you've considered
N/A; I'm unaware of any alternatives.

Additional context
See https://github.com/ggerganov/llama.cpp/issues/1448

mudler commented 1 year ago

You can already specify the number of GPU layers in the YAML model config file with gpu_layers: ... Would that cover it?
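For readers who generate model configs from scripts, here is a minimal Python sketch of what such a file might look like. It assumes `gpu_layers` is a top-level key next to `name` and `parameters`, matching the layout of the config posted later in this thread. The helper names, the model name, and the file name are hypothetical, and a temporary directory stands in for the real models directory.

```python
from pathlib import Path
import tempfile


def render_model_config(name: str, model_file: str, gpu_layers: int) -> str:
    """Render a minimal model config with gpu_layers as a top-level key.

    The key layout is an assumption based on the config shown in this thread.
    """
    if gpu_layers < 0:
        raise ValueError("gpu_layers must be non-negative")
    return (
        f"name: {name}\n"
        "parameters:\n"
        f"  model: {model_file}\n"
        f"gpu_layers: {gpu_layers}\n"
    )


def write_model_config(models_dir: Path, name: str, model_file: str, gpu_layers: int) -> Path:
    """Write the rendered config to <models_dir>/<name>.yaml and return the path."""
    path = models_dir / f"{name}.yaml"
    path.write_text(render_model_config(name, model_file, gpu_layers), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical names; a temporary directory stands in for the models directory.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_model_config(Path(tmp), "my-model-gpu", "my-model.bin", 32)
        print(cfg.read_text(encoding="utf-8"))
```

How many layers actually fit depends on the model and the available VRAM, so the value 32 is only an illustration.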

FireMasterK commented 1 year ago

I think that would cover it indeed!

When I try using it, I get a panic:

```
llama.cpp: loading model from /models/WizardLM-7B-uncensored.ggmlv3.q2_K
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32001
llama_model_load_internal: n_ctx = 512
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 10 (mostly Q2_K)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 2672.12 MB
warning: failed to mlock 2801922048-byte buffer (after previously locking 0 bytes): Cannot allocate memory
Try increasing RLIMIT_MLOCK ('ulimit -l' as root).
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060 Laptop GPU
llama_model_load_internal: using CUDA for GPU acceleration
llama_model_load_internal: mem required = 1874.12 MB (+ 2052.00 MB per state)
llama_model_load_internal: allocating batch_size x 1 MB = 0 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 32 layers to GPU
llama_model_load_internal: total VRAM used: 2590 MB
...................................................................................................
llama_init_from_file: kv self size = 512.00 MB
GGML_ASSERT: /build/go-llama/llama.cpp/ggml-cuda.cu:1804: size <= g_scratch_size
SIGABRT: abort
PC=0x7f12f66c1ce1 m=4 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 39 [syscall]:
runtime.cgocall(0x986280, 0xc0000c0f40)
    /usr/local/go/src/runtime/cgocall.go:157 +0x5c fp=0xc0000c0f18 sp=0xc0000c0ee0 pc=0x449f7c
github.com/go-skynet/go-llama%2ecpp._Cfunc_llama_predict(0x7f112bd2ba00, 0x7f1294000bc0, 0xc000600000, 0x0)
    _cgo_gotypes.go:217 +0x4c fp=0xc0000c0f40 sp=0xc0000c0f18 pc=0x8d022c
github.com/go-skynet/go-llama%2ecpp.(*LLama).Predict.func2(0xc000153800?, 0xc0000c1130?, {0xc000600000, 0x0?, 0x453bc7?}, 0x10?)
    /build/go-llama/llama.go:211 +0x94 fp=0xc0000c0f90 sp=0xc0000c0f40 pc=0x8d2d94
github.com/go-skynet/go-llama%2ecpp.(*LLama).Predict(0xc0004fc480, {0xc000153800, 0x783}, {0xc000125ae0, 0xc, 0x0?})
    /build/go-llama/llama.go:211 +0x2c8 fp=0xc0000c1248 sp=0xc0000c0f90 pc=0x8d2a28
github.com/go-skynet/LocalAI/api.ModelInference.func12()
    /build/api/prediction.go:532 +0xde fp=0xc0000c1568 sp=0xc0000c1248 pc=0x9535fe
github.com/go-skynet/LocalAI/api.ModelInference.func14()
    /build/api/prediction.go:574 +0x1aa fp=0xc0000c1620 sp=0xc0000c1568 pc=0x95314a
github.com/go-skynet/LocalAI/api.ComputeChoices({0xc000153800, 0x783}, 0xc00014e140, 0xc0002dcb00, 0xc000164a80?, 0x10c0080, 0xc0004fc3f0?)
/build/api/prediction.go:598 +0x284 fp=0xc0000c1ec0 sp=0xc0000c1620 pc=0x9564e4 github.com/go-skynet/LocalAI/api.chatEndpoint.func1({0xc000153800, 0x783}, 0xc00014e140, 0x894cee894cc2894c?, 0xf1a6e8240c8948cf?, 0xc00007c240) /build/api/openai.go:344 +0x1d6 fp=0xc0000c1fa0 sp=0xc0000c1ec0 pc=0x956c96 github.com/go-skynet/LocalAI/api.chatEndpoint.func2.3() /build/api/openai.go:415 +0x3f fp=0xc0000c1fe0 sp=0xc0000c1fa0 pc=0x94d07f runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000c1fe8 sp=0xc0000c1fe0 pc=0x4ad241 created by github.com/go-skynet/LocalAI/api.chatEndpoint.func2 /build/api/openai.go:415 +0x7f1 goroutine 1 [IO wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0002db288 sp=0xc0002db268 pc=0x47df76 runtime.netpollblock(0x7f12d466e7e8?, 0x44960f?, 0x0?) /usr/local/go/src/runtime/netpoll.go:527 +0xf7 fp=0xc0002db2c0 sp=0xc0002db288 pc=0x476817 internal/poll.runtime_pollWait(0x7f12a957dc18, 0x72) /usr/local/go/src/runtime/netpoll.go:306 +0x89 fp=0xc0002db2e0 sp=0xc0002db2c0 pc=0x4a79e9 internal/poll.(*pollDesc).wait(0xc000032400?, 0x4?, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x32 fp=0xc0002db308 sp=0xc0002db2e0 pc=0x51e952 internal/poll.(*pollDesc).waitRead(...) 
/usr/local/go/src/internal/poll/fd_poll_runtime.go:89 internal/poll.(*FD).Accept(0xc000032400) /usr/local/go/src/internal/poll/fd_unix.go:614 +0x2bd fp=0xc0002db3b0 sp=0xc0002db308 pc=0x52425d net.(*netFD).accept(0xc000032400) /usr/local/go/src/net/fd_unix.go:172 +0x35 fp=0xc0002db468 sp=0xc0002db3b0 pc=0x5ab155 net.(*TCPListener).accept(0xc0000125d0) /usr/local/go/src/net/tcpsock_posix.go:148 +0x25 fp=0xc0002db490 sp=0xc0002db468 pc=0x5c1505 net.(*TCPListener).Accept(0xc0000125d0) /usr/local/go/src/net/tcpsock.go:297 +0x3d fp=0xc0002db4c0 sp=0xc0002db490 pc=0x5c05fd github.com/valyala/fasthttp.acceptConn(0xc0002d0000, {0x1150800, 0xc0000125d0}, 0xc0002db6b8) /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/server.go:1930 +0x62 fp=0xc0002db5a0 sp=0xc0002db4c0 pc=0x7f8e42 github.com/valyala/fasthttp.(*Server).Serve(0xc0002d0000, {0x1150800?, 0xc0000125d0}) /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/server.go:1823 +0x4f4 fp=0xc0002db6e8 sp=0xc0002db5a0 pc=0x7f8454 github.com/gofiber/fiber/v2.(*App).Listen(0xc00013b200, {0xc4c22c?, 0x7?}) /go/pkg/mod/github.com/gofiber/fiber/v2@v2.46.0/listen.go:82 +0x110 fp=0xc0002db748 sp=0xc0002db6e8 pc=0x88e190 main.main.func1(0xc0002c2160?) /build/main.go:140 +0x6c6 fp=0xc0002db990 sp=0xc0002db748 pc=0x985b66 github.com/urfave/cli/v2.(*Command).Run(0xc0002c2160, 0xc00013c900, {0xc000024180, 0x3, 0x3}) /go/pkg/mod/github.com/urfave/cli/v2@v2.25.5/command.go:274 +0x9eb fp=0xc0002dbc30 sp=0xc0002db990 pc=0x97304b github.com/urfave/cli/v2.(*App).RunContext(0xc0002be000, {0x1150b68?, 0xc00002c078}, {0xc000024180, 0x3, 0x3}) /go/pkg/mod/github.com/urfave/cli/v2@v2.25.5/app.go:332 +0x616 fp=0xc0002dbca0 sp=0xc0002dbc30 pc=0x96f976 github.com/urfave/cli/v2.(*App).Run(...) 
/go/pkg/mod/github.com/urfave/cli/v2@v2.25.5/app.go:309 main.main() /build/main.go:144 +0xff6 fp=0xc0002dbf80 sp=0xc0002dbca0 pc=0x9853d6 runtime.main() /usr/local/go/src/runtime/proc.go:250 +0x207 fp=0xc0002dbfe0 sp=0xc0002dbf80 pc=0x47db47 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0002dbfe8 sp=0xc0002dbfe0 pc=0x4ad241 goroutine 2 [force gc (idle), 1 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000062fb0 sp=0xc000062f90 pc=0x47df76 runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:387 runtime.forcegchelper() /usr/local/go/src/runtime/proc.go:305 +0xb0 fp=0xc000062fe0 sp=0xc000062fb0 pc=0x47ddb0 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000062fe8 sp=0xc000062fe0 pc=0x4ad241 created by runtime.init.6 /usr/local/go/src/runtime/proc.go:293 +0x25 goroutine 18 [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00005e780 sp=0xc00005e760 pc=0x47df76 runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:387 runtime.bgsweep(0x0?) /usr/local/go/src/runtime/mgcsweep.go:319 +0xde fp=0xc00005e7c8 sp=0xc00005e780 pc=0x46a11e runtime.gcenable.func1() /usr/local/go/src/runtime/mgc.go:178 +0x26 fp=0xc00005e7e0 sp=0xc00005e7c8 pc=0x45f386 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00005e7e8 sp=0xc00005e7e0 pc=0x4ad241 created by runtime.gcenable /usr/local/go/src/runtime/mgc.go:178 +0x6b goroutine 19 [GC scavenge wait]: runtime.gopark(0x80250e450a1?, 0x3ba58584?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00005ef70 sp=0xc00005ef50 pc=0x47df76 runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:387 runtime.(*scavengerState).park(0x15a32a0) /usr/local/go/src/runtime/mgcscavenge.go:400 +0x53 fp=0xc00005efa0 sp=0xc00005ef70 pc=0x467ff3 runtime.bgscavenge(0x0?) 
/usr/local/go/src/runtime/mgcscavenge.go:633 +0x65 fp=0xc00005efc8 sp=0xc00005efa0 pc=0x4685e5 runtime.gcenable.func2() /usr/local/go/src/runtime/mgc.go:179 +0x26 fp=0xc00005efe0 sp=0xc00005efc8 pc=0x45f326 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00005efe8 sp=0xc00005efe0 pc=0x4ad241 created by runtime.gcenable /usr/local/go/src/runtime/mgc.go:179 +0xaa goroutine 3 [finalizer wait, 1 minutes]: runtime.gopark(0x1a0?, 0x15a3f80?, 0x20?, 0x75?, 0xc000062770?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000062628 sp=0xc000062608 pc=0x47df76 runtime.runfinq() /usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc0000627e0 sp=0xc000062628 pc=0x45e3c7 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000627e8 sp=0xc0000627e0 pc=0x4ad241 created by runtime.createfing /usr/local/go/src/runtime/mfinal.go:163 +0x45 goroutine 4 [GC worker (idle), 1 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000063750 sp=0xc000063730 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000637e0 sp=0xc000063750 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000637e8 sp=0xc0000637e0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 5 [GC worker (idle), 1 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000063f50 sp=0xc000063f30 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc000063fe0 sp=0xc000063f50 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000063fe8 sp=0xc000063fe0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 6 [GC worker (idle), 1 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000064750 sp=0xc000064730 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000647e0 sp=0xc000064750 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000647e8 sp=0xc0000647e0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 7 [GC worker (idle), 1 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000064f50 sp=0xc000064f30 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc000064fe0 sp=0xc000064f50 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000064fe8 sp=0xc000064fe0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 8 [GC worker (idle), 1 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000065750 sp=0xc000065730 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000657e0 sp=0xc000065750 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000657e8 sp=0xc0000657e0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 20 [GC worker (idle), 1 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00005f750 sp=0xc00005f730 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc00005f7e0 sp=0xc00005f750 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00005f7e8 sp=0xc00005f7e0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 34 [GC worker (idle), 1 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f6750 sp=0xc0004f6730 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0004f67e0 sp=0xc0004f6750 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f67e8 sp=0xc0004f67e0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 9 [GC worker (idle), 1 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000065f50 sp=0xc000065f30 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc000065fe0 sp=0xc000065f50 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000065fe8 sp=0xc000065fe0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 35 [GC worker (idle), 1 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f6f50 sp=0xc0004f6f30 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0004f6fe0 sp=0xc0004f6f50 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f6fe8 sp=0xc0004f6fe0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 36 [GC worker (idle), 1 minutes]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f7750 sp=0xc0004f7730 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0004f77e0 sp=0xc0004f7750 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f77e8 sp=0xc0004f77e0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 37 [GC worker (idle)]: runtime.gopark(0x80dad20d22f?, 0x3?, 0x3?, 0x1e?, 0x0?) 
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f7f50 sp=0xc0004f7f30 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0004f7fe0 sp=0xc0004f7f50 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f7fe8 sp=0xc0004f7fe0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 10 [GC worker (idle)]: runtime.gopark(0x80dad2e6486?, 0x3?, 0xa7?, 0x47?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f2750 sp=0xc0004f2730 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0004f27e0 sp=0xc0004f2750 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f27e8 sp=0xc0004f27e0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 38 [GC worker (idle)]: runtime.gopark(0x19afd40?, 0x1?, 0xe?, 0x2d?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f8750 sp=0xc0004f8730 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0004f87e0 sp=0xc0004f8750 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f87e8 sp=0xc0004f87e0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 11 [GC worker (idle)]: runtime.gopark(0x80dad20d7a4?, 0x1?, 0xd8?, 0x4e?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f2f50 sp=0xc0004f2f30 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0004f2fe0 sp=0xc0004f2f50 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f2fe8 sp=0xc0004f2fe0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 12 [GC worker (idle)]: runtime.gopark(0x19afd40?, 0x1?, 0x8e?, 0xd1?, 0x0?) 
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f3750 sp=0xc0004f3730 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0004f37e0 sp=0xc0004f3750 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f37e8 sp=0xc0004f37e0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 13 [GC worker (idle)]: runtime.gopark(0x80250dd9ab2?, 0x1?, 0x54?, 0xb2?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f3f50 sp=0xc0004f3f30 pc=0x47df76 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0004f3fe0 sp=0xc0004f3f50 pc=0x4610f1 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f3fe8 sp=0xc0004f3fe0 pc=0x4ad241 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25 goroutine 21 [select, 1 minutes]: runtime.gopark(0xc0004f4f10?, 0x2?, 0x0?, 0x0?, 0xc0004f4ee4?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f4d30 sp=0xc0004f4d10 pc=0x47df76 runtime.selectgo(0xc0004f4f10, 0xc0004f4ee0, 0x0?, 0x0, 0x0?, 0x1) /usr/local/go/src/runtime/select.go:327 +0x7be fp=0xc0004f4e70 sp=0xc0004f4d30 pc=0x48dafe github.com/go-skynet/LocalAI/api.(*galleryApplier).start.func1() /build/api/gallery.go:103 +0xe6 fp=0xc0004f4fe0 sp=0xc0004f4e70 pc=0x949326 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f4fe8 sp=0xc0004f4fe0 pc=0x4ad241 created by github.com/go-skynet/LocalAI/api.(*galleryApplier).start /build/api/gallery.go:101 +0xaa goroutine 22 [sleep]: runtime.gopark(0x80c17a253e1?, 0xc0004f5788?, 0x85?, 0xd2?, 0xc000124b70?) 
/usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f5758 sp=0xc0004f5738 pc=0x47df76 time.Sleep(0x2540be400) /usr/local/go/src/runtime/time.go:195 +0x135 fp=0xc0004f5798 sp=0xc0004f5758 pc=0x4aa0b5 github.com/valyala/fasthttp.(*workerPool).Start.func2() /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/workerpool.go:67 +0x56 fp=0xc0004f57e0 sp=0xc0004f5798 pc=0x805936 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f57e8 sp=0xc0004f57e0 pc=0x4ad241 created by github.com/valyala/fasthttp.(*workerPool).Start /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/workerpool.go:59 +0xdd goroutine 24 [sleep]: runtime.gopark(0x80d7e595965?, 0xb9dc00?, 0xf0?, 0x43?, 0xc11a59aa049b9ebe?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0004f5f88 sp=0xc0004f5f68 pc=0x47df76 time.Sleep(0x3b9aca00) /usr/local/go/src/runtime/time.go:195 +0x135 fp=0xc0004f5fc8 sp=0xc0004f5f88 pc=0x4aa0b5 github.com/valyala/fasthttp.updateServerDate.func1() /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/header.go:2247 +0x1e fp=0xc0004f5fe0 sp=0xc0004f5fc8 pc=0x8068be runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0004f5fe8 sp=0xc0004f5fe0 pc=0x4ad241 created by github.com/valyala/fasthttp.updateServerDate /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/header.go:2245 +0x25 goroutine 14 [select]: runtime.gopark(0xc0002d7a08?, 0x3?, 0x67?, 0x73?, 0xc0002d79da?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000385860 sp=0xc000385840 pc=0x47df76 runtime.selectgo(0xc000385a08, 0xc0002d79d4, 0x5a9849?, 0x0, 0xc0000e3000?, 0x1) /usr/local/go/src/runtime/select.go:327 +0x7be fp=0xc0003859a0 sp=0xc000385860 pc=0x48dafe github.com/valyala/fasthttp/fasthttputil.(*pipeConn).readNextByteBuffer(0xc00014e318, 0x1) /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/fasthttputil/pipeconns.go:188 +0x1b3 fp=0xc000385a48 sp=0xc0003859a0 pc=0x7b7cb3 github.com/valyala/fasthttp/fasthttputil.(*pipeConn).read(0xc00014e318, {0xc0000e4000, 0x1000, 0xc0002f4390?}, 0x0?) 
/go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/fasthttputil/pipeconns.go:165 +0x3a fp=0xc000385a78 sp=0xc000385a48 pc=0x7b79fa github.com/valyala/fasthttp/fasthttputil.(*pipeConn).Read(0x14e8300?, {0xc0000e4000?, 0xc2?, 0x1000?}) /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/fasthttputil/pipeconns.go:148 +0x88 fp=0xc000385af8 sp=0xc000385a78 pc=0x7b78e8 github.com/valyala/fasthttp.writeBodyChunked(0xc00012a330?, {0x7f12a84fe4a0, 0xc00014e318}) /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/http.go:2064 +0x95 fp=0xc000385b68 sp=0xc000385af8 pc=0x7f1615 github.com/valyala/fasthttp.(*Response).writeBodyStream(0xc00012a330, 0xc0002d7c48?, 0x1) /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/http.go:1976 +0x1f1 fp=0xc000385be0 sp=0xc000385b68 pc=0x7f0e71 github.com/valyala/fasthttp.(*Response).Write(0xc0000e3000?, 0x114dae0?) /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/http.go:1877 +0x157 fp=0xc000385c38 sp=0xc000385be0 pc=0x7f0af7 github.com/valyala/fasthttp.writeResponse(0xc00012a000?, 0x14fa534?) 
/go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/server.go:2577 +0x5b fp=0xc000385c58 sp=0xc000385c38 pc=0x7fbfdb github.com/valyala/fasthttp.(*Server).serveConn(0xc0002d0000, {0x11532a0?, 0xc000014040}) /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/server.go:2418 +0x1667 fp=0xc000385ec8 sp=0xc000385c58 pc=0x7fae07 github.com/valyala/fasthttp.(*Server).serveConn-fm({0x11532a0?, 0xc000014040?}) :1 +0x39 fp=0xc000385ef0 sp=0xc000385ec8 pc=0x80a239 github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc000124b40, 0xc00007a060) /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/workerpool.go:224 +0xa9 fp=0xc000385fa0 sp=0xc000385ef0 pc=0x806469 github.com/valyala/fasthttp.(*workerPool).getCh.func1() /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/workerpool.go:196 +0x38 fp=0xc000385fe0 sp=0xc000385fa0 pc=0x8061d8 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000385fe8 sp=0xc000385fe0 pc=0x4ad241 created by github.com/valyala/fasthttp.(*workerPool).getCh /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/workerpool.go:195 +0x1b0 goroutine 40 [chan receive]: runtime.gopark(0x4b7205?, 0x14e80c0?, 0x78?, 0x43?, 0xc00028e000?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000381d10 sp=0xc000381cf0 pc=0x47df76 runtime.chanrecv(0xc00007c240, 0xc000171f10, 0x1) /usr/local/go/src/runtime/chan.go:583 +0x49d fp=0xc000381da0 sp=0xc000381d10 pc=0x44cd3d runtime.chanrecv2(0xc00013c540?, 0xc00013c540?) /usr/local/go/src/runtime/chan.go:447 +0x18 fp=0xc000381dc8 sp=0xc000381da0 pc=0x44c878 github.com/go-skynet/LocalAI/api.chatEndpoint.func2.1(0xad148d4af6?) 
    /build/api/openai.go:419 +0xc5 fp=0xc000381fa0 sp=0xc000381dc8 pc=0x94cd25
github.com/valyala/fasthttp.NewStreamReader.func1()
    /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/stream.go:44 +0x38 fp=0xc000381fe0 sp=0xc000381fa0 pc=0x7fe3f8
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000381fe8 sp=0xc000381fe0 pc=0x4ad241
created by github.com/valyala/fasthttp.NewStreamReader
    /go/pkg/mod/github.com/valyala/fasthttp@v1.47.0/stream.go:43 +0x37c

rax    0x0
rbx    0x7f12aade9000
rcx    0x7f12f66c1ce1
rdx    0x0
rdi    0x2
rsi    0x7f12aadc6ec0
rbp    0x20000
rsp    0x7f12aadc6ec0
r8     0x0
r9     0x7f12aadc6ec0
r10    0x8
r11    0x246
r12    0x7f1016020240
r13    0x7f1016000020
r14    0x8
r15    0x7f12aaddf510
rip    0x7f12f66c1ce1
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
```
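Most of a dump like this is idle goroutines. The informative lines are the failed assertion and the scratch-buffer allocation reported just before it. The Python sketch below pulls those two facts out of a log. The function and pattern names are hypothetical, and the sample text is an excerpt of the log above. Reading the 0 MB scratch buffer as the reason `size <= g_scratch_size` failed is an inference from the log, not something confirmed in this thread.

```python
import re

# Matches e.g. "GGML_ASSERT: /path/file.cu:1804: size <= g_scratch_size".
# The lookahead stops the condition at a following signal name (flattened logs)
# or at the end of the line (normally formatted logs).
ASSERT_RE = re.compile(
    r"GGML_ASSERT: (?P<file>\S+):(?P<line>\d+): (?P<cond>.+?)(?=\s+SIG[A-Z]+:|$)",
    re.MULTILINE,
)
# Matches the size reported in "... = 0 MB VRAM for the scratch buffer".
SCRATCH_RE = re.compile(r"= (?P<mb>\d+) MB VRAM for the scratch buffer")

# Excerpt copied from the log in this thread.
SAMPLE = (
    "llama_model_load_internal: allocating batch_size x 1 MB = 0 MB VRAM for the scratch buffer\n"
    "GGML_ASSERT: /build/go-llama/llama.cpp/ggml-cuda.cu:1804: size <= g_scratch_size\n"
    "SIGABRT: abort\n"
)


def summarize_crash(log: str) -> dict:
    """Return the failed assertion and the reported scratch buffer size, if present."""
    assert_match = ASSERT_RE.search(log)
    scratch_match = SCRATCH_RE.search(log)
    return {
        "file": assert_match.group("file") if assert_match else None,
        "line": int(assert_match.group("line")) if assert_match else None,
        "condition": assert_match.group("cond") if assert_match else None,
        "scratch_mb": int(scratch_match.group("mb")) if scratch_match else None,
    }


if __name__ == "__main__":
    print(summarize_crash(SAMPLE))
```

The patterns are written to work on both the line-per-entry form and a log whose newlines were collapsed into spaces.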

My configuration is:

name: WizardLM-7B-uncensored.ggml-gpu
parameters:
   model: WizardLM-7B-uncensored.ggmlv3.q2_K
   backend: llama
gpu_layers: 32

lenaxia commented 1 year ago

See my walkthrough in #504. It's written for Kubernetes, but it translates easily to other deployment methods.

mudler commented 1 year ago

Closing in favor of #556; a fix is on its way in https://github.com/go-skynet/LocalAI/pull/597