Thanks for the issue @ajasingh! Sorry you encountered an error. Do you see any errors in the logs at ~/.ollama/logs/server.log?
Me too
Terminal showed: Error: Post "http://127.0.0.1:11434/api/generate": EOF
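In case it helps triage: that EOF is just the client seeing the connection drop when the server process dies before (or while) responding. A minimal Go sketch that surfaces the same error text (the model name and prompt here are placeholders, not taken from this report):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Placeholder request body; "llama2" and the prompt are just examples.
	body := []byte(`{"model": "llama2", "prompt": "Hello"}`)

	resp, err := http.Post("http://127.0.0.1:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		// If the server process dies before sending a response, err prints as:
		//   Post "http://127.0.0.1:11434/api/generate": EOF
		fmt.Println("Error:", err)
		return
	}
	defer resp.Body.Close()

	// The endpoint streams newline-delimited JSON; if the server dies
	// mid-stream, the failure shows up here as an unexpected EOF instead.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Println("stream error:", err)
	}
}
```

So the message points at the server going away mid-request, not at anything wrong with the request itself.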
+1 Same error in terminal. No errors in log:
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] GET / --> github.com/jmorganca/ollama/server.Serve.func1 (4 handlers)
[GIN-debug] POST /api/pull --> github.com/jmorganca/ollama/server.PullModelHandler (4 handlers)
[GIN-debug] POST /api/generate --> github.com/jmorganca/ollama/server.GenerateHandler (4 handlers)
[GIN-debug] POST /api/create --> github.com/jmorganca/ollama/server.CreateModelHandler (4 handlers)
[GIN-debug] POST /api/push --> github.com/jmorganca/ollama/server.PushModelHandler (4 handlers)
[GIN-debug] POST /api/copy --> github.com/jmorganca/ollama/server.CopyModelHandler (4 handlers)
[GIN-debug] GET /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (4 handlers)
[GIN-debug] DELETE /api/delete --> github.com/jmorganca/ollama/server.DeleteModelHandler (4 handlers)
2023/07/31 11:48:50 routes.go:276: Listening on 127.0.0.1:11434
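For what it's worth, that log shows a clean startup, so the crash must happen later, during generation. A quick liveness probe (a minimal sketch, assuming the default 127.0.0.1:11434 bind shown above) can confirm whether the server is still up after the error:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// GET / is in the route table above; a 200 here means the server is up,
	// so any EOF on /api/generate points at a crash during generation.
	resp, err := http.Get("http://127.0.0.1:11434/")
	if err != nil {
		fmt.Println("server not reachable:", err)
		return
	}
	defer resp.Body.Close()
	b, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(b))
}
```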
Encountered the same issue consistently on a MacBook Pro (M1 Max, 32 GB RAM) when attempting to play a Choose Your Own Adventure style game with it. It seems to die each time on the 6th or 7th prompt. Here is some additional log info:
ggml_metal_free: deallocating
[GIN] 2023/08/01 - 09:15:47 | 200 | 1m0s | 127.0.0.1 | POST "/api/generate"
llama.cpp: loading model from /Users/douglas/.ollama/models/blobs/sha256:8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 4096
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 5423.72 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size = 2048.00 MB
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '/Applications/Ollama.app/Contents/Resources/ggml-metal.metal'
ggml_metal_init: loaded kernel_add 0x154f7cde0
ggml_metal_init: loaded kernel_mul 0x1569b01e0
ggml_metal_init: loaded kernel_mul_row 0x154f82e40
ggml_metal_init: loaded kernel_scale 0x154f83020
ggml_metal_init: loaded kernel_silu 0x154f83200
ggml_metal_init: loaded kernel_relu 0x154f833e0
ggml_metal_init: loaded kernel_gelu 0x154f835c0
ggml_metal_init: loaded kernel_soft_max 0x154f837a0
ggml_metal_init: loaded kernel_diag_mask_inf 0x154f83980
ggml_metal_init: loaded kernel_get_rows_f16 0x154f83b60
ggml_metal_init: loaded kernel_get_rows_q4_0 0x154f83d40
ggml_metal_init: loaded kernel_get_rows_q4_1 0x154f83f20
ggml_metal_init: loaded kernel_get_rows_q2_K 0x154f84100
ggml_metal_init: loaded kernel_get_rows_q3_K 0x154f842e0
ggml_metal_init: loaded kernel_get_rows_q4_K 0x154f844c0
ggml_metal_init: loaded kernel_get_rows_q5_K 0x154f846a0
ggml_metal_init: loaded kernel_get_rows_q6_K 0x154f84880
ggml_metal_init: loaded kernel_rms_norm 0x154f84a60
ggml_metal_init: loaded kernel_norm 0x154f84c40
ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x154f84e20
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x154f85000
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x154f851e0
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32 0x154f853c0
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32 0x154f21fb0
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32 0x154f22190
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32 0x154f22370
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32 0x154f22550
ggml_metal_init: loaded kernel_rope 0x154f22730
ggml_metal_init: loaded kernel_alibi_f32 0x154f22910
ggml_metal_init: loaded kernel_cpy_f32_f16 0x154f22af0
ggml_metal_init: loaded kernel_cpy_f32_f32 0x154f22cd0
ggml_metal_init: loaded kernel_cpy_f16_f16 0x154f22eb0
ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: maxTransferRate = built-in GPU
ggml_metal_add_buffer: allocated 'data ' buffer, size = 3616.08 MB, ( 3619.95 / 21845.34)
ggml_metal_add_buffer: allocated 'eval ' buffer, size = 784.00 MB, ( 4403.95 / 21845.34)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 2050.00 MB, ( 6453.95 / 21845.34)
ggml_metal_add_buffer: allocated 'scr0 ' buffer, size = 512.00 MB, ( 6965.95 / 21845.34)
ggml_metal_add_buffer: allocated 'scr1 ' buffer, size = 512.00 MB, ( 7477.95 / 21845.34)
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x50 pc=0x103098f64]
runtime stack:
runtime.throw({0x10310b007?, 0x16d26f720?})
/opt/homebrew/Cellar/go/1.20.5/libexec/src/runtime/panic.go:1047 +0x40 fp=0x16d26f670 sp=0x16d26f640 pc=0x102baf630
runtime.sigpanic()
/opt/homebrew/Cellar/go/1.20.5/libexec/src/runtime/signal_unix.go:825 +0x244 fp=0x16d26f6b0 sp=0x16d26f670 pc=0x102bc6554
goroutine 10 [syscall]:
runtime.cgocall(0x1030941fc, 0x14000061b88)
/opt/homebrew/Cellar/go/1.20.5/libexec/src/runtime/cgocall.go:157 +0x54 fp=0x14000061b50 sp=0x14000061b10 pc=0x102b7ee24
github.com/jmorganca/ollama/llama._Cfunc_llama_eval(0x12e03c000, 0x140002ac000, 0x709, 0x1, 0xa)
_cgo_gotypes.go:210 +0x38 fp=0x14000061b80 sp=0x14000061b50 pc=0x10307fee8
github.com/jmorganca/ollama/llama.(*llama).generate.func2(0x14000412140, 0x0?)
/Users/jmorgan/workspace/ollama/llama/llama.go:211 +0xa0 fp=0x14000061bf0 sp=0x14000061b80 pc=0x103082050
github.com/jmorganca/ollama/llama
I can provide the full stack trace if needed.
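One observation: the fault is inside the cgo call to llama_eval, and the third argument in the trace (0x709) is 1,801 tokens, so a plausible guess is that the context accumulated over those 6-7 prompts eventually hands the C side something out of bounds. Purely as an illustration of the kind of guard that would fail fast instead of segfaulting (hypothetical names, not the project's actual code):

```go
package main

import "fmt"

// evalFn stands in for the cgo llama_eval call from the trace
// (hypothetical name; the real binding lives in llama/llama.go).
type evalFn func(tokens []int, nPast int) error

// guardedEval rejects a batch that would overflow the context window,
// returning a Go error instead of letting the C side fault.
func guardedEval(eval evalFn, tokens []int, nPast, nCtx int) error {
	if len(tokens)+nPast > nCtx {
		return fmt.Errorf("context overflow: %d new + %d past tokens > n_ctx %d",
			len(tokens), nPast, nCtx)
	}
	return eval(tokens, nPast)
}

func main() {
	stub := func([]int, int) error { return nil }
	// 1801 matches the 0x709 token count in the trace; n_ctx is 4096 per the log.
	if err := guardedEval(stub, make([]int, 1801), 2400, 4096); err != nil {
		fmt.Println(err)
	}
}
```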
Seems related to #186?
This happens to me as well
There are a lot of stability improvements in the upcoming release which should address this and other Post "http://127.0.0.1:11434/api/generate": EOF issues.
@mxyng Will this allow us to run 7B models on a Mac M1, or just show more informative error messages?
We've tested llama2:7b with those changes on 8 GB RAM without issue.
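As a point of reference on memory, the 2 GB kv self size in the log above falls straight out of the model dimensions (assuming llama.cpp's f16 KV cache layout of the time), and it is the part that scales with context length:

```go
package main

import "fmt"

func main() {
	// Dimensions from the llama_model_load_internal lines in the log above.
	nCtx, nLayer, nEmbd := 4096, 32, 4096

	// K and V caches: one f16 (2 bytes) value per embedding dimension,
	// per layer, per context position. Matches "kv self size = 2048.00 MB".
	kvBytes := 2 * nLayer * nEmbd * nCtx * 2
	fmt.Printf("kv cache: %.2f MB\n", float64(kvBytes)/(1024*1024))
}
```

Adding the ~3.6 GB Q4_0 weights buffer from the same log shows why an 8 GB machine is tight at the default 4096-token context.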
Any planned ETA for 0.0.13 or just "when it's done"?
soon™️. Jokes aside, we're working out some minor bugs. It'll be released once those have been resolved.
This should be more stable as of the v0.0.13 release. I'll reopen this if there are more reports.
I'm not able to run llama2. When I run ollama run llama2, it gives me the error below: Error: Post "http://127.0.0.1/api/generate": EOF