Thanks for the issue @ajasingh! Sorry you encountered an error. Do you see any errors in the logs at ~/.ollama/logs/server.log?
Me too
Terminal showed: Error: Post "http://127.0.0.1:11434/api/generate": EOF
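In case it helps triage: that EOF is just the client seeing the connection drop when the server process dies before (or while) responding. A minimal Go sketch that surfaces the same error text (the model name and prompt here are placeholders, not taken from this report):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Placeholder request body; "llama2" and the prompt are just examples.
	body := []byte(`{"model": "llama2", "prompt": "Hello"}`)

	resp, err := http.Post("http://127.0.0.1:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		// If the server process dies before sending a response, err prints as:
		//   Post "http://127.0.0.1:11434/api/generate": EOF
		fmt.Println("Error:", err)
		return
	}
	defer resp.Body.Close()

	// The endpoint streams newline-delimited JSON; if the server dies
	// mid-stream, the failure shows up here as an unexpected EOF instead.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		fmt.Println("stream error:", err)
	}
}
```

So the message points at the server going away mid-request, not at anything wrong with the request itself.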
+1 Same error in terminal. No errors in log:
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
- using env: export GIN_MODE=release
- using code: gin.SetMode(gin.ReleaseMode)
[GIN-debug] GET / --> github.com/jmorganca/ollama/server.Serve.func1 (4 handlers)
[GIN-debug] POST /api/pull --> github.com/jmorganca/ollama/server.PullModelHandler (4 handlers)
[GIN-debug] POST /api/generate --> github.com/jmorganca/ollama/server.GenerateHandler (4 handlers)
[GIN-debug] POST /api/create --> github.com/jmorganca/ollama/server.CreateModelHandler (4 handlers)
[GIN-debug] POST /api/push --> github.com/jmorganca/ollama/server.PushModelHandler (4 handlers)
[GIN-debug] POST /api/copy --> github.com/jmorganca/ollama/server.CopyModelHandler (4 handlers)
[GIN-debug] GET /api/tags --> github.com/jmorganca/ollama/server.ListModelsHandler (4 handlers)
[GIN-debug] DELETE /api/delete --> github.com/jmorganca/ollama/server.DeleteModelHandler (4 handlers)
2023/07/31 11:48:50 routes.go:276: Listening on 127.0.0.1:11434
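For what it's worth, that log shows a clean startup, so the crash must happen later, during generation. A quick liveness probe (a minimal sketch, assuming the default 127.0.0.1:11434 bind shown above) can confirm whether the server is still up after the error:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// GET / is in the route table above; a 200 here means the server is up,
	// so any EOF on /api/generate points at a crash during generation.
	resp, err := http.Get("http://127.0.0.1:11434/")
	if err != nil {
		fmt.Println("server not reachable:", err)
		return
	}
	defer resp.Body.Close()
	b, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(b))
}
```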
Encountered the same issue consistently on a MacBook Pro (M1 Max, 32 GB RAM) when attempting to play a Choose Your Own Adventure style game with it. It seems to die each time on the 6th or 7th prompt. Here is some additional log info:
ggml_metal_free: deallocating
[GIN] 2023/08/01 - 09:15:47 | 200 | 1m0s | 127.0.0.1 | POST "/api/generate"
llama.cpp: loading model from /Users/douglas/.ollama/models/blobs/sha256:8daa9615cce30c259a9555b1cc250d461d1bc69980a274b44d7eda0be78076d8
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 4096
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 0.08 MB
llama_model_load_internal: mem required = 5423.72 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size = 2048.00 MB
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '/Applications/Ollama.app/Contents/Resources/ggml-metal.metal'
ggml_metal_init: loaded kernel_add 0x154f7cde0
ggml_metal_init: loaded kernel_mul 0x1569b01e0
ggml_metal_init: loaded kernel_mul_row 0x154f82e40
ggml_metal_init: loaded kernel_scale 0x154f83020
ggml_metal_init: loaded kernel_silu 0x154f83200
ggml_metal_init: loaded kernel_relu 0x154f833e0
ggml_metal_init: loaded kernel_gelu 0x154f835c0
ggml_metal_init: loaded kernel_soft_max 0x154f837a0
ggml_metal_init: loaded kernel_diag_mask_inf 0x154f83980
ggml_metal_init: loaded kernel_get_rows_f16 0x154f83b60
ggml_metal_init: loaded kernel_get_rows_q4_0 0x154f83d40
ggml_metal_init: loaded kernel_get_rows_q4_1 0x154f83f20
ggml_metal_init: loaded kernel_get_rows_q2_K 0x154f84100
ggml_metal_init: loaded kernel_get_rows_q3_K 0x154f842e0
ggml_metal_init: loaded kernel_get_rows_q4_K 0x154f844c0
ggml_metal_init: loaded kernel_get_rows_q5_K 0x154f846a0
ggml_metal_init: loaded kernel_get_rows_q6_K 0x154f84880
ggml_metal_init: loaded kernel_rms_norm 0x154f84a60
ggml_metal_init: loaded kernel_norm 0x154f84c40
ggml_metal_init: loaded kernel_mul_mat_f16_f32 0x154f84e20
ggml_metal_init: loaded kernel_mul_mat_q4_0_f32 0x154f85000
ggml_metal_init: loaded kernel_mul_mat_q4_1_f32 0x154f851e0
ggml_metal_init: loaded kernel_mul_mat_q2_K_f32 0x154f853c0
ggml_metal_init: loaded kernel_mul_mat_q3_K_f32 0x154f21fb0
ggml_metal_init: loaded kernel_mul_mat_q4_K_f32 0x154f22190
ggml_metal_init: loaded kernel_mul_mat_q5_K_f32 0x154f22370
ggml_metal_init: loaded kernel_mul_mat_q6_K_f32 0x154f22550
ggml_metal_init: loaded kernel_rope 0x154f22730
ggml_metal_init: loaded kernel_alibi_f32 0x154f22910
ggml_metal_init: loaded kernel_cpy_f32_f16 0x154f22af0
ggml_metal_init: loaded kernel_cpy_f32_f32 0x154f22cd0
ggml_metal_init: loaded kernel_cpy_f16_f16 0x154f22eb0
ggml_metal_init: recommendedMaxWorkingSetSize = 21845.34 MB
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: maxTransferRate = built-in GPU
ggml_metal_add_buffer: allocated 'data ' buffer, size = 3616.08 MB, ( 3619.95 / 21845.34)
ggml_metal_add_buffer: allocated 'eval ' buffer, size = 784.00 MB, ( 4403.95 / 21845.34)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 2050.00 MB, ( 6453.95 / 21845.34)
ggml_metal_add_buffer: allocated 'scr0 ' buffer, size = 512.00 MB, ( 6965.95 / 21845.34)
ggml_metal_add_buffer: allocated 'scr1 ' buffer, size = 512.00 MB, ( 7477.95 / 21845.34)
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x50 pc=0x103098f64]
runtime stack:
runtime.throw({0x10310b007?, 0x16d26f720?})
/opt/homebrew/Cellar/go/1.20.5/libexec/src/runtime/panic.go:1047 +0x40 fp=0x16d26f670 sp=0x16d26f640 pc=0x102baf630
runtime.sigpanic()
/opt/homebrew/Cellar/go/1.20.5/libexec/src/runtime/signal_unix.go:825 +0x244 fp=0x16d26f6b0 sp=0x16d26f670 pc=0x102bc6554
goroutine 10 [syscall]:
runtime.cgocall(0x1030941fc, 0x14000061b88)
/opt/homebrew/Cellar/go/1.20.5/libexec/src/runtime/cgocall.go:157 +0x54 fp=0x14000061b50 sp=0x14000061b10 pc=0x102b7ee24
github.com/jmorganca/ollama/llama._Cfunc_llama_eval(0x12e03c000, 0x140002ac000, 0x709, 0x1, 0xa)
_cgo_gotypes.go:210 +0x38 fp=0x14000061b80 sp=0x14000061b50 pc=0x10307fee8
github.com/jmorganca/ollama/llama.(*llama).generate.func2(0x14000412140, 0x0?)
/Users/jmorgan/workspace/ollama/llama/llama.go:211 +0xa0 fp=0x14000061bf0 sp=0x14000061b80 pc=0x103082050
github.com/jmorganca/ollama/llama
I can provide the full stack trace if needed.
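One observation: the fault is inside the cgo call to llama_eval, and the third argument in the trace (0x709) is 1,801 tokens, so a plausible guess is that the context accumulated over those 6-7 prompts eventually hands the C side something out of bounds. Purely as an illustration of the kind of guard that would fail fast instead of segfaulting (hypothetical names, not the project's actual code):

```go
package main

import "fmt"

// evalFn stands in for the cgo llama_eval call from the trace
// (hypothetical name; the real binding lives in llama/llama.go).
type evalFn func(tokens []int, nPast int) error

// guardedEval rejects a batch that would overflow the context window,
// returning a Go error instead of letting the C side fault.
func guardedEval(eval evalFn, tokens []int, nPast, nCtx int) error {
	if len(tokens)+nPast > nCtx {
		return fmt.Errorf("context overflow: %d new + %d past tokens > n_ctx %d",
			len(tokens), nPast, nCtx)
	}
	return eval(tokens, nPast)
}

func main() {
	stub := func([]int, int) error { return nil }
	// 1801 matches the 0x709 token count in the trace; n_ctx is 4096 per the log.
	if err := guardedEval(stub, make([]int, 1801), 2400, 4096); err != nil {
		fmt.Println(err)
	}
}
```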
Seems related to #186?
This happens to me as well
There are a lot of stability improvements in the upcoming release which should address this and other Post "http://127.0.0.1:11434/api/generate": EOF issues.
@mxyng Will this allow us to run 7B models on a Mac M1, or just show more informative error messages?
We've tested llama2:7b with those changes on 8 GB RAM without issue.
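As a point of reference on memory, the 2 GB kv self size in the log above falls straight out of the model dimensions (assuming llama.cpp's f16 KV cache layout of the time), and it is the part that scales with context length:

```go
package main

import "fmt"

func main() {
	// Dimensions from the llama_model_load_internal lines in the log above.
	nCtx, nLayer, nEmbd := 4096, 32, 4096

	// K and V caches: one f16 (2 bytes) value per embedding dimension,
	// per layer, per context position. Matches "kv self size = 2048.00 MB".
	kvBytes := 2 * nLayer * nEmbd * nCtx * 2
	fmt.Printf("kv cache: %.2f MB\n", float64(kvBytes)/(1024*1024))
}
```

Adding the ~3.6 GB Q4_0 weights buffer from the same log shows why an 8 GB machine is tight at the default 4096-token context.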
Any planned ETA for 0.0.13 or just "when it's done"?
soon™️. Jokes aside, we're working out some minor bugs. It'll be released once those have been resolved.
This should be more stable as of the v0.0.13 release. I'll reopen this if there are more reports.
I'm not able to run llama2. When I run ollama run llama2, it gives me the error below: Error: Post "http://127.0.0.1/api/generate": EOF