Closed — niranjanakella closed this issue 6 months ago
Hello @hydai, I am running the local inference using the following parameters.
```shell
wasmedge --dir .:. \
  --env stream_stdout=true \
  --env enable_log=true \
  --env ctx_size=8192 \
  --env n_predict=512 \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
  target/wasm32-wasi/release/niranjan-rust-wasmedge-llm.wasm default
```
After a few turns of conversation, I get the following error:
```
GGML_ASSERT: /Users/hydai/workspace/WasmEdge/plugins/wasi_nn/thirdparty/ggml/llama.cpp:5745: n_tokens <= n_batch
[1]    60026 abort      wasmedge --dir .:. --env stream_stdout=true --env enable_log=true --env --en
```
You may need to update the `batch-size` parameter at the same time: https://github.com/second-state/WasmEdge-WASINN-examples/tree/master/wasmedge-ggml-llama-interactive#parameters
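Following the linked parameters guide, the batch size is passed the same way as the other options. A sketch of the adjusted command, assuming a `batch_size` environment variable named consistently with `ctx_size` (check the linked README for the exact name):

```shell
wasmedge --dir .:. \
  --env stream_stdout=true \
  --env enable_log=true \
  --env ctx_size=8192 \
  --env batch_size=8192 \
  --env n_predict=512 \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
  target/wasm32-wasi/release/niranjan-rust-wasmedge-llm.wasm default
```

Raising the batch size alongside `ctx_size` should avoid tripping the `n_tokens <= n_batch` assertion once the accumulated prompt grows across turns.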
Thank you @hydai