Closed — niranjanakella closed this issue 6 months ago
Hello @hydai, I am running the local inference using the following parameters.
```shell
wasmedge --dir .:. \
  --env stream_stdout=true \
  --env enable_log=true \
  --env ctx_size=8192 \
  --env n_predict=512 \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
  target/wasm32-wasi/release/niranjan-rust-wasmedge-llm.wasm default
```
After a few turns of conversation, I get the following error:
```
GGML_ASSERT: /Users/hydai/workspace/WasmEdge/plugins/wasi_nn/thirdparty/ggml/llama.cpp:5745: n_tokens <= n_batch
[1]    60026 abort      wasmedge --dir .:. --env stream_stdout=true --env enable_log=true --env --en
```
You may need to update the `batch-size` parameter at the same time: https://github.com/second-state/WasmEdge-WASINN-examples/tree/master/wasmedge-ggml-llama-interactive#parameters
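Following the linked parameters guide, the batch size is passed the same way as the other options. A sketch of the adjusted command, assuming a `batch_size` environment variable named consistently with `ctx_size` (check the linked README for the exact name):

```shell
wasmedge --dir .:. \
  --env stream_stdout=true \
  --env enable_log=true \
  --env ctx_size=8192 \
  --env batch_size=8192 \
  --env n_predict=512 \
  --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf \
  target/wasm32-wasi/release/niranjan-rust-wasmedge-llm.wasm default
```

Raising the batch size alongside `ctx_size` should avoid tripping the `n_tokens <= n_batch` assertion once the accumulated prompt grows across turns.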
Thank you @hydai