Sequence of events observed through `RUST_LOG=trace`:
...
...
Loaded tensor 360/363
...
Model size = 13152.46 MB / num tensors = 363
TRACE llm_base::inference_session > Starting inference request with max_token_count: 1844674407370955161
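As an aside, the odd `max_token_count` in that trace line looks like a sentinel rather than a real limit; on a 64-bit target it is exactly `usize::MAX / 10` (this is my own observation from the number, not something confirmed in the `llm` source):

```rust
fn main() {
    // The trace shows max_token_count: 1844674407370955161.
    // On a 64-bit target that is exactly usize::MAX / 10, which
    // suggests an "effectively unlimited" guard value rather than
    // a deliberately chosen token budget.
    let logged = 1_844_674_407_370_955_161usize;
    assert_eq!(usize::MAX / 10, logged);
    println!("{}", usize::MAX / 10); // prints 1844674407370955161
}
```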
// in the logging callback I do see the tokenised strings for llm::InferenceResponse::PromptToken(t)
TRACE llm_base::inference_session > Finished feed prompt
SamplerFailure(NoToken) // the sampler (from `llm_samplers`) errors here
If I test with `llm::InferenceParameters::default()` (i.e. without specifying a sampler), it just stalls at this stage; after the feed prompt, no new tokens are generated.
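For anyone debugging alongside me: this is not `llm`'s actual API, just a hypothetical standalone check, but a `NoToken` failure from `llm_samplers` usually means the sampler found no valid candidate, so the first thing I'd inspect is whether the logits coming back (e.g. from the Metal path) are even finite:

```rust
// Hypothetical helper for debugging SamplerFailure(NoToken): if the
// logits the model hands to the sampler contain NaN/inf (a common
// symptom of a broken accelerated backend or a mis-converted ggml
// file), every sampler in the chain will reject all candidates.
fn logits_look_sane(logits: &[f32]) -> bool {
    !logits.is_empty() && logits.iter().all(|l| l.is_finite())
}

fn main() {
    assert!(logits_look_sane(&[0.1, -2.3, 5.0]));
    assert!(!logits_look_sane(&[f32::NAN, 1.0]));
    assert!(!logits_look_sane(&[])); // empty logits are also a failure mode
    println!("ok");
}
```

In practice this would mean dumping the logits slice inside the inference loop (or behind a trace log) right before sampling, and comparing the Metal run against a CPU-only run.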
Looking at the current repo, I realised the llama.cpp submodule (`crates/ggml/sys/llama-cpp`) is at commit `8183159`. So I checked out that commit in llama.cpp and generated a fresh ggml file from Meta's LLaMA2-13b-chat weights for `q8_0` (the same codebase had worked with `q8_0` for a 7B model a couple of weeks back). The same behaviour can be observed.
I know that because of the upstream changes to GGUF this effort is somewhat broken right now and might take a few days/weeks to stabilise. Is there any way I can debug this and try to figure out what's going on?
Any tips/directions to get this up and running in the meantime would be super helpful.
Here are the particular details that have failed:
- System: Mac M1 Pro, 16 GB RAM, 10 cores
- Crate: `llm`, branch `main`, released: false, `features = ["llama"]`, `metal` acceleration