rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
https://docs.rs/llm/latest/llm/
Apache License 2.0

LLaMA-2 GGML formats fail to generate any new token #413

Open AnubhabB opened 10 months ago

AnubhabB commented 10 months ago

Here are the particular details that have failed:

System: Mac M1 Pro, 16 GB RAM, 10 cores

llm

Sequence of events observed with `RUST_LOG=trace`:

```
...
...
Loaded tensor 360/363
...
Model size = 13152.46 MB / num tensors = 363
TRACE llm_base::inference_session                   > Starting inference request with max_token_count: 1844674407370955161
// in the logging callback I do see the tokenised prompt strings via llm::InferenceResponse::PromptToken(t)
TRACE llm_base::inference_session                   > Finished feed prompt
SamplerFailure(NoToken) // the sampler throws an error - using `llm_samplers`
```
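For context, a `NoToken` failure generally means the sampler finished with no valid candidate token to emit, e.g. because the logits it was handed were empty or entirely non-finite. The following is a generic sketch of that failure mode using a hypothetical greedy sampler, not the actual `llm_samplers` implementation:

```rust
/// Hypothetical greedy sampler: returns the index of the largest finite
/// logit, or None ("no token") when there is no valid candidate.
fn greedy_sample(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .filter(|(_, l)| l.is_finite())
        .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
        .map(|(i, _)| i)
}

fn main() {
    // Normal case: a token is picked (index of the largest logit).
    assert_eq!(greedy_sample(&[0.1, 2.5, -1.0]), Some(1));
    // Empty logits leave no candidate, analogous to SamplerFailure(NoToken).
    assert_eq!(greedy_sample(&[]), None);
    // All-NaN logits are filtered out, also leaving no candidate.
    assert_eq!(greedy_sample(&[f32::NAN, f32::NAN]), None);
    println!("ok");
}
```

If the model forward pass produced empty or NaN logits (e.g. from a mis-parsed tensor), a sampler could fail this way even when its own configuration is correct.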

If I test with `llm::InferenceParameters::default()` (i.e. without specifying a custom sampler), it simply stalls at this stage instead; either way, after the feed prompt no new tokens are generated.
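One thing worth noting from the trace above: the logged `max_token_count: 1844674407370955161` is exactly `usize::MAX / 10` on a 64-bit target, so the request is effectively unbounded and the stall is not a token limit being hit. A quick check of that arithmetic:

```rust
fn main() {
    // On a 64-bit target, usize::MAX is 18446744073709551615;
    // integer division by 10 gives exactly the value seen in the trace.
    let logged: usize = 1_844_674_407_370_955_161;
    assert_eq!(usize::MAX / 10, logged);
    println!("max_token_count from trace = usize::MAX / 10 = {}", logged);
}
```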

Models:

I know this effort is somewhat broken right now because of the upstream changes to GGUF, and it might take a few days or weeks to stabilise. Is there any way I can debug this and try to figure out what's going on?

Any tips or directions for getting this up and running in the meantime would be super helpful.