Open srv1n opened 4 months ago
Great crate!
I was able to speed up embeddings by making the following change:
Line 65: `if (batch.n_tokens() as usize + tokens.len()) > n_ctx {`
This needs to compare against `n_batch`, not `n_ctx`. You can refer to the original llama.cpp embedding example (https://github.com/ggerganov/llama.cpp/blob/master/examples/embedding/embedding.cpp, line 164): `if (batch.n_tokens + n_toks > n_batch) {`
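For illustration, here is a minimal sketch of the corrected condition. The function and parameter names (`batch_is_full`, `batch_n_tokens`, `n_batch`) are hypothetical, chosen to mirror the snippets above rather than the crate's exact API:

```rust
// Hypothetical helper mirroring the corrected check above; not the
// crate's actual API. Decides whether the pending batch must be
// flushed before appending more tokens.
fn batch_is_full(batch_n_tokens: usize, tokens_len: usize, n_batch: usize) -> bool {
    // Flush when adding `tokens_len` more tokens would exceed the
    // batch capacity `n_batch`, which is typically much smaller than
    // the context size `n_ctx`. Comparing against `n_ctx` here would
    // under-fill batches and slow embedding throughput.
    batch_n_tokens + tokens_len > n_batch
}

fn main() {
    // With n_batch = 512: 500 + 30 overflows, 400 + 30 fits.
    assert!(batch_is_full(500, 30, 512));
    assert!(!batch_is_full(400, 30, 512));
    println!("ok");
}
```

The key point is simply that the flush threshold is the batch size, matching the upstream llama.cpp embedding example.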
Thanks for the issue, would love a PR to this effect.