utilityai / llama-cpp-rs


Faster Embeddings #284

Open srv1n opened 4 months ago

srv1n commented 4 months ago

Great crate!

I was able to speed up embeddings by making the following changes:

  1. expose `n_ubatch`
  2. set `n_ubatch` and `n_batch` to 2048
  3. initialize `LlamaBatch` with `n_tokens` = 2048
  4. update line 65 to check against the `n_batch` size instead of `n_ctx` (details below)

Line 65: `if (batch.n_tokens() as usize + tokens.len()) > n_ctx {`

This needs to compare against `n_batch`, not `n_ctx`. You can refer to the original llama.cpp embedding example, https://github.com/ggerganov/llama.cpp/blob/master/examples/embedding/embedding.cpp (line 164): `if (batch.n_tokens + n_toks > n_batch) {`

MarcusDunn commented 4 months ago

Thanks for the issue, would love a PR to this effect.