mdrokz / rust-llama.cpp

llama.cpp Rust bindings
https://crates.io/crates/llama_cpp_rs/
MIT License

llama.cpp ./embedding #17

Open · Philipp-Sc opened this issue 8 months ago

Philipp-Sc commented 8 months ago

Is there no rust binding to get the embeddings?

Using llama.cpp one would use:

./embedding -m ./path/to/model --log-disable -p "Hello World!" 2>/dev/null
mdrokz commented 8 months ago

> Is there no rust binding to get the embeddings?
>
> Using llama.cpp one would use:
>
> ./embedding -m ./path/to/model --log-disable -p "Hello World!" 2>/dev/null

Have you tried the method at https://github.com/mdrokz/rust-llama.cpp/blob/baa1bcff5ed03c923027e758e9363174d77e8900/src/lib.rs#L287?

It will get you the embeddings for the prompt.
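For reference, a minimal sketch of calling it (method and option names taken from the crate's README and options module; exact signatures may vary between versions):

```rust
use llama_cpp_rs::{
    options::{ModelOptions, PredictOptions},
    LLama,
};

fn main() {
    // Embeddings must be enabled at model load time, otherwise the
    // embedding buffer is never filled and the call returns nothing.
    let model_options = ModelOptions {
        embeddings: true,
        ..Default::default()
    };

    let llama = LLama::new("./path/to/model.gguf".into(), &model_options)
        .expect("failed to load model");

    // The method linked above; assumed to return a Vec<f32> on success.
    let embeddings = llama
        .embeddings("Hello World!".into(), PredictOptions::default())
        .expect("embedding call failed");

    println!("{} dims: {:?}", embeddings.len(), embeddings);
}
```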

Philipp-Sc commented 8 months ago

@mdrokz thank you for your response.

I tried it before resorting to the llama.cpp ./embedding executable directly.

The function would always return an empty vector:

[]

I tried multiple configurations but could not fix the issue.

mdrokz commented 8 months ago

> @mdrokz thank you for your response.
>
> I tried it before resorting to the llama.cpp ./embedding executable directly.
>
> The function would always return an empty vector: []
>
> I tried multiple configurations but could not fix the issue.

Alright, I will test on my end and see what's happening. Thanks

Philipp-Sc commented 8 months ago

I was using zephyr-7B-alpha-GGUF with:

context_size: 8192 
n_batch: 512 
embeddings: true

without any GPU assistance.
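
Expressed with the crate's options, that configuration would look roughly like this (field names assumed from ModelOptions; untested):

```rust
use llama_cpp_rs::options::ModelOptions;

// The configuration described above, expressed as ModelOptions
// (field names assumed; n_gpu_layers: 0 keeps inference on the CPU).
let model_options = ModelOptions {
    context_size: 8192,
    n_batch: 512,
    embeddings: true,
    n_gpu_layers: 0,
    ..Default::default()
};
```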


Note: There was also some strange behavior regarding n_batch and n_token, where a longer prompt (still well below the context length) led to an unexpected error:

GGML_ASSERT: n_token <= n_batch

Right now my workaround is a Rust wrapper (Command::new) around the ./embedding binary that parses the float values from stdout into a vector. The only parameters I set are --ctx-size 8192 and --mlock. I imagine this is less efficient, as the model needs to be reloaded for each call.
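
For completeness, a sketch of such a wrapper (assuming the embedding binary prints the vector as whitespace-separated floats on stdout, as described above):

```rust
use std::process::Command;

/// Shell out to llama.cpp's ./embedding binary and parse the
/// whitespace-separated floats it prints to stdout.
fn embed_via_cli(model: &str, prompt: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    let output = Command::new("./embedding")
        .args([
            "-m", model,
            "--log-disable",
            "--ctx-size", "8192",
            "--mlock",
            "-p", prompt,
        ])
        .output()?;

    let stdout = String::from_utf8(output.stdout)?;

    // Keep every token that parses as a float; anything else
    // (labels, warnings) is skipped.
    Ok(stdout
        .split_whitespace()
        .filter_map(|tok| tok.parse::<f32>().ok())
        .collect())
}
```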