Philipp-Sc opened this issue 8 months ago
Is there no Rust binding to get the embeddings?

Using llama.cpp directly, one would run:

```sh
./embedding -m ./path/to/model --log-disable -p "Hello World!" 2>/dev/null
```
Have you tried this method: https://github.com/mdrokz/rust-llama.cpp/blob/baa1bcff5ed03c923027e758e9363174d77e8900/src/lib.rs#L287? It will get you the embeddings for the prompt.
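Something like this should work (a minimal sketch assuming the API at that commit: `LLama::new`, an `embeddings` flag on `ModelOptions`, and the linked `embeddings` method; exact names may differ in other versions):

```rust
use llama_cpp_rs::{
    options::{ModelOptions, PredictOptions},
    LLama,
};

fn main() {
    // Embeddings must be enabled when the model is loaded,
    // otherwise there is nothing for the embeddings call to return.
    let model_options = ModelOptions {
        embeddings: true,
        ..Default::default()
    };

    let llama = LLama::new("./path/to/model".into(), &model_options)
        .expect("failed to load model");

    let embedding = llama
        .embeddings("Hello World!".into(), PredictOptions::default())
        .expect("failed to compute embeddings");

    println!("{:?}", embedding);
}
```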
@mdrokz thank you for your response.
I tried it before falling back to the ./embedding executable from llama.cpp directly.
The function always returned an empty vector:
[]
I tried multiple configurations but could not fix the issue.
Alright, I will test on my end and see what's happening. Thanks.
I was using zephyr-7B-alpha-GGUF with:

- context_size: 8192
- n_batch: 512
- embeddings: true

and without any GPU assistance.
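In crate terms, that setup corresponds to roughly the following model options (a sketch; I'm assuming ModelOptions exposes context_size, n_batch, embeddings, and n_gpu_layers fields as in the linked source, which may not hold in other versions):

```rust
use llama_cpp_rs::options::ModelOptions;

// Mirrors the setup above: 8192-token context, batch size 512,
// embeddings enabled, nothing offloaded to the GPU.
let model_options = ModelOptions {
    context_size: 8192,
    n_batch: 512,
    embeddings: true,
    n_gpu_layers: 0,
    ..Default::default()
};
```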
Note: there was also some strange behavior regarding n_batch and n_token, where a longer prompt (still well below the context length) led to an unexpected error:

```
GGML_ASSERT: n_token <= n_batch
```

presumably because the whole prompt is evaluated as a single batch for embeddings, so n_batch has to be at least the prompt's token count.
Right now my workaround is a Rust wrapper (Command::new) around the ./embedding binary: I capture stdout as a string and parse the float values into a vector. The only parameters I set are --ctx-size 8192 and --mlock.

I imagine this is less efficient, since the model needs to be reloaded for each call.
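For reference, the wrapper looks roughly like this (a sketch of the workaround; the binary location and model path are assumptions for your setup):

```rust
use std::process::Command;

/// Runs the llama.cpp `embedding` binary and parses the
/// whitespace-separated floats it prints to stdout into a vector.
fn embed(model_path: &str, prompt: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    let output = Command::new("./embedding")
        .args([
            "-m", model_path,
            "--log-disable",
            "--ctx-size", "8192",
            "--mlock",
            "-p", prompt,
        ])
        .output()?;

    let stdout = String::from_utf8(output.stdout)?;

    // The binary prints the embedding as space-separated floats;
    // skip any token that does not parse as a number.
    let embedding = stdout
        .split_whitespace()
        .filter_map(|tok| tok.parse::<f32>().ok())
        .collect();

    Ok(embedding)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let v = embed("./path/to/model", "Hello World!")?;
    println!("embedding has {} dimensions", v.len());
    Ok(())
}
```

The obvious downside, as noted above, is that each call spawns a fresh process and reloads the whole model, so a working in-process binding would still be preferable.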