henomis / lingoose

🪿 LinGoose is a Go framework for building awesome AI/LLM applications.
https://lingoose.io
MIT License
606 stars, 49 forks

Parse embedding error, expected all numeric but string is prefixed with "embedding 0:" #213

Open kbrisso opened 1 week ago

kbrisso commented 1 week ago

Describe the bug
When using llama-embedding.exe, the embedder receives the command output as a string formatted like this:

"embedding 0:" 0.011059 -0.014784 -0.018492 0.020268 -0.027386 0.022915

The "embedding 0:" prefix is what causes the issue.
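To illustrate the failure, here is a minimal sketch in Go. It is not LinGoose's actual code: `parseStrict` is a hypothetical helper that assumes every whitespace-separated token in the command output is a number, which is the assumption the prefix breaks.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStrict is a hypothetical helper (not part of LinGoose). It assumes
// every whitespace-separated token in the output is a float.
func parseStrict(out string) ([]float64, error) {
	fields := strings.Fields(out)
	vec := make([]float64, 0, len(fields))
	for _, f := range fields {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			// The first token, "embedding", ends up here.
			return nil, fmt.Errorf("token %q is not numeric: %w", f, err)
		}
		vec = append(vec, v)
	}
	return vec, nil
}

func main() {
	out := "embedding 0: 0.011059 -0.014784 -0.018492"
	if _, err := parseStrict(out); err != nil {
		fmt.Println("parse failed:", err)
	}
}
```

With the prefix present, the very first token fails to convert, so no vector is returned at all.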

To Reproduce
Steps to reproduce the behavior:

llama-embedding.exe -m mxbai-embed-large-v1-f16.gguf --pooling mean -p "Madam Speaker Madam Vice President our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. Last year COVID-19 kept us apart. This year we are finally together again. Tonight we meet as Democrats Republicans and Independents. But most importantly as Americans. With a duty to one another to the American people to the Constitution. And with an u" >>log.text

Expected behavior
The output should be a space-separated string of numeric values only, with no alphabetic text.

Screenshots
Posted in Discord.

kbrisso commented 1 week ago

I have a local branch with a fix in it. I can create a pull if needed.

henomis commented 1 week ago

Hi @kbrisso, I'm not sure this issue is related to Lingoose, the project doesn't have a tool called llama-embeddings.exe

kbrisso commented 1 week ago

> Hi @kbrisso, I'm not sure this issue is related to Lingoose, the project doesn't have a tool called llama-embeddings.exe

Here is the full code snippet. You invoke the llama.cpp embedder executable, "llama-embedding.exe", through this method: llamacppembedder.New().WithModel(......).WithLlamaCppPath(.......)

Screenshot 2024-09-07 115142

This is how I fixed it in the llamacpp.go file in your project.

Screenshot 2024-09-07 115716

henomis commented 5 days ago

Could you try without --embd-output-format and --embd-separator?

kbrisso commented 5 days ago

> Could you try without --embd-output-format and --embd-separator?

I tried all the settings and none of them worked. I spent all weekend on this. Your code expects a perfect string that can be converted directly to a slice. If you review the llama.cpp code, you will see it returns a string like the one below:

"embedding 0:" 0.011059 -0.014784 -0.018492 0.020268 -0.027386 0.022915
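One way to tolerate this output is to strip the label before converting the values. The sketch below is only an illustration and is not the hotfix that shipped in LinGoose: `parseEmbedding` and `labelRe` are hypothetical names, and the pattern assumes the label always has the form `embedding <n>:` at the start of the line, as in the output shown in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// labelRe matches a leading "embedding <n>:" label. The pattern is an
// assumption based on the output quoted in this thread.
var labelRe = regexp.MustCompile(`^\s*embedding\s+\d+:\s*`)

// parseEmbedding is a hypothetical helper: it drops the leading label,
// if present, and converts the remaining tokens to float64 values.
func parseEmbedding(line string) ([]float64, error) {
	values := labelRe.ReplaceAllString(line, "")
	fields := strings.Fields(values)
	if len(fields) == 0 {
		return nil, fmt.Errorf("no numeric values in %q", line)
	}
	vec := make([]float64, 0, len(fields))
	for _, f := range fields {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			return nil, fmt.Errorf("token %q is not numeric: %w", f, err)
		}
		vec = append(vec, v)
	}
	return vec, nil
}

func main() {
	vec, err := parseEmbedding("embedding 0: 0.011059 -0.014784 -0.018492 0.020268 -0.027386 0.022915")
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(len(vec), "values parsed")
}
```

Input without the label still parses, so the same helper would work whether or not the executable emits the prefix.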

henomis commented 5 days ago

Ok let's try the lingoose version v0.2.1-alpha.2 with the hotfix for that and remove --embd-output-format and --embd-separator args.

kbrisso commented 5 days ago

> Ok let's try the lingoose version v0.2.1-alpha.2 with the hotfix for that and remove --embd-output-format and --embd-separator args.

Works great! Thanks!