Open sudeepjd opened 1 month ago
Another possible fix is to flatten the embeddings, which sit at the third nesting level, into a single list at the second level.
Change line 114 to:
return [list(map(float, sublist)) for e in embeddings for sublist in e]
As stated in the ticket, I do not know how flattening the embeddings will affect the performance or quality of search and retrieval, so I will leave that to an expert to decide :-)
Thanks for your support.
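To make the open question about flattening concrete, here is a small sketch with invented numbers (no model involved). It assumes the raw output is token level, as later comments in this thread suggest; `token_level` and `flatten_token_vectors` are hypothetical names used only for illustration. With that shape, the flattening comprehension returns one vector per token rather than one per text, so the result no longer lines up with the input texts.

```python
# Dummy token-level output: 2 texts with 3 and 2 tokens, each token vector of size 2.
# These values are made up for illustration; real vectors are far longer.
token_level = [
    [[1, 2], [3, 4], [5, 6]],  # text 0: three token vectors
    [[7, 8], [9, 10]],         # text 1: two token vectors
]


def flatten_token_vectors(embeddings):
    """Apply the flattening comprehension proposed for line 114."""
    return [list(map(float, sublist)) for e in embeddings for sublist in e]


flat = flatten_token_vectors(token_level)
print(len(token_level), "texts ->", len(flat), "vectors")  # 2 texts -> 5 vectors
```

The TypeError goes away, but a caller that expects `len(result) == len(texts)` would receive five vectors for two texts in this example.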
Got the same error. So each text is converted into a list of lists by the embedding step? Hoping for an official fix!
Confirmed: it looks like llama-cpp-python returns a list of vectors (one per token) instead of a single vector. UPD: Found the reason and a solution: https://github.com/abetlen/llama-cpp-python/issues/1288#issuecomment-2123475326
Also check the llama-cpp-python docs on embeddings.
There are two primary notions of embeddings in a Transformer-style model: token level and sequence level. Sequence level embeddings are produced by "pooling" token level embeddings together, usually by averaging them or using the first token.
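The two pooling strategies mentioned above can be sketched in plain Python. This is a minimal illustration, not the llama-cpp-python implementation; `mean_pool` and `first_token_pool` are hypothetical helper names, and the input values are invented.

```python
from typing import List

Vector = List[float]


def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average the token vectors component-wise into one sequence vector."""
    if not token_vectors:
        raise ValueError("cannot pool an empty sequence")
    n = len(token_vectors)
    # zip(*token_vectors) groups the i-th component of every token vector.
    return [sum(column) / n for column in zip(*token_vectors)]


def first_token_pool(token_vectors: List[Vector]) -> Vector:
    """Use the first token's vector as the sequence vector."""
    if not token_vectors:
        raise ValueError("cannot pool an empty sequence")
    return [float(x) for x in token_vectors[0]]


# One text with two token vectors of size 2 (made-up values).
tokens = [[1.0, 2.0], [3.0, 4.0]]
print(mean_pool(tokens))         # [2.0, 3.0]
print(first_token_pool(tokens))  # [1.0, 2.0]
```

Applied per text, for example `[mean_pool(doc) for doc in token_level_output]`, either function yields exactly one vector per input text, which is the shape that line 114 of `llamacpp.py` expects.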
Also got this error
Any update on this? I also got this error.
Example Code
The CodeLlama model that I am using can be downloaded from Hugging Face here: https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q3_K_M.gguf?download=true
Error Message and Stack Trace (if applicable)
Traceback (most recent call last):
  File "D:\Projects\GenAI_CodeDocs\01-Code\03_embed.py", line 6, in <module>
    embeddings = llama_embed.embed_documents(texts)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\GenAI_CodeDocs\00-VENV\code_doc\Lib\site-packages\langchain_community\embeddings\llamacpp.py", line 114, in embed_documents
    return [list(map(float, e)) for e in embeddings]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\GenAI_CodeDocs\00-VENV\code_doc\Lib\site-packages\langchain_community\embeddings\llamacpp.py", line 114, in <listcomp>
    return [list(map(float, e)) for e in embeddings]
            ^^^^^^^^^^^^^^^^^^^
TypeError: float() argument must be a string or a real number, not 'list'
Description
The embeddings produced at line 113:
https://github.com/langchain-ai/langchain/blob/58192d617f0e7b21ac175f869068324128949504/libs/community/langchain_community/embeddings/llamacpp.py#L113
are a list of lists of lists, i.e. nested three levels deep, with the embedding values at the third level.
Line 114:
https://github.com/langchain-ai/langchain/blob/58192d617f0e7b21ac175f869068324128949504/libs/community/langchain_community/embeddings/llamacpp.py#L114
only iterates two levels deep: [list(map(float, e)) for e in embeddings], where embeddings is List1 and each e is a List2.
Since the elements of each List2 are themselves lists, float() is called on a list and we get the error:
TypeError: float() argument must be a string or a real number, not 'list'
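The failure can be reproduced without loading a model. The sketch below uses an invented three-level list in place of the real output; `nesting_depth` and `original_conversion` are hypothetical helpers written for this illustration, with `original_conversion` wrapping the comprehension quoted from line 114.

```python
def nesting_depth(value) -> int:
    """Count list levels by following the first element.

    Stops at the first non-list value or empty list.
    """
    depth = 0
    while isinstance(value, list) and value:
        depth += 1
        value = value[0]
    return depth


def original_conversion(embeddings):
    """The comprehension from line 114, unchanged."""
    return [list(map(float, e)) for e in embeddings]


# Hypothetical raw output: 1 text -> 2 token vectors -> 2 floats each.
raw = [[[0.1, 0.2], [0.3, 0.4]]]
print(nesting_depth(raw))  # 3

try:
    original_conversion(raw)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

With two-level input such as `[[0.1, 0.2]]` the same comprehension succeeds, which is consistent with the error only appearing when the model returns token-level embeddings.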
Changing the line 114 to
fixes the error, but I do not know the impact it would cause on the rest of the system.
Thank you for looking into the issue.