langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

LlamaCppEmbeddings not working with faiss #8955

Closed. phamkhactu closed this issue 1 year ago

phamkhactu commented 1 year ago

System Info

Name: langchain Version: 0.0.251

Name: faiss-cpu Version: 1.7.1

Name: llama-cpp-python Version: 0.1.77

Who can help?

No response

Information

Related Components

Reproduction

from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
import gradio as gr
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import DirectoryLoader
from langchain.document_loaders import UnstructuredWordDocumentLoader
from torch import cuda, bfloat16
from transformers import StoppingCriteria, StoppingCriteriaList
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import LlamaCppEmbeddings

template = """Question: {question}

Answer:"""

prompt = PromptTemplate(template=template, input_variables=["question"])
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

n_gpu_layers = 42  # Change this value based on your model and your GPU VRAM pool.
n_batch = 1024  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

embeddings = LlamaCppEmbeddings(model_path="llama-2-7b-chat/7B/ggml-model-q4_0.bin",
                                n_gpu_layers=n_gpu_layers,
                                n_batch=n_batch)

llm = LlamaCpp(
    model_path="llama-2-7b-chat/7B/ggml-model-q4_0.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    callback_manager=callback_manager,
    verbose=True,
)

llm_chain = LLMChain(prompt=prompt, llm=llm)

txt_loader = DirectoryLoader("doc", glob="./*.docx", loader_cls=UnstructuredWordDocumentLoader)
documents = txt_loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(documents)

vectorstore = FAISS.from_documents(all_splits, embeddings)

query = "How is it going?"
search = vectorstore.similarity_search(query, k=5)

template = '''Context: {context}

Based on the context above, answer the following question.
Question: {question}

Give the answer from the context only;
do not use general knowledge to answer the query.'''

prompt = PromptTemplate(input_variables=["context", "question"], template=template)
final_prompt = prompt.format(question=query, context=search)
result = llm_chain.run(final_prompt)
print(result)

I get error:

llama_tokenize_with_model: too many tokens
Traceback (most recent call last):
  File "test_ggml.py", line 57, in <module>
    vectorstore = FAISS.from_documents(all_splits, embeddings)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/langchain/vectorstores/base.py", line 420, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/langchain/vectorstores/faiss.py", line 577, in from_texts
    embeddings = embedding.embed_documents(texts)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/langchain/embeddings/llamacpp.py", line 110, in embed_documents
    embeddings = [self.client.embed(text) for text in texts]
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/langchain/embeddings/llamacpp.py", line 110, in <listcomp>
    embeddings = [self.client.embed(text) for text in texts]
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/llama_cpp/llama.py", line 812, in embed
    return list(map(float, self.create_embedding(input)["data"][0]["embedding"]))
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/llama_cpp/llama.py", line 776, in create_embedding
    self.eval(tokens)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/llama_cpp/llama.py", line 471, in eval
    self.input_ids[self.n_tokens : self.n_tokens + n_tokens] = batch
ValueError: could not broadcast input array from shape (179,) into shape (0,)
Exception ignored in: <function Llama.__del__ at 0x7f68eedc9af0>
Traceback (most recent call last):
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/llama_cpp/llama.py", line 1508, in __del__
TypeError: 'NoneType' object is not callable
Exception ignored in: <function Llama.__del__ at 0x7f68eedc9af0>
Traceback (most recent call last):
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/llama_cpp/llama.py", line 1508, in __del__
TypeError: 'NoneType' object is not callable

Expected behavior

LlamaCppEmbeddings should work with FAISS.

dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

Hello,

Thank you for reaching out and providing detailed information about your issue. Based on the error message, it seems that LlamaCppEmbeddings is trying to tokenize a text that exceeds its maximum token limit. The maximum token limit for LlamaCppEmbeddings in LangChain is defined by the n_ctx parameter, which defaults to 512.
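A minimal sketch of one workaround, assuming your langchain version (0.0.251) exposes the n_ctx field on LlamaCppEmbeddings and your GGML model accepts a larger context window:

from langchain.embeddings import LlamaCppEmbeddings

# Raise the context window above the 512-token default so longer chunks can be embedded.
# n_ctx=2048 is an illustrative value; pick one your model and VRAM can handle.
embeddings = LlamaCppEmbeddings(
    model_path="llama-2-7b-chat/7B/ggml-model-q4_0.bin",
    n_ctx=2048,
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
)

If you prefer to keep n_ctx at its default, the same error can usually be avoided by lowering chunk_size in RecursiveCharacterTextSplitter so that every split stays under 512 tokens.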

Here are a few suggestions that might help you resolve this issue:

  1. Check the length of your documents: Ensure that none of your documents exceed the maximum token limit of LlamaCppEmbeddings. You can do this by tokenizing your documents and checking their length before passing them to the embed_documents method (a short sketch follows this list).

  2. Use a different embedding model: As suggested in a similar issue #8420, you could try using the GPT4AllEmbeddings instead of the LlamaCppEmbeddings. Here's how you can modify your code to do this:

from langchain.embeddings import GPT4AllEmbeddings

# Replace LlamaCppEmbeddings with GPT4AllEmbeddings
embeddings = GPT4AllEmbeddings()

# Rest of your code remains the same

Please note that you'll need to install the gpt4all package before you can use GPT4AllEmbeddings. You can do this by running pip install gpt4all.

  3. Update your llama-cpp-python package: Another similar issue #2381 suggests that updating the llama-cpp-python package might resolve the issue. You can do this by running pip install --upgrade llama-cpp-python.
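For the length check in suggestion 1, here is a quick sketch, assuming the underlying llama_cpp.Llama instance is exposed as embeddings.client (as it is in langchain 0.0.251) and that its tokenize method takes UTF-8 bytes:

# Count tokens per split with the same tokenizer the embedding model uses.
# Any split longer than n_ctx (512 by default) will fail to embed.
lengths = [
    len(embeddings.client.tokenize(doc.page_content.encode("utf-8")))
    for doc in all_splits
]
print(f"max tokens in a split: {max(lengths)}; splits over 512: {sum(l > 512 for l in lengths)}")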

Please try these suggestions and let me know if they help resolve your issue. If not, please provide more information about your documents (e.g., their average length, the maximum length, etc.) and I'll be happy to assist you further.

Best, Dosu

This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.