langchain-ai / langchain

Langchain DocArrayInMemorySearch not working #20957

Open anthonytecsa opened 2 weeks ago

anthonytecsa commented 2 weeks ago

Example Code

import os
import tempfile

import whisper
from pytube import YouTube
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# YOUTUBE_VIDEO (the video URL) is assumed to be defined in an earlier cell.

# Transcribe only if the transcription file has not been created yet.
if not os.path.exists("transcription.txt"):
    youtube = YouTube(YOUTUBE_VIDEO)
    audio = youtube.streams.filter(only_audio=True).first()

    # Load the base model. It is not the most accurate model, but it is fast.
    whisper_model = whisper.load_model("base")

    with tempfile.TemporaryDirectory() as tmpdir:
        audio_path = audio.download(output_path=tmpdir)
        transcription = whisper_model.transcribe(audio_path, fp16=False)["text"].strip()

        # Use a separate name for the output handle so it does not shadow the audio path.
        with open("transcription.txt", "w") as out_file:
            out_file.write(transcription)

# Load the transcription and split it into chunks of about 1000 characters.
documents = TextLoader("transcription.txt").load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

# This call raises the AttributeError shown in the stack trace.
db = DocArrayInMemorySearch.from_documents(docs, embeddings)

Error Message and Stack Trace (if applicable)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[27], line 11
      7 docs = text_splitter.split_documents(documents)
      9 embeddings = OpenAIEmbeddings()
---> 11 db = DocArrayInMemorySearch.from_documents(docs, embeddings)

File c:\Users\astec\OneDrive\Documents\RAG_PROJECT\.venv\Lib\site-packages\langchain_core\vectorstores.py:550, in VectorStore.from_documents(cls, documents, embedding, **kwargs)
    548 texts = [d.page_content for d in documents]
    549 metadatas = [d.metadata for d in documents]
--> 550 return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)

File c:\Users\astec\OneDrive\Documents\RAG_PROJECT\.venv\Lib\site-packages\langchain_community\vectorstores\docarray\in_memory.py:68, in DocArrayInMemorySearch.from_texts(cls, texts, embedding, metadatas, **kwargs)
     46 @classmethod
     47 def from_texts(
     48     cls,
   (...)
     52     **kwargs: Any,
     53 ) -> DocArrayInMemorySearch:
     54     """Create an DocArrayInMemorySearch store and insert data.
     55
     56     Args:
   (...)
     66         DocArrayInMemorySearch Vector Store
     67     """
---> 68     store = cls.from_params(embedding, **kwargs)
     69     store.add_texts(texts=texts, metadatas=metadatas)
     70     return store

File c:\Users\astec\OneDrive\Documents\RAG_PROJECT\.venv\Lib\site-packages\langchain_community\vectorstores\docarray\in_memory.py:39, in DocArrayInMemorySearch.from_params(cls, embedding, metric, **kwargs)
     21 @classmethod
     22 def from_params(
     23     cls,
   (...)
     28     **kwargs: Any,
     29 ) -> DocArrayInMemorySearch:
     30     """Initialize DocArrayInMemorySearch store.
     31
     32     Args:
   (...)
     37         **kwargs: Other keyword arguments to be passed to the get_doc_cls method.
     38     """
---> 39     _check_docarray_import()
     40     from docarray.index import InMemoryExactNNIndex
     42     doc_cls = cls._get_doc_cls(space=metric, **kwargs)

File c:\Users\astec\OneDrive\Documents\RAG_PROJECT\.venv\Lib\site-packages\langchain_community\vectorstores\docarray\base.py:19, in _check_docarray_import()
     17 def _check_docarray_import() -> None:
     18     try:
---> 19         import docarray
     21         da_version = docarray.__version__.split(".")
     22         if int(da_version[0]) == 0 and int(da_version[1]) <= 31:

File c:\Users\astec\OneDrive\Documents\RAG_PROJECT\.venv\Lib\site-packages\docarray\__init__.py:5
      1 __version__ = '0.32.1'
      3 import logging
----> 5 from docarray.array import DocList, DocVec
      6 from docarray.base_doc.doc import BaseDoc
      7 from docarray.utils._internal.misc import _get_path_from_docarray_root_level

File c:\Users\astec\OneDrive\Documents\RAG_PROJECT\.venv\Lib\site-packages\docarray\array\__init__.py:2
      1 from docarray.array.any_array import AnyDocArray
----> 2 from docarray.array.doc_list.doc_list import DocList
      3 from docarray.array.doc_vec.doc_vec import DocVec
      5 __all__ = ['DocList', 'DocVec', 'AnyDocArray']

File c:\Users\astec\OneDrive\Documents\RAG_PROJECT\.venv\Lib\site-packages\docarray\array\doc_list\doc_list.py:44
     36 T = TypeVar('T', bound='DocList')
     37 T_doc = TypeVar('T_doc', bound=BaseDoc)
     40 class DocList(
     41     ListAdvancedIndexing[T_doc],
     42     PushPullMixin,
     43     IOMixinArray,
---> 44     AnyDocArray[T_doc],
     45 ):
     46     """
     47      DocList is a container of Documents.
     48
   (...)
    114
    115     """
    117     doc_type: Type[BaseDoc] = AnyDoc

File c:\Users\astec\OneDrive\Documents\RAG_PROJECT\.venv\Lib\site-packages\docarray\array\any_array.py:46, in AnyDocArray.__class_getitem__(cls, item)
     43 @classmethod
     44 def __class_getitem__(cls, item: Union[Type[BaseDoc], TypeVar, str]):
     45     if not isinstance(item, type):
---> 46         return Generic.__class_getitem__.__func__(cls, item)  # type: ignore
     47         # this do nothing that checking that item is valid type var or str
     48     if not issubclass(item, BaseDoc):

AttributeError: 'builtin_function_or_method' object has no attribute '__func__'
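
The last frame shows docarray 0.32.1 calling Generic.__class_getitem__.__func__, and the error says that on this interpreter Generic.__class_getitem__ is a built-in function with no __func__ attribute. As a rough check that uses only the standard library (the helper name generic_getitem_has_func is made up for illustration), the following sketch reports whether the running interpreter exposes that attribute:

```python
import sys
from typing import Generic


def generic_getitem_has_func() -> bool:
    """Return True if Generic.__class_getitem__ exposes a __func__ attribute."""
    # docarray 0.32.1 (any_array.py, line 46 in the traceback) relies on this attribute.
    return hasattr(Generic.__class_getitem__, "__func__")


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {version}: __func__ available = {generic_getitem_has_func()}")
```

If this prints False, the failing line in any_array.py cannot work on that interpreter regardless of the LangChain code calling it.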

Description

I'm trying to use LangChain's DocArrayInMemorySearch to create a vector database from my transcription text file. I've written the code exactly as shown in the LangChain documentation, but the call to DocArrayInMemorySearch.from_documents fails with the AttributeError above.
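
According to the traceback, the exception is raised while docarray itself is being imported, before any documents or embeddings are processed. A smaller reproduction, sketched here under that assumption, is to import the module on its own and report what happens; try_import is a hypothetical helper, not part of LangChain or docarray:

```python
import importlib
from typing import Optional


def try_import(module_name: str) -> Optional[str]:
    """Import a module by name; return None on success or a short error summary."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # the import may fail with AttributeError, not only ImportError
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    error = try_import("docarray")
    print("docarray imported successfully" if error is None else error)
```

If the same AttributeError appears here, the Whisper, pytube, and OpenAI parts of the example are not involved in the failure.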

System Info

System Information

OS: Windows
OS Version: 10.0.22631
Python Version: 3.12.3 (tags/v3.12.3:f6650f9, Apr 9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)]

Package Information

langchain_core: 0.1.46
langchain: 0.1.16
langchain_community: 0.0.34
langsmith: 0.1.51
langchain_openai: 0.1.3
langchain_pinecone: 0.1.0
langchain_text_splitters: 0.0.1

Packages not installed (Not Necessarily a Problem)

The following packages were not found:

langgraph
langserve
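
The package list above does not include docarray, although the traceback shows its version as 0.32.1. A small sketch for collecting versions of additional packages with the standard library's importlib.metadata follows; the function name installed_versions and the choice of packages to query are assumptions for illustration:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Distribution names to inspect; adjust the list as needed.
    for name, ver in installed_versions(["docarray", "langchain-community"]).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```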