langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
95.2k stars 15.44k forks source link

aindex + Chroma: aindex function fails due to missing delete method implementation #27225

Open ArtemMartus opened 1 month ago

ArtemMartus commented 1 month ago

Checked other resources

Example Code

from langchain.indexes import SQLRecordManager, aindex
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers.language.language_parser import LanguageParser
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma

loader = GenericLoader.from_filesystem(
    dir_path,
    suffixes=[".py"],
    parser=LanguageParser(None, 10),
    show_progress=True,
)
vector_store = Chroma(
    collection_name=collection_name,
    embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="./chroma_cache",
)
namespace = f"chromadb/{collection_name}"

record_manager = SQLRecordManager(
    namespace,
    db_url="sqlite:///record_manager_cache.sqlite",
)

async def main():
    await record_manager.create_schema()
    results = await aindex(
        docs_source=loader,
        record_manager=record_manager,
        vector_store=vector_store,
        cleanup="incremental",
        source_id_key="source",
    )

    print("done indexing")
    print(results)
    print("-----------------")

asyncio.run(main())

Error Message and Stack Trace (if applicable)

File "/home/artem/.local/lib/python3.12/site-packages/langchain_core/indexing/api.py", line 521, in aindex raise ValueError("Vectorstore has not implemented the delete method") ValueError: Vectorstore has not implemented the delete method

From langchain_core/indexing/api.py

# If it's a vectorstore, let's check if it has the required methods.
    if isinstance(destination, VectorStore):
        # Check that the Vectorstore has required methods implemented
        # Check that the Vectorstore has required methods implemented
        methods = ["adelete", "aadd_documents"]

        for method in methods:
            if not hasattr(destination, method):
                raise ValueError(
                    f"Vectorstore {destination} does not have required method {method}"
                )

        if type(destination).adelete == VectorStore.adelete:
            # Checking if the vectorstore has overridden the default delete method
            # implementation which just raises a NotImplementedError
            raise ValueError("Vectorstore has not implemented the delete method")

Description

I am trying to follow the documentation examples but it seems that aindex function is broken. Can someone tell about the correctness of the following check:

  if type(destination).adelete == VectorStore.adelete:
            # Checking if the vectorstore has overridden the default delete method
            # implementation which just raises a NotImplementedError
            raise ValueError("Vectorstore has not implemented the delete method")

How is that supposed to work with langchain_chroma ? It works just fine with index and doesn't want to work with aindex

System Info

$ python -m langchain_core.sys_info

System Information

OS: Linux OS Version: #45-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 30 12:02:04 UTC 2024 Python Version: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]

Package Information

langchain_core: 0.3.6 langchain: 0.3.1 langchain_community: 0.3.1 langsmith: 0.1.128 langchain_chroma: 0.1.4 langchain_ollama: 0.2.0 langchain_text_splitters: 0.3.0 langgraph: 0.2.34 langserve: 0.3.0

Other Dependencies

aiohttp: 3.10.5 async-timeout: 4.0.3 chromadb: 0.5.11 dataclasses-json: 0.6.7 fastapi: 0.115.0 httpx: 0.27.2 jsonpatch: 1.33 langgraph-checkpoint: 2.0.1 numpy: 1.26.4 ollama: 0.3.3 orjson: 3.10.7 packaging: 24.1 pydantic: 2.9.2 pydantic-settings: 2.5.2 PyYAML: 6.0.2 requests: 2.32.3 SQLAlchemy: 2.0.35 sse-starlette: 1.8.2 tenacity: 8.5.0 typing-extensions: 4.12.2

eyurtsev commented 1 month ago

If you're triggering this condition:

        if type(destination).adelete == VectorStore.adelete:
            # Checking if the vectorstore has overridden the default delete method
            # implementation which just raises a NotImplementedError
            raise ValueError("Vectorstore has not implemented the delete method")

It looks like it's b/c the underlying vecotrstore has not implemented an async delete. This is a bug/missing feature in the vectorstore not in the aindex function

lucasile commented 1 month ago

Currently, Chroma implements a synchronous delete method as shown:

class Chroma(VectorStore):
    def delete(self, ids: Optional[List[str]] = None, **kwargs: Any) -> None:
        """Delete by vector IDs.

        Args:
            ids: List of ids to delete.
            kwargs: Additional keyword arguments.
        """
        self._collection.delete(ids=ids, **kwargs)

Upon closer inspection, the Chroma vector store doesn't provide any asynchronous functionality, even though its docstring suggests otherwise. However, Chroma offers an asynchronous ClientApi that could be leveraged to implement this.

We’re interested in investigating this further and submitting a PR to address it.