langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

VectorStore.add_texts fails with iterator #26818

Open speleo3 opened 5 days ago

speleo3 commented 5 days ago

Example Code

from langchain_core.vectorstores import InMemoryVectorStore
from langchain_core.embeddings import FakeEmbeddings

vectorstore = InMemoryVectorStore(FakeEmbeddings(size=10))
ids = vectorstore.add_texts(iter(["foo", "bar"]))  # <-- use iter() here
assert len(ids) == 2

Error Message and Stack Trace (if applicable)

AssertionError

Description

The VectorStore.add_texts type annotation for texts is Iterable[str], but passing an iterator rather than a sequence behaves like passing an empty list, so the assertion above fails.

https://github.com/langchain-ai/langchain/blob/408a930d559da2fd914f7b1184b099ec6bf30b25/libs/core/langchain_core/vectorstores/base.py#L60-L62

The fix is to replace texts with texts_ on line 104:

https://github.com/langchain-ai/langchain/blob/408a930d559da2fd914f7b1184b099ec6bf30b25/libs/core/langchain_core/vectorstores/base.py#L104
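
The failure mode can be reproduced without LangChain. The sketch below is a simplified stand-in, not the library's code: the names build_pairs_buggy and build_pairs_fixed are hypothetical, and it assumes (as the suggested fix implies) that the method materializes texts into a list named texts_ but then zips over the original texts.

```python
from itertools import cycle
from typing import Iterable, List, Optional, Tuple

Pair = Tuple[str, Optional[str]]


def build_pairs_buggy(texts: Iterable[str]) -> List[Pair]:
    # Hypothetical stand-in for the suspected bug.
    # Materializing consumes a one-shot iterator completely...
    texts_ = texts if isinstance(texts, (list, tuple)) else list(texts)
    _ = len(texts_)  # e.g. a length check against the metadata
    # ...so zipping over the original `texts` yields nothing for iterators.
    return list(zip(texts, cycle([None])))


def build_pairs_fixed(texts: Iterable[str]) -> List[Pair]:
    # Same logic, but every later step uses the materialized list.
    texts_ = texts if isinstance(texts, (list, tuple)) else list(texts)
    return list(zip(texts_, cycle([None])))


print(build_pairs_buggy(iter(["foo", "bar"])))
print(build_pairs_fixed(iter(["foo", "bar"])))
```

With a list or tuple both versions behave the same, which may be why the problem only shows up when an iterator or generator is passed.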

System Info

System Information

OS: Linux
OS Version: #1 SMP Fri Mar 29 23:14:13 UTC 2024
Python Version: 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0]

Package Information

langchain_core: 0.3.1
langchain: 0.3.0
langchain_community: 0.3.0
langsmith: 0.1.121
langchain_chroma: 0.1.4
langchain_huggingface: 0.1.0
langchain_openai: 0.2.0
langchain_text_splitters: 0.3.0
langchain_unstructured: 0.1.4

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.5
async-timeout: 4.0.3
chromadb: 0.5.3
dataclasses-json: 0.6.7
fastapi: 0.112.4
httpx: 0.27.2
huggingface-hub: 0.24.7
jsonpatch: 1.33
numpy: 1.26.4
openai: 1.46.0
orjson: 3.10.7
packaging: 23.2
pydantic: 2.9.2
pydantic-settings: 2.5.2
PyYAML: 6.0.2
requests: 2.32.3
sentence-transformers: 3.1.0
SQLAlchemy: 2.0.35
tenacity: 8.5.0
tiktoken: 0.7.0
tokenizers: 0.19.1
transformers: 4.44.2
typing-extensions: 4.12.2
unstructured-client: 0.25.9
unstructured[all-docs]: Installed. No version info available.

eyurtsev commented 5 days ago

Please use a list. We haven't updated the type hints properly yet, but it doesn't make sense to push the pagination logic into the implementation in this case.

eyurtsev commented 5 days ago

The correct fix here is to update the type signature for add_texts throughout the entire codebase so that users must pass a Sequence for texts, metadatas, and ids, and to handle pagination properly so that it is not done on the implementation side.
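
Until the signature changes, one way to keep pagination on the caller's side is to split the iterator into lists before calling add_texts. The following is only a sketch: iter_batches and add_texts_in_batches are hypothetical helpers, not part of LangChain, and the batch size of 100 is an arbitrary assumption.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_batches(texts: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `batch_size` items from `texts`."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(texts)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def add_texts_in_batches(
    add_texts: Callable[[List[str]], List[str]],
    texts: Iterable[str],
    batch_size: int = 100,  # arbitrary default, chosen for illustration
) -> List[str]:
    """Call `add_texts` once per batch, always with a real list."""
    ids: List[str] = []
    for batch in iter_batches(texts, batch_size):
        ids.extend(add_texts(batch))
    return ids


# Intended usage with the store from the example code (not run here):
# ids = add_texts_in_batches(vectorstore.add_texts, iter(["foo", "bar"]))
```

Because each call receives a plain list, the helper should behave the same regardless of how a given store implementation treats iterators.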