langchain-ai / langchain


langchain-chroma==0.1.4: method get_by_ids is listed in the documentation, but I am getting NotImplementedError #28276

Open punsoca opened 4 days ago

punsoca commented 4 days ago


Example Code

#----------------
# HuggingFace embedding (no issue)
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model="sentence-transformers/all-mpnet-base-v2")

#----------------
# create langchain-chroma persistent client with collection name 'example_collection' (no issue)
from langchain_chroma import Chroma

vector_store = Chroma(
    collection_name="example_collection",   # a collection is the vector store's equivalent of a "table"
    embedding_function=embeddings,    # the HuggingFace embeddings object created in the previous step
    persist_directory="./vectorstore/chroma_langchain_db",  # Where to save data locally, remove if not necessary
)

#----------------
# add at least one document into  vector collection (no issue)
from uuid import uuid4
from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
    id=1,
)

documents = [
    document_1,
]

uuids = [str(uuid4()) for _ in range(len(documents))]
vector_store.add_documents(documents=documents, ids=uuids)

#----------------  ERROR ENCOUNTERED when running get_by_ids
# attempt to run get_by_ids yields NotImplementedError
vector_store.get_by_ids(['6314982d-455f-47cc-bf97-6e5324f6af62'])

Error Message and Stack Trace (if applicable)

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[87], line 3
      1 # testing get the first two document ids
      2 # ids = ['db1e5f74-f18d-4765-a193-d30eaed7552f', '12861b34-df54-4e40-8e1e-ae9ea901d378']
----> 3 vector_store.get_by_ids(['6314982d-455f-47cc-bf97-6e5324f6af62'])
      5 # get_by_ids() functionality is not avaiable until v0.2.11

File ~/Documents/0_-_Python_Projects/05_Gen_AI/venv_3_11/lib/python3.11/site-packages/langchain_core/vectorstores/base.py:164, in VectorStore.get_by_ids(self, ids)
    140 """Get documents by their IDs.
    141
    142 The returned documents are expected to have the ID field set to the ID of the
   (...)
    161 .. versionadded:: 0.2.11
    162 """
    163 msg = f"{self.__class__.__name__} does not yet support get_by_ids."
--> 164 raise NotImplementedError(msg)

NotImplementedError: Chroma does not yet support get_by_ids.
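
The traceback ends in langchain_core/vectorstores/base.py rather than in langchain_chroma, which suggests that the Chroma class in this version inherits the base-class stub instead of overriding it. Below is a minimal sketch of that pattern, plus one way to detect it before calling. BaseStore, StubStore, RealStore and supports are hypothetical stand-in names used only for illustration; they are not part of LangChain.

```python
class BaseStore:
    """Stand-in for a base class whose optional method is only a stub."""

    def get_by_ids(self, ids):
        msg = f"{self.__class__.__name__} does not yet support get_by_ids."
        raise NotImplementedError(msg)


class StubStore(BaseStore):
    """Inherits the stub unchanged, so calling get_by_ids raises."""


class RealStore(BaseStore):
    """Overrides the stub with a working lookup over an in-memory dict."""

    def __init__(self, rows):
        self._rows = dict(rows)

    def get_by_ids(self, ids):
        # Silently skip IDs that are not present.
        return [self._rows[i] for i in ids if i in self._rows]


def supports(store, base, method_name):
    """Return True if type(store) overrides base.<method_name>."""
    return getattr(type(store), method_name) is not getattr(base, method_name)
```

With the real classes, the equivalent check would presumably compare type(vector_store).get_by_ids against VectorStore.get_by_ids imported from langchain_core.vectorstores.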

Description

I am just trying to run the vector_store method get_by_ids. It is listed as one of the available methods in the documentation, but calling it raises NotImplementedError.

System Info

$ python -m langchain_core.sys_info

System Information

OS: Darwin
OS Version: Darwin Kernel Version 23.6.0: Mon Jul 29 21:13:00 PDT 2024; root:xnu-10063.141.2~1/RELEASE_X86_64
Python Version: 3.11.10 (main, Nov 19 2024, 15:24:32) [Clang 12.0.0 (clang-1200.0.32.29)]

Package Information

langchain_core: 0.3.19
langchain: 0.3.7
langchain_community: 0.3.4
langsmith: 0.1.143
langchain_chroma: 0.1.4
langchain_experimental: 0.3.3
langchain_groq: 0.2.1
langchain_huggingface: 0.1.2
langchain_text_splitters: 0.3.2

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.11.6
async-timeout: Installed. No version info available.
chromadb: 0.5.20
dataclasses-json: 0.6.7
fastapi: 0.115.5
groq: 0.12.0
httpx: 0.27.2
httpx-sse: 0.4.0
huggingface-hub: 0.26.2
jsonpatch: 1.33
numpy: 1.26.4
orjson: 3.10.11
packaging: 24.2
pydantic: 2.9.2
pydantic-settings: 2.6.1
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
sentence-transformers: 3.3.1
SQLAlchemy: 2.0.36
tenacity: 9.0.0
tokenizers: 0.20.3
transformers: 4.46.3
typing-extensions: 4.12.2

keenborder786 commented 3 days ago

It's currently not supported despite what the documentation said.
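
Until get_by_ids is implemented for Chroma, one possible workaround is the store's own get method, which accepts ids and returns a dictionary of parallel lists under keys such as "ids", "documents" and "metadatas". The sketch below assumes that return shape. SimpleDocument, FakeStore and get_by_ids_with_fallback are hypothetical names introduced here so the example runs without Chroma installed; with a real store you would likely pass langchain_core.documents.Document as doc_factory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)
    id: Optional[str] = None


def get_by_ids_with_fallback(store, ids, doc_factory=SimpleDocument):
    """Try store.get_by_ids; if it is unimplemented, fall back to store.get(ids=...)."""
    try:
        return list(store.get_by_ids(ids))
    except NotImplementedError:
        pass
    # Assumed result shape: {"ids": [...], "documents": [...], "metadatas": [...]}
    raw = store.get(ids=list(ids))
    metadatas = raw.get("metadatas") or [None] * len(raw["ids"])
    return [
        doc_factory(page_content=text, metadata=meta or {}, id=doc_id)
        for doc_id, text, meta in zip(raw["ids"], raw["documents"], metadatas)
    ]


class FakeStore:
    """Hypothetical in-memory store mimicking the assumed get() result shape."""

    def __init__(self, rows):
        self._rows = dict(rows)  # id -> (text, metadata)

    def get_by_ids(self, ids):
        raise NotImplementedError("FakeStore does not yet support get_by_ids.")

    def get(self, ids):
        hits = [i for i in ids if i in self._rows]
        return {
            "ids": hits,
            "documents": [self._rows[i][0] for i in hits],
            "metadatas": [self._rows[i][1] for i in hits],
        }


store = FakeStore({"a": ("pancakes for breakfast", {"source": "tweet"})})
docs = get_by_ids_with_fallback(store, ["a", "missing"])
```

IDs that are not found are simply absent from the result, so the returned list can be shorter than the list of requested IDs.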

Fernando7181 commented 2 days ago

I'm using a vector database with Chroma and it seems to be working just fine. Maybe we could help each other, but I'm ingesting the documents into the DB first and then pulling the entire DB to get the information.

punsoca commented 23 hours ago

Hi, thank you for sharing. I received another email response saying that get_by_ids isn't currently available under langchain_chroma.



Fernando7181 commented 14 hours ago

That's interesting. I'm not pulling by IDs but mostly the entire vector database, so maybe that could be it?