langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

TypeSense API doesn't return more than 10 documents #27502

Open · b5y opened this issue 1 day ago

b5y commented 1 day ago

Checked other resources

Example Code

import typesense
from langchain_community.embeddings import SagemakerEndpointEmbeddings
from langchain_community.vectorstores import Typesense
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder, PromptTemplate

# Connection settings for a local Typesense node
client_config = {
    "nodes": [{
        "host": "localhost",
        "port": "8108",
        "protocol": "http"
    }],
    "api_key": "xyz",
    "connection_timeout_seconds": 300
}

client = typesense.Client(client_config)

# endpoint_name, sagemaker_client, credentials_profile_name and ContentHandler
# (an EmbeddingsContentHandler subclass) are defined elsewhere (not shown in this report)
smg_embeddings = SagemakerEndpointEmbeddings(
    endpoint_name=endpoint_name,
    client=sagemaker_client,
    credentials_profile_name=credentials_profile_name,
    content_handler=ContentHandler()
)

vectorstore = Typesense(
    typesense_client=client,
    typesense_collection_name="abc",
    embedding=smg_embeddings,
    text_key="qaz"
)

# Request 100 documents; note that fetch_k is only used when search_type="mmr"
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 100,
        "fetch_k": 100
    }
)

Error Message and Stack Trace (if applicable)

No response

Description

When I invoke the retriever from a chain, Typesense returns at most 10 documents, as stated in the documentation. I tried subclassing the Typesense class as my own CustomTypesense, overriding similarity_search_with_score and similarity_search to use a default of `k = 100`, but it didn't help.
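One plausible explanation, offered as an assumption rather than a confirmed diagnosis: Typesense pages its search results and returns 10 hits per page by default (at most 250 per page), so if the vector store's request only puts `k` inside `vector_query` and sets no `per_page`, raising `k` widens the nearest-neighbour search without enlarging the page that comes back. The sketch below builds a single multi_search request body with `per_page` tied to `k`. `build_vector_search` is a hypothetical helper, and the `vec` field name and payload shape are assumptions modelled on the Typesense multi_search API, not LangChain's actual code.

```python
from typing import Any, Dict, List, Sequence

# Typesense documents a default page size of 10 hits and a maximum of 250 per page.
TYPESENSE_DEFAULT_PER_PAGE = 10
TYPESENSE_MAX_PER_PAGE = 250


def build_vector_search(
    embedded_query: Sequence[float],
    k: int,
    collection: str,
    filter_by: str = "",
    vector_field: str = "vec",  # assumed name of the embedding field
) -> Dict[str, Any]:
    """Hypothetical helper: build one search for Typesense's multi_search endpoint.

    Without "per_page", Typesense returns only TYPESENSE_DEFAULT_PER_PAGE hits,
    however large the k inside vector_query is.
    """
    if k < 1:
        raise ValueError("k must be positive")
    vector = ",".join(str(x) for x in embedded_query)
    return {
        "q": "*",
        "vector_query": f"{vector_field}:([{vector}], k:{k})",
        "filter_by": filter_by,
        "collection": collection,
        # Tie the page size to k, clamped to the documented per-page maximum;
        # a larger k would have to be fetched over several pages ("page" parameter).
        "per_page": min(k, TYPESENSE_MAX_PER_PAGE),
    }


if __name__ == "__main__":
    search = build_vector_search([0.1, 0.2], k=100, collection="abc")
    print(search["vector_query"])  # vec:([0.1,0.2], k:100)
    print(search["per_page"])      # 100
    body: Dict[str, List[Dict[str, Any]]] = {"searches": [search]}
    # With a real typesense.Client this body could be sent as:
    #   client.multi_search.perform(body, {})
```

A subclass could send a payload shaped like this instead of relying on the default page size; clamping to 250 is a design choice here, and anything beyond that would need to be collected page by page.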

I also tried various search_kwargs values when calling as_retriever (defined on the VectorStore base class that Typesense inherits from), which didn't help either.
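A rough model of why changing search_kwargs alone may not help, assuming (as in langchain_core 0.3) that a retriever with `search_type="similarity"` forwards its search_kwargs as keyword arguments to the store's `similarity_search`, and assuming the store silently drops extra keywords such as `fetch_k`. `FakeTypesenseStore` and `retrieve` are hypothetical stand-ins, not LangChain classes; the fake store honours `k` for the neighbour search but, like a Typesense request without `per_page`, returns only one page of 10 hits.

```python
from typing import Any, Dict, List, Optional


class FakeTypesenseStore:
    """Hypothetical stand-in for a Typesense-backed vector store."""

    def __init__(self, corpus: List[str], per_page: Optional[int] = None) -> None:
        self.corpus = corpus
        self.per_page = per_page  # None models a request that omits per_page

    def similarity_search(
        self, query: str, k: int = 10, filter: str = "", **kwargs: Any
    ) -> List[str]:
        # Extra keywords such as fetch_k land in **kwargs and are ignored.
        neighbours = self.corpus[:k]       # pretend these are the k nearest neighbours
        page_size = self.per_page or 10    # simulated server default page size
        return neighbours[:page_size]      # only the first page comes back


def retrieve(
    store: FakeTypesenseStore, query: str, search_type: str, search_kwargs: Dict[str, Any]
) -> List[str]:
    """Hypothetical simplification of a retriever dispatching on search_type."""
    if search_type == "similarity":
        return store.similarity_search(query, **search_kwargs)
    raise ValueError(f"unsupported search_type: {search_type}")


if __name__ == "__main__":
    corpus = [f"doc{i}" for i in range(500)]
    kwargs = {"k": 100, "fetch_k": 100}
    print(len(retrieve(FakeTypesenseStore(corpus), "q", "similarity", kwargs)))               # 10
    print(len(retrieve(FakeTypesenseStore(corpus, per_page=100), "q", "similarity", kwargs)))  # 100
```

In this model `k` reaches the store intact, so the cap comes from the page size rather than from how the retriever passes arguments.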

Upgrading LangChain hasn't changed the situation.

I consider this a bug, since none of the arguments I've tried changes the number of documents retrieved from Typesense.

System Info

System Information

OS: Linux
OS Version: #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct 2 16:16:55 UTC 2
Python Version: 3.11.10 | packaged by conda-forge | (main, Sep 30 2024, 18:08:57) [GCC 13.3.0]

Package Information

langchain_core: 0.3.12
langchain: 0.3.4
langchain_community: 0.3.2
langsmith: 0.1.135
langchain_aws: 0.2.2
langchain_chroma: 0.1.4
langchain_cli: 0.0.31
langchain_huggingface: 0.1.0
langchain_ollama: 0.2.0
langchain_text_splitters: 0.3.0
langgraph: 0.2.39
langserve: 0.3.0

Other Dependencies

aiohttp: 3.10.10
async-timeout: Installed. No version info available.
boto3: 1.35.41
chromadb: 0.5.11
dataclasses-json: 0.6.7
fastapi: 0.115.2
gitpython: 3.1.43
gritql: 0.1.5
httpx: 0.27.2
huggingface-hub: 0.25.1
jsonpatch: 1.33
langgraph-checkpoint: 2.0.1
langgraph-sdk: 0.1.33
langserve[all]: Installed. No version info available.
numpy: 1.26.4
ollama: 0.3.3
orjson: 3.10.7
packaging: 24.1
pydantic: 2.9.2
pydantic-settings: 2.5.2
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
sentence-transformers: 3.0.1
SQLAlchemy: 2.0.36
sse-starlette: 1.8.2
tenacity: 8.5.0
tokenizers: 0.20.0
tomlkit: 0.12.5
transformers: 4.45.1
typer[all]: Installed. No version info available.
typing-extensions: 4.12.2
uvicorn: 0.23.2

e-than-c commented 1 day ago

Hi @b5y, we're a group of students from the University of Toronto. Mind if we investigate this bug further?

b5y commented 10 hours ago

Hi @e-than-c!

Please feel free to investigate this bug.