TypeSense API doesn't return more than 10 documents

b5y commented 1 day ago

Checked other resources

[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.
[X] I am sure that this is a bug in LangChain rather than my code.
[X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import typesense
from langchain_community.embeddings import SagemakerEndpointEmbeddings
from langchain_community.vectorstores import Typesense
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder, PromptTemplate

client_config = {
    "nodes": [{
        "host": "localhost",
        "port": "8108",
        "protocol": "http"
    }],
    "api_key": "xyz",
    "connection_timeout_seconds": 300
}

client = typesense.Client(client_config)

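# endpoint_name, sagemaker_client, credentials_profile_name and ContentHandler
# are defined elsewhere and omitted from this report.
smg_embeddings = SagemakerEndpointEmbeddings(
    endpoint_name=endpoint_name,
    client=sagemaker_client,
    credentials_profile_name=credentials_profile_name,
    content_handler=ContentHandler()
)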

vectorstore = Typesense(
    typesense_client=client,
    typesense_collection_name="abc",
    embedding=smg_embeddings,
    text_key="qaz"
)

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 100,
        "fetch_k": 100
    }
)

Error Message and Stack Trace (if applicable)

No response

Description

When I call the retriever from the chain, the maximum number of documents retrieved from Typesense is 10, as stated in the documentation. I tried subclassing the Typesense class with my own CustomTypesense, overriding similarity_search_with_score and similarity_search with a default of `k = 100`, but it didn't help.

I have also tried various arguments, such as search_kwargs, when calling the as_retriever function from the VectorStore class (which the Typesense class inherits from). That didn't help either.

Upgrading LangChain hasn't changed the situation.

I assume this can be considered a bug since no arguments work to specify the number of retrieved documents from Typesense.
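One possible explanation, offered as an assumption rather than a confirmed diagnosis: Typesense paginates search results through a `per_page` parameter that defaults to 10, so if the search request only passes `k` inside `vector_query` and never sets `per_page`, the server would still return 10 hits. The sketch below shows what a request body that sets both could look like. The helper name `build_vector_search` and the vector field name `vec` are hypothetical and not taken from the LangChain source.

```python
from typing import Any, Dict, List


def build_vector_search(
    embedding: List[float],
    k: int,
    collection: str,
    filter_by: str = "",
) -> Dict[str, Any]:
    """Hypothetical helper: build one Typesense multi-search entry.

    Assumes the collection stores its embeddings in a field named "vec".
    """
    vector = ",".join(str(x) for x in embedding)
    return {
        "q": "*",
        # k limits how many nearest neighbours the vector search considers.
        "vector_query": f"vec:([{vector}], k:{k})",
        "filter_by": filter_by,
        "collection": collection,
        # per_page controls how many hits come back in the response page.
        # Without it, Typesense falls back to its default page size.
        "per_page": k,
    }


query = build_vector_search([0.1, 0.2, 0.3], k=100, collection="abc")
print(query["per_page"])
```

If this assumption holds, a CustomTypesense subclass would need to send a body like this in its own similarity_search_with_score, since changing only the default value of `k` leaves the page size untouched.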

System Info

System Information

OS: Linux
OS Version: #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct 2 16:16:55 UTC 2
Python Version: 3.11.10 | packaged by conda-forge | (main, Sep 30 2024, 18:08:57) [GCC 13.3.0]

Package Information

langchain_core: 0.3.12
langchain: 0.3.4
langchain_community: 0.3.2
langsmith: 0.1.135
langchain_aws: 0.2.2
langchain_chroma: 0.1.4
langchain_cli: 0.0.31
langchain_huggingface: 0.1.0
langchain_ollama: 0.2.0
langchain_text_splitters: 0.3.0
langgraph: 0.2.39
langserve: 0.3.0

Other Dependencies

aiohttp: 3.10.10
async-timeout: Installed. No version info available.
boto3: 1.35.41
chromadb: 0.5.11
dataclasses-json: 0.6.7
fastapi: 0.115.2
gitpython: 3.1.43
gritql: 0.1.5
httpx: 0.27.2
huggingface-hub: 0.25.1
jsonpatch: 1.33
langgraph-checkpoint: 2.0.1
langgraph-sdk: 0.1.33
langserve[all]: Installed. No version info available.
numpy: 1.26.4
ollama: 0.3.3
orjson: 3.10.7
packaging: 24.1
pydantic: 2.9.2
pydantic-settings: 2.5.2
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
sentence-transformers: 3.0.1
SQLAlchemy: 2.0.36
sse-starlette: 1.8.2
tenacity: 8.5.0
tokenizers: 0.20.0
tomlkit: 0.12.5
transformers: 4.45.1
typer[all]: Installed. No version info available.
typing-extensions: 4.12.2
uvicorn: 0.23.2

e-than-c commented 1 day ago

Hi @b5y, we're a group of students from the University of Toronto. Mind if we investigate this bug further?

b5y commented 10 hours ago

Hi @e-than-c!

Please feel free to investigate this bug.
