langchain-ai / langchain-aws

Build LangChain Applications on AWS
MIT License
63 stars 47 forks source link

Adds Bedrock KB metadata to AmazonKnowledgeBasesRetriever Document metadata #36

Closed karlnewell closed 2 months ago

karlnewell commented 2 months ago

AWS Bedrock Knowledge Bases support custom user metadata per source document. This change adds that metadata to AmazonKnowledgeBasesRetriever Documents metadata as source_metadata.

This addresses the feature request in https://github.com/langchain-ai/langchain/discussions/20649

I'm open to suggestions on the source_metadata key name, but I was avoiding metadata["metadata"]

There are a few different ways to handle this. In this PR, source_metadata is set to None if the Bedrock KB result does not contain a metadata key. I prefer this approach as the presence of the source_metadata key is guaranteed for the caller - i.e., the caller does not have to check for it's presence.

If it's preferred to not set source_metadata if it's empty, then here are a couple of options.

for result in results:
    metadata={
        "location": result["location"],
        "score": result["score"] if "score" in result else 0,
    }
    if "metadata" in result:
        metadata["source_metadata"] = result["metadata"]
    documents.append(
        Document(
            page_content=result["content"]["text"],
            metadata=metadata,
        )
    )

Requires Python 3.9+

for result in results:
    documents.append(
        Document(
            page_content=result["content"]["text"],
            metadata={
                "location": result["location"],
                "score": result["score"] if "score" in result else 0} |
                ({'source_metadata': result["metadata"] } if "metadata" in result else {}),
        )
    )

I updated the integration tests to reflect these changes and added a response that includes KB metadata.

I changed the integration test get_relevant_documents to invoke based on

/home/knewell/.cache/pypoetry/virtualenvs/langchain-aws-eO7Ij7mw-py3.9/lib/python3.9/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The method BaseRetriever.get_relevant_documents was deprecated in langchain-core 0.1.46 and will be removed in 0.3.0. Use invoke instead. warn_deprecated(

Two simple typos in README.md were fixed.

3coins commented 2 months ago

@alexol91 Do you have any feedback for this PR?

karlnewell commented 2 months ago

Same feature added by https://github.com/langchain-ai/langchain-aws/pull/40