
ChatBedrock: add usage metadata #85

ccurme commented 3 months ago

langchain-core 0.2.2 introduced a standard field for usage metadata returned from chat model responses, such as input and output token counts. AIMessage objects now have a .usage_metadata attribute which can hold a UsageMetadata dict; for now it holds only token counts. Standardizing this information makes it simpler to track in monitoring / observability platforms and similar applications.
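
For reference, a minimal sketch of the standard field (the key names are those defined by langchain-core; the message content and counts here are illustrative):

from langchain_core.messages import AIMessage

# UsageMetadata is a dict with input_tokens, output_tokens, and total_tokens.
msg = AIMessage(
    content="Hello!",
    usage_metadata={"input_tokens": 8, "output_tokens": 2, "total_tokens": 10},
)
print(msg.usage_metadata["total_tokens"])  # 10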

Here we unpack usage metadata returned by the Bedrock API onto AIMessages generated by chat models.
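
For illustration, a minimal sketch of reading the field once this change lands (the model ID is illustrative; assumes AWS credentials are configured):

from langchain_aws import ChatBedrock

llm = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0")
msg = llm.invoke("Hello")
# Token counts now land in the standard field:
print(msg.usage_metadata)
# e.g. {'input_tokens': 8, 'output_tokens': 2, 'total_tokens': 10}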

There are at least two options for implementing this in a streaming context:

  1. (Implemented here) Currently, Bedrock streams a final chunk containing usage data under "amazon-bedrock-invocationMetrics", which we ignore. These data appear standardized, at least for Anthropic and Mistral (I also checked Cohere and Llama3, but streaming for chat models does not currently work for either). We can emit an additional chunk containing these data, as in the sketch after this list. The advantage is that we may not need to implement any provider-specific processing. The disadvantage is that the final chunk currently contains a "stop_reason", so workflows that assume that chunk is the last one could break.

Before:

content='' response_metadata={'usage': {'input_tokens': [8], 'output_tokens': [1]}}
content='Hello' response_metadata={'stop_reason': None}
content='!' response_metadata={'stop_reason': None}
content='' response_metadata={'stop_reason': 'max_tokens', 'usage': {'output_tokens': [2]}}

After:

content='' response_metadata={'usage': {'input_tokens': [8], 'output_tokens': [1]}}
content='Hello' response_metadata={'stop_reason': None}
content='!' response_metadata={'stop_reason': None}
content='' response_metadata={'stop_reason': 'max_tokens', 'usage': {'output_tokens': [2]}}
content='' usage_metadata={'input_tokens': 8, 'output_tokens': 2, 'total_tokens': 10}
  2. (Implemented in commit history) Implement provider-specific processing, specifically for Anthropic. This is what I did first; commit https://github.com/langchain-ai/langchain-aws/pull/85/commits/2b9e4003bee8c57f974cf8dd7a8618a534e2d2c5 switches to option 1, and we can revert that commit if we prefer this approach.
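
A minimal sketch of the option 1 parsing, assuming the raw stream event arrives as a dict and that the invocation metrics use Bedrock's inputTokenCount / outputTokenCount field names (the helper name is hypothetical):

from typing import Optional

from langchain_core.messages import AIMessageChunk

def usage_chunk_from_stream_event(event: dict) -> Optional[AIMessageChunk]:
    # Bedrock attaches invocation metrics to the final streamed event.
    metrics = event.get("amazon-bedrock-invocationMetrics")
    if metrics is None:
        return None
    input_tokens = metrics.get("inputTokenCount", 0)
    output_tokens = metrics.get("outputTokenCount", 0)
    # Emit one extra, content-less chunk carrying the standardized counts.
    return AIMessageChunk(
        content="",
        usage_metadata={
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,
        },
    )

Since AIMessageChunk addition sums usage_metadata in langchain-core, consumers that aggregate chunks with + should still end up with correct totals despite the extra chunk.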