langchain-core 0.2.2 released a standard field to store usage metadata returned from chat model responses, such as input/output token counts. AIMessage objects have a .usage_metadata attribute which can hold a UsageMetadata dict; for now it only holds token counts. Standardizing this information makes it simpler to track in monitoring / observability platforms and similar applications.
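As a quick sketch of the standardized shape: the UsageMetadata dict carries input_tokens, output_tokens, and total_tokens. The merge helper below is illustrative (not part of langchain-core), showing why a standard shape helps when accumulating counts across streamed chunks.

```python
# The UsageMetadata dict shape standardized in langchain-core 0.2.2.
# Values here are illustrative.
usage_metadata = {
    "input_tokens": 11,
    "output_tokens": 27,
    "total_tokens": 38,
}


def add_usage(a: dict, b: dict) -> dict:
    """Illustrative helper: merge two usage dicts key-by-key,
    e.g. when accumulating counts across streamed chunks."""
    return {k: a.get(k, 0) + b.get(k, 0) for k in set(a) | set(b)}
```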
Here we unpack usage metadata returned by the Bedrock API onto AIMessages generated by chat models.
There are at least two options for implementing this in a streaming context:
(Implemented here) Currently, Bedrock streams a final chunk containing usage data in "amazon-bedrock-invocationMetrics", which we ignore. These data appear standardized, at least for Anthropic and Mistral (I also checked Cohere and Llama3, but streaming for chat models does not currently work for either). We can emit an additional chunk containing these data. The advantage is that we may not need to implement any provider-specific processing. The disadvantage is that the final chunk currently contains a "stop_reason", and if users assume that chunk is the last one, emitting another after it could break workflows.