run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Feature Request]: Bedrock keep track trace in query engine #15107

Closed mdciri closed 3 weeks ago

mdciri commented 1 month ago

Feature Description

I would like to keep the raw information when calling client.invoke_model(..., trace="ENABLED").

I created a class MyBedrock, a subclass of Bedrock, whose complete method contains:

response = self._client.invoke_model(
    body=request_body_str,
    modelId=self.model,
    accept='application/json',
    contentType='application/json',
    trace='ENABLED',
    guardrailIdentifier=AWS_GUARDRAIL_ID,
    guardrailVersion=AWS_GUARDRAIL_VERSION
)

So, when I do:

llm = MyBedrock(...)
res = llm.complete("who are you?")

I get:

CompletionResponse(text='\nI am Amazon Titan, a large language model built by AWS. It is designed to assist you with tasks and answer any questions you may have. How may I help you?', additional_kwargs={}, raw={'inputTextTokenCount': 4, 'results': [{'tokenCount': 37, 'outputText': '\nI am Amazon Titan, a large language model built by AWS. It is designed to assist you with tasks and answer any questions you may have. How may I help you?', 'completionReason': 'FINISH'}], 'amazon-bedrock-trace': {'guardrail': {}}, 'amazon-bedrock-guardrailAction': 'NONE'}, logprobs=None, delta=None)

So res.raw reports not only the number of input and output tokens, but also all the decisions and detections made by the AWS guardrail.
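
For example, the token counts and the guardrail decision can be read straight out of this raw payload (field names taken from the response shown above):

# Field names as they appear in the raw Bedrock response above.
input_tokens = res.raw["inputTextTokenCount"]
output_tokens = res.raw["results"][0]["tokenCount"]
guardrail_action = res.raw.get("amazon-bedrock-guardrailAction")  # e.g. "NONE"
guardrail_trace = res.raw.get("amazon-bedrock-trace", {}).get("guardrail", {})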

Unfortunately, when I create a RAG pipeline using RetrieverQueryEngine and run the same query through its query method, I get:

Response(response=..., source_nodes=[...], metadata={...})

and the information previously available in res.raw is completely lost.

It would be nice to keep track of this information to better diagnose and debug why the guardrail blocked or masked content, etc. Moreover, it would make it possible to track the cost of the model used in Bedrock.

Thanks in advance

Reason

As described above, keeping this information makes it easier to diagnose and debug why the guardrail blocked or masked content, and it also makes it possible to track the cost of the model used in Bedrock.

I have not tried any other provider yet, but I believe other providers return something similar to res.raw, so the requested feature could be extended to all the providers available in llama-index.

Value of Feature

Model evaluation would be more complete.

logan-markewich commented 1 month ago

It would be a huge refactor to bubble up this information, fyi

logan-markewich commented 1 month ago

My recommendation is just using an observability tool, or writing your own event handler to look at LLM events

logan-markewich commented 1 month ago

For example, you could access and record response.raw by slightly extending this example: https://docs.llamaindex.ai/en/stable/examples/instrumentation/observe_api_calls/
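
A minimal sketch of such a handler, assuming the instrumentation API shown on that page (the handler name and the raw_payloads attribute are illustrative, not part of the linked example):

from typing import List

from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.core.instrumentation.events.llm import (
    LLMCompletionEndEvent,
    LLMChatEndEvent,
)


class RawResponseHandler(BaseEventHandler):
    """Collects the raw provider payload (e.g. the Bedrock guardrail trace)
    from every LLM call made anywhere in the pipeline."""

    raw_payloads: List[dict] = []

    @classmethod
    def class_name(cls) -> str:
        return "RawResponseHandler"

    def handle(self, event) -> None:
        if isinstance(event, (LLMCompletionEndEvent, LLMChatEndEvent)):
            # event.response is a CompletionResponse/ChatResponse, so its
            # .raw field carries the full Bedrock payload, trace included.
            if event.response is not None and event.response.raw is not None:
                self.raw_payloads.append(event.response.raw)


# Attach to the root dispatcher so every LLM call made by a query engine
# (or anything else) is observed.
handler = RawResponseHandler()
get_dispatcher().add_event_handler(handler)

After registering the handler, running query_engine.query(...) would leave the raw Bedrock responses, guardrail trace included, in handler.raw_payloads.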

logan-markewich commented 3 weeks ago

Going to close this for now; using the instrumentation API is the preferred way to access this right now.