run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: Access LLM's response object CompleteResponse() attribute `additional_kwarg` in RAG #14777

Closed: AshishSardana closed this 3 weeks ago

AshishSardana commented 1 month ago


Question

I want to access the additional_kwargs attribute of the CompletionResponse object returned by an LLM integration's complete() method when that LLM is used in a simple RAG pipeline.

In a RAG pipeline, the response is of type llama_index.core.base.response.schema.Response, which only stores the text attribute of the CompletionResponse object.
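
For reference, here is roughly what that Response object exposes (a simplified sketch of llama_index.core.base.response.schema.Response, not the verbatim source):

from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Response:
    # Only the final text, the retrieved source nodes, and their metadata
    # survive; the LLM's additional_kwargs are not carried over.
    response: Optional[str]
    source_nodes: List[Any] = field(default_factory=list)  # NodeWithScore in the real class
    metadata: Optional[Dict[str, Any]] = None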

Here's an example RAG pipeline that I'm using:

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# `llm` is a previously configured LLM integration instance
Settings.llm = llm
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("A random question")

I aim to create a notebook showcasing the use of trustworthiness_score in RAG, so I would prefer to update Cleanlab's implementation/utilities such that the end user doesn't have to write extra code.

This is similar to the use case of accessing log_probs or token counts from the OpenAI API in a RAG setup.
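
For comparison, when the LLM is called directly (outside of RAG), the attribute is readily available; a minimal sketch, assuming llm is any configured integration whose complete() populates additional_kwargs:

resp = llm.complete("A random question")
print(resp.text)               # the generated text
print(resp.additional_kwargs)  # e.g. token counts or other provider-specific fields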

dosubot[bot] commented 1 month ago

Hey @AshishSardana! I'm here to help you with any bugs, questions, or contributions you may have. Let's tackle this together while we wait for a human maintainer.

To access the additional_kwargs attribute of the CompletionResponse object in a RAG setup, you can start from the complete and stream_complete methods in the AnthropicMultiModal class, which already populate the additional_kwargs attribute of the CompletionResponse objects they return. Here is how additional_kwargs is set in each:

def complete(
    self,
    prompt: str,
    image_documents: Sequence[ImageDocument],
    **kwargs: Any,
) -> CompletionResponse:
    """Complete."""
    all_kwargs = self._get_model_kwargs(**kwargs)
    message_dict = self._get_multi_modal_chat_messages(
        prompt=prompt, role=MessageRole.USER, image_documents=image_documents
    )

    # issue the non-streaming request
    response = self._client.messages.create(
        messages=message_dict,
        system=self.system_prompt,
        stream=False,
        **all_kwargs,
    )

    return CompletionResponse(
        text=response.content[0].text,
        raw=response,
        # token counts (and any other extras) are surfaced here
        additional_kwargs=self._get_response_token_counts(response),
    )

def stream_complete(
    self,
    prompt: str,
    image_documents: Sequence[ImageDocument],
    **kwargs: Any,
) -> CompletionResponseGen:
    """Stream complete."""
    all_kwargs = self._get_model_kwargs(**kwargs)
    message_dict = self._get_multi_modal_chat_messages(
        prompt=prompt, role=MessageRole.USER, image_documents=image_documents
    )

    def gen() -> CompletionResponseGen:
        text = ""

        for response in self._client.messages.create(
            messages=message_dict,
            stream=True,
            system=self.system_prompt,
            **all_kwargs,
        ):
            if isinstance(response, ContentBlockDeltaEvent):
                # accumulate streamed deltas into the running text
                content_delta = response.delta.text or ""
                text += content_delta
                yield CompletionResponse(
                    delta=content_delta,
                    text=text,
                    raw=response,
                    # each chunk carries its own additional_kwargs
                    additional_kwargs=self._get_response_token_counts(response),
                )

    return gen()
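
For completeness, here is a sketch of how a caller could read additional_kwargs from each streamed chunk (llm here is assumed to be an instantiated AnthropicMultiModal and image_docs a list of ImageDocument):

for chunk in llm.stream_complete("Describe the image", image_documents=image_docs):
    # each chunk is a CompletionResponse carrying its own additional_kwargs
    print(chunk.delta, chunk.additional_kwargs)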

In these methods, the additional_kwargs attribute is populated by the _get_response_token_counts helper, which extracts token-usage information from the raw API response. This ensures that additional information returned by the API is captured and made available on the CompletionResponse object [1].
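
The helper itself lives in the integration package; purely as an illustration (not the library's verbatim code), it might look something like:

from typing import Any, Dict

def _get_response_token_counts(self, raw_response: Any) -> Dict[str, Any]:
    # Illustrative sketch only: pull usage/token-count fields off the raw
    # provider response, if present, so they can ride along in
    # additional_kwargs. The real implementation may differ.
    usage = getattr(raw_response, "usage", None)
    if usage is None:
        return {}
    return {
        "prompt_tokens": getattr(usage, "input_tokens", None),
        "completion_tokens": getattr(usage, "output_tokens", None),
    }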

Additionally, if you are using the Aleph Alpha model with LlamaIndex, you can access detailed response information such as log probabilities, model version, and raw completion text via the additional_kwargs attribute. Here is an example:

from llama_index.llms.alephalpha import AlephAlpha

llm = AlephAlpha(model="luminous-base-control", log_probs=0)

resp = llm.complete("Paul Graham is ")

if resp.logprobs is not None:
    print("\nLog Probabilities:")
    for lp_list in resp.logprobs:
        for lp in lp_list:
            print(f"Token: {lp.token}, LogProb: {lp.logprob}")

if "model_version" in resp.additional_kwargs:
    print("\nModel Version:")
    print(resp.additional_kwargs["model_version"])

if "raw_completion" in resp.additional_kwargs:
    print("\nRaw Completion:")
    print(resp.additional_kwargs["raw_completion"])

This example demonstrates how to access and print additional response details from the CompletionResponse object [2].

Furthermore, the llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py file shows how additional_kwargs is managed within the LlamaIndex framework. For instance, the _get_model_kwargs method merges the base keyword arguments with additional_kwargs, ensuring that any extra parameters are included in the API request:

def _get_model_kwargs(self, **kwargs: Any) -> Dict[str, Any]:
    base_kwargs = {"model": self.model, "temperature": self.temperature, **kwargs}
    if self.max_tokens is not None:
        base_kwargs["max_tokens"] = self.max_tokens
    if self.logprobs is not None and self.logprobs is True:
        if self.metadata.is_chat_model:
            base_kwargs["logprobs"] = self.logprobs
            base_kwargs["top_logprobs"] = self.top_logprobs
        else:
            base_kwargs["logprobs"] = self.top_logprobs  # int in this case
    return {**base_kwargs, **self.additional_kwargs}

This ensures that additional_kwargs are always included in the request payload, allowing you to access and utilize any additional information provided by the API [3].
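
To make this concrete, here is a small usage sketch (the model name and extra parameter are illustrative, not taken from this thread):

from llama_index.llms.openai import OpenAI

# Extra provider parameters passed at construction time are merged into
# every request payload by _get_model_kwargs, as shown above.
llm = OpenAI(
    model="gpt-4o-mini",  # illustrative model name
    temperature=0.1,
    additional_kwargs={"seed": 42},  # illustrative extra request parameter
)

resp = llm.complete("Paul Graham is ")
# What ends up in resp.additional_kwargs depends on the integration; it may
# carry fields extracted from the raw API response.
print(resp.additional_kwargs)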

To continue talking to Dosu, mention @dosu.

AshishSardana commented 1 month ago

These examples showcase using the LLM object on its own, not in a RAG setup. @ravi-theja, @masci - team, just tagging you in case you know of an approach to achieve this.