run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: RAGAS integration with llamindex issue #14873

Open AnujLahoty opened 3 months ago

AnujLahoty commented 3 months ago

Bug Description

I want to get the metrics from RAGAS. Whenever I download the model locally on my EC2 instance, the metrics are computed and I get an answer, but when I use a SageMaker endpoint as the LLM it throws an error (error screenshots omitted). The error says it is an asyncio issue and that I need to call nest_asyncio.apply(). I already did that, but the actual error message is "cannot reuse already awaited coroutine".

Version

0.10.55

Steps to Reproduce

Here is the code that can be run to reproduce the error:

import nest_asyncio
from llama_index.core import Settings, StorageContext, load_index_from_storage
from llama_index.core.evaluation import FaithfulnessEvaluator
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# import path assumes the llama-index-llms-sagemaker-endpoint package
from llama_index.llms.sagemaker_endpoint import SageMakerLLM

llm = SageMakerLLM(
    endpoint_name=<>,
    region_name=<>
)
embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-mpnet-base-v2", device="cuda")

Settings.llm = llm
Settings.embed_model = embed_model

nest_asyncio.apply()

storage_context = StorageContext.from_defaults(persist_dir=<>)
index = load_index_from_storage(storage_context)

# define evaluator
evaluator = FaithfulnessEvaluator(llm=llm)

# query index
query_engine = index.as_query_engine()
query = "What battles took place in New York City in the American Revolution?"
response = query_engine.query(query)
eval_result = evaluator.evaluate_response(query=query, response=response)
print(str(eval_result.passing))

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 3 months ago

To resolve the "cannot reuse already awaited coroutine" error when integrating RAGAS with llama_index using a SageMaker endpoint as a language model, you need to ensure that each coroutine is awaited only once. This error typically occurs when a coroutine is awaited multiple times, which is not allowed in Python.
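For illustration (this snippet is not from the thread), the error comes from awaiting the same coroutine object more than once, which Python forbids:

import asyncio

async def fetch() -> str:
    # stand-in for a single asynchronous LLM call
    return "response"

async def main() -> None:
    coro = fetch()   # a coroutine object, not a reusable callable
    await coro       # first await succeeds
    await coro       # RuntimeError: cannot reuse already awaited coroutine

asyncio.run(main())

The fix is always to create a fresh coroutine (call fetch() again) for each await, which is what the suggestions below amount to.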

In the SageMakerLLM class, the asynchronous methods achat, astream_chat, acomplete, and astream_complete are not implemented and simply raise NotImplementedError. You need to implement these methods properly.

Here is an example of how you might implement the acomplete method to avoid the "cannot reuse already awaited coroutine" error:

@llm_completion_callback()
async def acomplete(
    self, prompt: str, formatted: bool = False, **kwargs: Any
) -> CompletionResponse:
    model_kwargs = {**self.model_kwargs, **kwargs}
    if not formatted:
        prompt = self._completion_to_prompt(prompt, self.system_prompt)

    request_body = self.content_handler.serialize_input(prompt, model_kwargs)
    response = await self._client.invoke_endpoint_async(
        EndpointName=self.endpoint_name,
        Body=request_body,
        ContentType=self.content_handler.content_type,
        Accept=self.content_handler.accept,
        **self.endpoint_kwargs,
    )

    response["Body"] = self.content_handler.deserialize_output(response["Body"])
    text = self.content_handler.remove_prefix(response["Body"], prompt)

    return CompletionResponse(
        text=text,
        raw=response,
        additional_kwargs={
            "model_kwargs": model_kwargs,
            "endpoint_kwargs": self.endpoint_kwargs,
        },
    )

In this implementation, invoke_endpoint_async is a hypothetical asynchronous method that you would need to implement or use if available in your AWS client library. This ensures that the coroutine is awaited properly and only once.

Make sure to implement similar changes for other asynchronous methods if you plan to use them.
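Note that invoke_endpoint_async above is hypothetical: boto3's sagemaker-runtime client only exposes a blocking invoke_endpoint. A minimal sketch of one way to get an awaitable call anyway (assuming Python 3.9+ for asyncio.to_thread; the region, endpoint name, and JSON payload shape are placeholders):

import asyncio
import json

import boto3

client = boto3.client("sagemaker-runtime", region_name="us-east-1")  # placeholder region

async def invoke_endpoint_async(endpoint_name: str, body: bytes) -> dict:
    # Run the blocking boto3 call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(
        client.invoke_endpoint,
        EndpointName=endpoint_name,
        Body=body,
        ContentType="application/json",
        Accept="application/json",
    )

async def main() -> None:
    payload = json.dumps({"inputs": "Hello"}).encode("utf-8")  # payload shape depends on the deployed model
    response = await invoke_endpoint_async("my-endpoint", payload)  # placeholder endpoint name
    print(response["Body"].read().decode("utf-8"))

asyncio.run(main())

Each call to invoke_endpoint_async creates a new coroutine, so nothing is ever awaited twice.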

Additionally, you can refer to the SageMakerLLM class in the llama_index library for more context on how to set up and use the SageMaker endpoint. Here is an example setup:

from llama_index.llms.sagemaker_endpoint import SageMakerLLM

# hooks for HuggingFaceH4/zephyr-7b-beta
def messages_to_prompt(messages):
    prompt = ""
    for message in messages:
        if message.role == 'system':
            prompt += f"<|system|>\n{message.content}</s>\n"
        elif message.role == 'user':
            prompt += f"<|user|>\n{message.content}</s>\n"
        elif message.role == 'assistant':
            prompt += f"<|assistant|>\n{message.content}</s>\n"

    if not prompt.startswith("<|system|>\n"):
        prompt = "<|system|>\n</s>\n" + prompt

    prompt = prompt + "<|assistant|>\n"
    return prompt

def completion_to_prompt(completion):
    return f"<|system|>\n</s>\n<|user|>\n{completion}</s>\n<|assistant|>\n"

endpoint_name = "your-endpoint-name"
region_name = "your-region"

llm = SageMakerLLM(
    endpoint_name=endpoint_name,
    region_name=region_name,
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
)
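
As a quick, illustrative sanity check (not part of the original answer), the synchronous methods that are implemented can be exercised directly through the standard LLM interface:

from llama_index.core.llms import ChatMessage

print(llm.complete("What is retrieval-augmented generation?").text)
print(llm.chat([ChatMessage(role="user", content="Hello!")]).message.content)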

To further ensure that coroutines are managed correctly, you can use utility functions from the llama_index library, such as run_async_tasks and batch_gather, which help in running multiple asynchronous tasks efficiently:

from llama_index.core.async_utils import run_async_tasks, batch_gather

# Example usage of run_async_tasks
tasks = [some_async_function() for _ in range(10)]
results = run_async_tasks(tasks, show_progress=True)

# Example usage of batch_gather (must itself be awaited inside an async function);
# note the fresh coroutines -- the ones above have already been awaited and cannot be reused
batch_tasks = [some_async_function() for _ in range(10)]
batch_results = await batch_gather(batch_tasks, batch_size=5, verbose=True)

These utilities help manage the execution of multiple coroutines, ensuring they are awaited properly and avoiding the "cannot reuse already awaited coroutine" error [1][2][3].

To continue talking to Dosu, mention @dosu.

AnujLahoty commented 3 months ago

@logan-markewich Can you please check it mate?

logan-markewich commented 3 months ago

Is the faithfulness evaluator from ragas? From llamaindex? You left out the imports, as well as the actual stack trace after applying nest asyncio

AnujLahoty commented 3 months ago

@logan-markewich My apologies, I will post them here ASAP. At a very high level: I am loading my LLM from a SageMaker endpoint using LlamaIndex's SageMaker endpoint module, then running the RAGAS evaluation metrics. It says that the acomplete function is not implemented for it.

logan-markewich commented 3 months ago

Yea the async endpoints for sagemaker LLM are not implemented. Someone would have to contribute that in a PR
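
As an illustrative workaround (not something proposed in the thread): until proper async support is contributed, one could subclass SageMakerLLM and build acomplete on top of the existing synchronous complete, using the same thread-offloading idea as the earlier sketch. The class name is made up and the import paths assume the llama-index-llms-sagemaker-endpoint package:

import asyncio
from typing import Any

from llama_index.core.llms import CompletionResponse
from llama_index.llms.sagemaker_endpoint import SageMakerLLM

class AsyncSageMakerLLM(SageMakerLLM):
    """SageMakerLLM with a best-effort acomplete built on the synchronous complete()."""

    async def acomplete(
        self, prompt: str, formatted: bool = False, **kwargs: Any
    ) -> CompletionResponse:
        # Run the blocking complete() in a worker thread; every call yields a
        # fresh coroutine, so nothing is awaited twice.
        return await asyncio.to_thread(self.complete, prompt, formatted=formatted, **kwargs)

Dropping this subclass into the repro script in place of SageMakerLLM would give the evaluator's async code path a real coroutine to await instead of a NotImplementedError.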