run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: TypeError: 'NoneType' in get_llm_token_counts when using CondensePlusContextChatEngine and Bedrock #9276

Closed. vldvasi closed this issue 11 months ago.

vldvasi commented 11 months ago

Bug Description

When using LlamaIndex's QueryFusionRetriever and CondensePlusContextChatEngine with the Bedrock APIs (Claude-v2 and Titan-embed-text-v1), a NoneType exception is raised in the function get_llm_token_counts() of token_counting.py.

    usage = response.raw["usage"]  # type: ignore
            ~~~~~~~~~~~~^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable

Handling the case when response.raw is None would likely fix the issue.

Version

0.9.8

Steps to Reproduce

The following code snippet (with setup variables such as BEDROCK_CLIENT, callback_manager, and indexes defined elsewhere in my application) reproduces the error:

llm = Bedrock(
    client=BEDROCK_CLIENT,
    model_id="anthropic.claude-v2",
    region_name="us-east-1",
    # verbose=True,
)

embed_model = LangchainEmbedding(
    BedrockEmbeddings(
        model_id="amazon.titan-embed-text-v1",
        region_name="us-east-1",
        client=BEDROCK_CLIENT,
    )
)

service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    callback_manager=callback_manager,
    chunk_size=llm_model_chunk_size,
    context_window=llm_model_context_window,
)

nest_asyncio.apply()

retriever = QueryFusionRetriever(
    [index.as_retriever() for index in indexes],
    llm=llm,
    similarity_top_k=int(similarity_top_k),
    num_queries=1,
    use_async=True,
    verbose=True,
)

custom_chat_history = get_chat_history(conversation_id, db)
chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=retriever,
    service_context=service_context,
    context_prompt=qa_prompt_templ_text,
    condense_prompt=custom_prompt,
    chat_history=custom_chat_history,
    verbose=True
)

llm_response = chat_engine.chat(question)
answer = llm_response.response
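
For context, the failing code path is only reached when a TokenCountingHandler is registered on the callback manager, since it is that handler's on_event_end that calls get_llm_token_counts (as the traceback shows). A minimal sketch of what the omitted callback_manager setup might look like (the tokenizer choice here is only a placeholder and may not match Claude's actual tokenization; it is not my exact configuration):

import tiktoken
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# Token counting handler; its on_event_end invokes get_llm_token_counts()
# for every LLM event, which is where the NoneType error is raised.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.get_encoding("cl100k_base").encode,
    verbose=False,
)
callback_manager = CallbackManager([token_counter])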

Relevant Logs/Tracebacks

2023-12-03 11:58:02,595 - ERROR - Error in chat_query_engine_multi_index_ep: 'NoneType' object is not subscriptable
Traceback (most recent call last):
  File "C:\Users\a912553\InsightEngine\routes\chat_engine_routes.py", line 810, in chat_query_engine_multi_index_ep
    response = chat_query_engine_multi_index(db, query_input)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a912553\InsightEngine\routes\chat_engine_routes.py", line 718, in chat_query_engine_multi_index
    llm_response = chat_engine.chat(question)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a912553\InsightEngine\.venv\Lib\site-packages\llama_index\callbacks\utils.py", line 39, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a912553\InsightEngine\.venv\Lib\site-packages\llama_index\chat_engine\condense_plus_context.py", line 289, in chat
    chat_response = self._llm.chat(chat_messages)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\a912553\InsightEngine\.venv\Lib\site-packages\llama_index\llms\base.py", line 208, in wrapped_llm_chat
    callback_manager.on_event_end(
  File "C:\Users\a912553\InsightEngine\.venv\Lib\site-packages\llama_index\callbacks\base.py", line 116, in on_event_end
    handler.on_event_end(event_type, payload, event_id=event_id, **kwargs)
  File "C:\Users\a912553\InsightEngine\.venv\Lib\site-packages\llama_index\callbacks\token_counting.py", line 154, in on_event_end
    get_llm_token_counts(
  File "C:\Users\a912553\InsightEngine\.venv\Lib\site-packages\llama_index\callbacks\token_counting.py", line 49, in get_llm_token_counts
    usage = response.raw["usage"]  # type: ignore
            ~~~~~~~~~~~~^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
dosubot[bot] commented 11 months ago

šŸ¤–

It's great that you've identified the issue and provided a clear description of the bug. Your proposed solution to handle the case when response.raw is None seems like a step in the right direction. We encourage you to go ahead and implement this fix in a pull request for LlamaIndex. Your contribution will be valuable in improving the framework for all users. Thank you for your willingness to contribute!


vldvasi commented 11 months ago

A possible fix that I've tested and that works is to handle the case when response.raw is None in the get_llm_token_counts function:


        # try getting attached token counts first
        try:
            messages_tokens = 0
            response_tokens = 0

            if response.raw is not None:
                usage = response.raw.get("usage")  # type: ignore

                if usage is not None:
                    messages_tokens = usage.prompt_tokens
                    response_tokens = usage.completion_tokens

                if messages_tokens == 0 or response_tokens == 0:
                    raise ValueError("Invalid token counts!")

                return TokenCountingEvent(
                    event_id=event_id,
                    prompt=messages_str,
                    prompt_token_count=messages_tokens,
                    completion=response_str,
                    completion_token_count=response_tokens,
                )

        except (ValueError, KeyError):
            # Invalid token counts, or no token counts attached
            pass
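
With this change, when Bedrock attaches no raw usage data the function presumably falls through to the tokenizer-based counting further down, so the handler still records counts instead of raising. A rough way to verify after the chat call (assuming a token_counter handler like the one sketched earlier in this thread):

llm_response = chat_engine.chat(question)

# The chat call no longer raises; the handler still populates its counters.
print(token_counter.prompt_llm_token_count)
print(token_counter.completion_llm_token_count)
print(token_counter.total_llm_token_count)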
logan-markewich commented 11 months ago

Merged into main. It will be released in v0.9.12 (or you can install from source to get it now).