pinecone-io / canopy

Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone
https://www.pinecone.io/
Apache License 2.0
975 stars 121 forks

[Bug] No debug context available when setting CE_DEBUG_INFO=true #297

Closed: coreation closed this issue 9 months ago

coreation commented 9 months ago


Current Behavior

Hello,

I want to use the context and knowledge base results of a chat_engine.chat execution. To do this, I've put the following in my .env file, which I load using load_dotenv():

CE_DEBUG_INFO="true"
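For reference, the loading step is just this (a minimal sketch, assuming the python-dotenv package is installed):

from dotenv import load_dotenv

load_dotenv()  # reads CE_DEBUG_INFO (and the other variables) from the local .env file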

To confirm that this is read:

import os

print(os.getenv("CE_DEBUG_INFO"))  # true
CE_DEBUG_INFO = os.getenv("CE_DEBUG_INFO", "FALSE").lower() == "true"
print(CE_DEBUG_INFO)  # True

However, when I try to access the context, I get an empty object, even though the response content contains an answer based on the retrieved RAG pieces, i.e. the response itself is non-empty.

# Imports per the canopy library docs (paths assumed from library.md);
# `messages` is assumed to be built earlier, e.g. [UserMessage(content="...")]
import os

from canopy.tokenizer import Tokenizer
from canopy.knowledge_base import KnowledgeBase
from canopy.context_engine import ContextEngine
from canopy.chat_engine import ChatEngine
from canopy.llm import OpenAILLM

Tokenizer.initialize()

pinecone_index = os.environ['PINECONE_INDEX']
pinecone_namespace = os.environ['PINECONE_NAMESPACE']

kb = KnowledgeBase(index_name=pinecone_index)
kb.connect()
# results = kb.query([Query(text="What is the outlook of the EV market?")])
# print(results)

context_engine = ContextEngine(kb)

llm = OpenAILLM()
chat_engine = ChatEngine(context_engine=context_engine, llm=llm)

response = chat_engine.chat(messages=messages, stream=False, namespace=pinecone_namespace)
print(response.debug_info)  # This is empty

Expected Behavior

I would expect to have access to the full context/KB results when the CE_DEBUG_INFO variable is set.

Steps To Reproduce

I think the notebook used in the "library" part of canopy covers the basic steps; just add the CE_DEBUG_INFO variable and check for the debug context. I hope that will suffice :)

Relevant log output

No response

Environment

- **OS**: OS X
- **Language version**: Python 3.9.2
- **Canopy version**: 0.7.0

Additional Context

No response

igiloh-pinecone commented 9 months ago

@coreation I see you got unblocked on your own.
Is there missing documentation that could have made this clearer somehow?

coreation commented 9 months ago

@igiloh-pinecone no, unfortunately I'm not :) I'm running the code described in the ticket, but no context comes along, even though the chat response contains a properly formed answer. I'm now running copies of the library code so that I can debug the entire RAG flow and see where it might go wrong.

izellevy commented 9 months ago

Hi @coreation, is it possible to try setting CANOPY_DEBUG_INFO=true? We added some more debug info to specific classes and decided to rename CE_DEBUG_INFO (CE meaning ContextEngine) to CANOPY_DEBUG_INFO to better reflect that it is a project-wide config.
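A minimal sketch of the suggested change; whether the flag has to be set before canopy modules are imported is an assumption here, since such flags are often read once at import time:

import os

# assumption: set this before importing any canopy modules, in case the
# flag is read once at import time rather than at call time
os.environ["CANOPY_DEBUG_INFO"] = "true"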

coreation commented 9 months ago

hey @izellevy thanks, I'll give that a try, but it seems I've got issues just getting proper retrieval going. I'm running both the canopy REST API and the code mentioned in the ticket to compare things side by side. The environment variables are the same, but the custom code, based on what the library documentation describes, isn't able to generate anything given the same question.

Using the REST API

Q: Is ChatGPT commandeering the mundane tasks that young employees have relied on to advance their careers?
A: Yes, ChatGPT is commandeering the mundane tasks that young employees have relied on to advance their careers. The generative-AI boom has led many companies to automate tasks such as spreadsheet building and generic copywriting in the name of becoming more efficient. These tasks are typically handled by entry-level workers, who were given them as a way to "earn their stripes" and develop in the workplace. However, with the rise of generative AI technology like ChatGPT, organizations are starting to automate these junior tasks, undermining the traditional path of advancement for young employees. This has raised concerns among members of Gen Z, with surveys indicating that 76% of them are worried about losing their jobs to ChatGPT.
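(For comparison, the REST call above can be issued with the OpenAI client; the localhost address and the OpenAI-compatible route are assumptions based on canopy's README:)

import openai

# assumptions: a canopy server running locally on its default port, exposing
# the OpenAI-compatible chat completions route described in canopy's README
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="canopy")
response = client.chat.completions.create(
    model="canopy",  # placeholder; the canopy server decides the actual LLM
    messages=[{
        "role": "user",
        "content": "Is ChatGPT commandeering the mundane tasks that young "
                   "employees have relied on to advance their careers?",
    }],
)
print(response.choices[0].message.content)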

Using the code mentioned in the ticket, based on the library.md file

Q: Is ChatGPT commandeering the mundane tasks that young employees have relied on to advance their careers?
A: There is no information in the provided context that directly addresses the impact of ChatGPT on young employees and their reliance on mundane tasks for career advancement. Therefore, I don't have enough information to answer your question.

I'm trying to wrap my head around what I'm doing wrong here...

coreation commented 9 months ago

@izellevy @igiloh-pinecone the debug flag works... but the larger issue is that the following code does not deliver any kind of relevant response, whereas the canopy REST API does, given the exact same configuration.

If I look at the debug info, the documents that the KB retrieves are all... trash... just not relevant, while it's clear that the REST API endpoint on the same index does return relevant information, as its answer contains the sources that are in my index. Meaning, not something OpenAI can come up with on its own.

Tokenizer.initialize()

pinecone_index = os.environ['PINECONE_INDEX']
pinecone_namespace = os.environ['PINECONE_NAMESPACE']

kb = KnowledgeBase(index_name=pinecone_index)
kb.connect()
# results = kb.query([Query(text="What is the outlook of the EV market?")])
# print(results)

context_engine = ContextEngine(kb)

llm = OpenAILLM()
chat_engine = ChatEngine(context_engine=context_engine, llm=llm)

response = chat_engine.chat(messages=messages, stream=False, namespace=pinecone_namespace)
print(response.debug_info)  # This is empty

Is there anything I should watch out for here? My goal (not unimportant :) ) is to capture all the sources used, so that I can fetch more metadata for those sources to use in the UI that our end users see.
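A rough sketch of that goal, treating debug_info as an arbitrary nested payload since its exact structure isn't documented in this thread (the "source" key name is purely an assumption, to be adjusted to the real schema):

# hypothetical helper: recursively collect any "source" fields from the
# debug_info payload returned by chat_engine.chat above
def collect_sources(obj, key="source", found=None):
    if found is None:
        found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            else:
                collect_sources(v, key, found)
    elif isinstance(obj, list):
        for item in obj:
            collect_sources(item, key, found)
    return found

sources = collect_sources(response.debug_info)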

coreation commented 9 months ago

@izellevy @igiloh-pinecone I'm going to make a dedicated issue out of the last comment, as the original issue has been solved.