pinecone-io / canopy

Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone
https://www.pinecone.io/
Apache License 2.0

[BUG] Sudden deterioration in chat bot answers after installing Canopy 0.7.0 #299

Closed · coreation closed 6 months ago

coreation commented 6 months ago

Hi there!

First of all, as we're dealing with open source, big kudos to the maintainers and to @igiloh-pinecone and @izellevy who have been responsive in the past on some questions!

My use case is that I really need to capture the different sources/vectors that were used in producing the final response, which is currently only possible by calling the ChatEngine in code instead of using the built-in REST API of canopy-server.

The issue I'm facing is that, based on the same Pinecone index and namespace, I'm getting very different responses from the built-in REST API and from the code proposed in the library.md docs.

Configuration

For simplicity's sake, I'll stick to one example:

After the accumulator is loaded, the power supply is interrupted and a switch made of likewise superconducting material is actuated. This switch is responsible for disconnecting the coil from the inverter. The circuit is then reconnected to the inverter to discharge the stored energy. In this way, alternating current is generated from the direct current.

The efficiency of this type of energy storage system for generating direct current is around 97 percent. However, considerable cooling requirements need to be taken into account, which often stand in the way of the technology’s economic industrial use.


* The code for the "advanced" way of using Canopy, following the [library docs](https://github.com/pinecone-io/canopy/blob/main/docs/library.md):
```python
import os

from canopy.tokenizer import Tokenizer
from canopy.knowledge_base import KnowledgeBase
from canopy.context_engine import ContextEngine
from canopy.chat_engine import ChatEngine
from canopy.llm import OpenAILLM
from canopy.models.data_models import UserMessage

Tokenizer.initialize()

pinecone_index = os.environ['PINECONE_INDEX']
pinecone_namespace = os.environ['PINECONE_NAMESPACE']

kb = KnowledgeBase(index_name=pinecone_index)
kb.connect()

context_engine = ContextEngine(kb)

llm = OpenAILLM()
chat_engine = ChatEngine(context_engine=context_engine, llm=llm)

# messages is built elsewhere, e.g.:
# messages = [UserMessage(content="what can you tell me about superconducting magnetic energy storage?")]
response = chat_engine.chat(messages=messages, stream=False, namespace=pinecone_namespace)  # pass the namespace value, not a literal string
print("-----------------")
print(response.debug_info)  # Used to make the findings below in the issue post
```

Scenario output

Scenario 1: Using the built-in Canopy REST API endpoint

Q: what can you tell me about superconducting magnetic energy storage?

A: Superconducting magnetic energy storage (SMES) is a type of electrical energy storage technology. It involves the use of a coil made of superconducting material, such as compounds of iron, phosphorus, lanthanum, or oxygen. A direct current from a rectifier flows through this coil, creating a magnetic field in which the energy is stored. The stored energy can be discharged by reconnecting the circuit to an inverter, which converts the direct current back into alternating current. The efficiency of SMES in generating direct current is around 97 percent. However, the technology's economic industrial use is limited due to significant cooling requirements. SMES has potential applications in the field of electricity transmission and can significantly reduce the materials used for energy transmission compared to conventional power cables. Superconducting transmission technology is expected to be commercially available by around 2030. It is also worth mentioning that SMES is a form of energy storage that has to compete with other alternatives, including chemical batteries and physical storage systems. (Source: https://energy-nest.com/energy-storage-technologies, https://worth.com/new-electric-grid-batteries-turn-heat-into-light/, https://www.currenteurope.eu/efficient-electricity-grids-are-key-for-a-net-zero-industry-and-managing-demand-for-critical-raw-materials) 

Scenario 2: Using the Python code mentioned above - results vary from no response at all to a response made up solely by the LLM

Q: what can you tell me about superconducting magnetic energy storage?

A: Superconducting Magnetic Energy Storage (SMES) is a technology that uses superconducting coils to store electrical energy in the form of a magnetic field. This technology relies on the principle of superconductivity, where certain materials exhibit zero electrical resistance at low temperatures.

SMES systems consist of a superconducting coil that is cryogenically cooled using liquid helium to maintain its superconducting state. When electrical energy is supplied to the system, the current flows through the superconducting coil, creating a strong magnetic field. This magnetic field stores the energy.

SMES systems can deliver stored energy back to the grid when needed by reversing the process. The magnetic field collapses, generating a current that can be converted back to electricity and supplied to the grid.

SMES offers several advantages as an energy storage technology, including high power density, fast response times, and a high energy storage capacity. It can be used to address power quality issues such as voltage and frequency fluctuations and can provide backup power during grid outages.

Source: Adapted from multiple sources, no specific source provided in the context.

Scenario 3: Using the Python code mentioned above, but with only the one relevant document in the Pinecone index

Q: what can you tell me about superconducting magnetic energy storage?

A: Superconducting magnetic energy storage (SMES) is an electrical energy storage technology. In SMES, a direct current from a rectifier flows through a coil made of superconducting material such as compounds of iron, phosphorus, lanthanum, or oxygen. This creates a magnetic field in which the energy is stored. After the energy is loaded into the accumulator, a switch made of superconducting material is actuated to disconnect the coil from the inverter. The circuit is then reconnected to the inverter to discharge the stored energy, converting it into alternating current. The efficiency of SMES for generating direct current is around 97 percent. However, it is important to note that SMES requires significant cooling, which often hinders its economic industrial use.
(Source: https://energy-nest.com/energy-storage-technologies)

Findings

Difference in retrieval between the built-in canopy REST API and the example code

Logging the debug info shows that scenario 2 fetches all kinds of unrelated vectors that have nothing to do with the initial question; the scores for those documents are around 0.06. Checking the scores for scenario 3, it did find the content, as it was the only document in the Pinecone index for that scenario, and the score of that document was roughly -0.03.

So it seems the retrieval part goes "wrong", but what is odd is that the built-in canopy server, which afaik uses the same default values as the code example, does get the retrieval part right. So my assumption is very likely wrong, and I'm missing some sort of additional initialisation or configuration.

This is where I'm hard stuck and would greatly appreciate any pointers; I'll be more than happy to make a PR making library.md more complete!
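
One way to look at the retrieval step in isolation would be to query the knowledge base directly and inspect the scores, bypassing the LLM entirely. A minimal sketch, assuming the `KnowledgeBase.query` API as shown in library.md and reusing the `kb` object from the snippet above (the question string is the one from the scenarios):

```python
from canopy.models.data_models import Query

# Query the knowledge base directly to see what the retriever returns and how it scores it.
results = kb.query([Query(text="what can you tell me about superconducting magnetic energy storage?", top_k=5)])
for doc in results[0].documents:
    print(f"score={doc.score:.3f}  source={doc.source}  text={doc.text[:80]!r}")
```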

igiloh-pinecone commented 6 months ago

@coreation off the top of my head, two immediate thoughts:

  1. A chunk size of 4000 is actually quite large. Chunk size is usually set in the range of 256-512 tokens. It would be very hard for an embedding model to represent such a long text as a single semantic representation. This might explain the poor retrieval results.
  2. You mentioned changing some default parameters (e.g. chunk_size). How did you change these parameters? By setting them in a config file? If so, when you tested manually with Python code, did you use the same config file and/or the same parameter values? Otherwise it probably used the built-in defaults.
coreation commented 6 months ago

@igiloh-pinecone thanks for the quick thoughts,

  1. I thought it was a bit large as well, but I found the default in this file. The chunk_size there is set to 4000... Am I reading this wrong?

  2. I'm not using the embedding via Canopy directly, as I need to embed some things coming from a database, so I used the splitter (hence the 4000 chunk size) and then did the embedding and storing in Pinecone myself. I'll try 1024, as I've read a couple of articles, amongst which this one, that point out there's no real "good" size, but 1024 seems to be a good sweet spot. I'll re-embed everything and see if things get better. If we could get some feedback on the 4000 chunk_size in the langchain_text_splitter, I'm happy to make a small PR changing it to 1024 or 512.

coreation commented 6 months ago

@igiloh-pinecone, it looks like the "chunk_size" in the splitter actually refers to the number of characters... I'm using the OpenAI tokenizer online to see how many tokens my pieces of text have, and they're all around 700 tokens instead of 4000. So I've misinterpreted "chunk size" as "token size".

So the code to split up my text uses the defaults as described for the LangChain text splitter:

```python
RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
```
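
For anyone double-checking the characters-vs-tokens distinction, here is a small sketch that counts the tokens in each chunk produced by the splitter above. It uses tiktoken and LangChain directly (not part of Canopy), and `long_document_text` is a placeholder for your own source text:

```python
import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split on characters, exactly like the snippet above.
splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
chunks = splitter.split_text(long_document_text)  # long_document_text: your own text

# cl100k_base is the encoding used by ada-002 and the text-embedding-3 models.
enc = tiktoken.get_encoding("cl100k_base")
for i, chunk in enumerate(chunks):
    print(f"chunk {i}: {len(chunk)} characters, {len(enc.encode(chunk))} tokens")
```
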
coreation commented 6 months ago

@igiloh-pinecone I received a GitHub comment update, but I don't see your update here on the thread. To answer the question you wrote: I'm indeed using the LangChain recursive splitter, which I believe takes the parameter in characters, not tokens. If I take random samples of my vectors, the text is around 700 tokens.

Is your suggestion to lower that amount and use the Canopy Chunker?

igiloh-pinecone commented 6 months ago

I noticed your previous message where you stated that you use langchain directly, so I deleted mine as it was irrelevant.

My main point wasn't actually about the chunk size itself (I guess ~700 tokens is workable), but rather a question of how you configure your Canopy server versus how you configured the direct Python ChatEngine. Are you sure you've used the same config / params?

One more suggestion: can you please try repeating the same question 2-3 times in each scenario (server API vs direct Python class)? Could it be that the underlying LLM is simply a bit "noisy", answering the same question differently every time?
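
For the direct Python path, a quick repeatability check could look like the sketch below. It reuses `chat_engine` and `UserMessage` from the snippet earlier in this thread, and assumes the OpenAI-style response object shown in library.md:

```python
question = "what can you tell me about superconducting magnetic energy storage?"

# Ask the same question a few times to see whether the answers differ between runs.
for attempt in range(3):
    response = chat_engine.chat(messages=[UserMessage(content=question)], stream=False)
    print(f"--- attempt {attempt + 1} ---")
    print(response.choices[0].message.content)
```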

coreation commented 6 months ago

hey @igiloh-pinecone, what you see in the code example is the only configuration I use; it's the same as the variables I export before I start the canopy server. So that's simply the Pinecone API key, the OpenAI API key, and the index/namespace.

I've tried repeating the questions, but the result seems to be the same. See the images at the bottom of this comment: one contains sources, the other does not.

Do you guys offer any paid support, by chance? I'm somewhat knowledgeable at a high level about a couple of RAG frameworks, and this is the first one that does away with a lot of fluff because it's more tailored towards a use case. I can try again using LangChain, but the state of LangChain, for someone who isn't on top of it day in and day out, is just too much to keep up with.

Responses using canopy server: [screenshot: using-canopy-server]

Responses using the code example from library.md: [screenshot: with-custom-code]

coreation commented 6 months ago

@igiloh-pinecone perhaps not unimportant: the "text" property in Pinecone is often displayed as type "[]" instead of text. Is this because of the new lines from the chunking?

coreation commented 6 months ago

@igiloh-pinecone ... I found the issue after letting the code ponder in the back of my head :) The default encoder has changed to the latest OpenAI embedding (small) model, while my embeddings were still on ada embeddings. I see that in a previous Canopy install, in all likelihood the one running canopy server, the default encoder still points to ada-002... So that explains the trash results my knowledge base gave me, while at the same time the built-in canopy server returns decent RAG-based results.

I'll see if I find the time to make a documentation PR so that the encoder is explicitly passed in the advanced example in the library.md.
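
For reference, a library-level fix would be to pin the embedding model explicitly when constructing the KnowledgeBase. This is a sketch, assuming the `OpenAIRecordEncoder` class and the `record_encoder` argument described in the Canopy docs, and reusing `pinecone_index` from the earlier snippet:

```python
from canopy.knowledge_base import KnowledgeBase
from canopy.knowledge_base.record_encoder import OpenAIRecordEncoder

# Pin the embedding model to the one that was used when the documents were upserted.
encoder = OpenAIRecordEncoder(model_name="text-embedding-ada-002")
kb = KnowledgeBase(index_name=pinecone_index, record_encoder=encoder)
kb.connect()
```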

igiloh-pinecone commented 6 months ago

> The default encoder has changed to the latest openai embedding (small) model, while my embeddings were still on ada embeddings.

Thanks for the detailed response @coreation !!
That's definitely an oversight by us. We shouldn't have changed the default like that without at least highlighting it as a breaking change. I will change the issue's name to make it more discoverable by other users encountering the same problem.

igiloh-pinecone commented 6 months ago

Gist for other people encountering this problem:
Before version 0.7.0, Canopy's default RecordEncoder was `OpenAI(model_name='text-embedding-ada-002')`. In version 0.7.0, the default was changed to use OpenAI's new embedding model (`text-embedding-3-small`).

If you have inserted your documents in the past using an older Canopy version, then upgraded Canopy and tried using the `query()` or `chat()` functions, your newly loaded instance would be using a different embedding model than the one used for inserting the documents.

To fix this problem:

  1. Run the `canopy create-config <path>` command to generate Canopy's default config templates in your desired `<path>`.
  2. Edit the `default.yaml` file, changing the embedding `model_name` to `text-embedding-ada-002`.
  3. Run Canopy with the new config: `canopy start --config <path>/default.yaml`
coreation commented 6 months ago

Thanks for the swift response @igiloh-pinecone!