weaviate / Verba

Retrieval Augmented Generation (RAG) chatbot powered by Weaviate
BSD 3-Clause "New" or "Revised" License

Embedder TokenChunker not found. #238

Closed provVladBurlik closed 1 week ago

provVladBurlik commented 3 weeks ago

Description

When following the instructions for Azure OpenAI, the server starts, but when trying to add a document the server shows the message in the subject.

Is this a bug or a feature?

Steps to Reproduce

Additional context

Azure OpenAI, Ubuntu 20.04, virtualenv

silicongarage commented 3 weeks ago

I'm getting the same error, but with a different OS/setup: a Docker container (built from a Verba git clone) on a MacBook with Apple Silicon, with an Ollama server running on the host.

When I click on "Add Documents" the web client errors with the following message: Application error: a client-side exception has occurred (see the browser console for more information).

Checking the Docker server (verba) logs shows: Embedder TokenChunker not found

.env:

OLLAMA_URL=http://host.docker.internal:11434
OLLAMA_MODEL=llama3

Services: weaviate 1.24.2, verba latest (git pull 2024/07/03)

wythedee commented 3 weeks ago

Same problem. I used pip install to set up Verba, went straight to the web page, clicked on RAG, and the same error occurred: Application error: a client-side exception has occurred (see the browser console for more information). I wonder what I missed. Also, I'd like to know whether it's possible to set OLLAMA_URL to an external URL rather than a local one; that endpoint would provide the same service as Ollama does. Please help!

thomashacker commented 3 weeks ago

Thanks for the issue! This definitely seems like a bug. What console message are you receiving from the frontend when the client-side exception occurs? Do any other errors occur?

Make sure that you install Verba in a clean environment to prevent dependency issues.

syrian2012 commented 3 weeks ago

I have the same error. I tested all available installation methods and got the same result. [screenshot attached]

provVladBurlik commented 3 weeks ago

I've added additional traces. [screenshot attached]

FlipinFlop commented 3 weeks ago

Same error while running v1.0.3 on Ubuntu, installed via pip in a clean environment. Installed v1.0.1 instead and it's now running fine.

provVladBurlik commented 3 weeks ago

> Thanks for the issue! This definitely seems like a bug. What console message are you receiving from the frontend when the client-side exception occurs? Do any other errors occur?
>
> Make sure that you install Verba in a clean environment to prevent dependency issues.

Installed in a new virtual environment, as per the instructions.

silicongarage commented 3 weeks ago

Looks like a change to components/managers.py was made two days ago.
I changed it back to the original and now the web page doesn't crash.

[screenshot attached]

~~Ha, but now I have a new problem with "No Chunks Available" when querying at the Chat Interface. It does appear that a lot of POST calls are being made to the Ollama server. From the Admin Console it appears that there are chunks in the database.~~

[screenshot attached]

[UPDATE]

Everything appears to work now.

provVladBurlik commented 3 weeks ago

I am using Azure OpenAI - does this work too?

silicongarage commented 3 weeks ago

The problem appears to be the change in components/managers.py so that's probably all you need. I had additional problems (masked by this) in my Ollama local LLM setup. Hopefully one of the developers will chime in soon.

thomashacker commented 2 weeks ago

Great catch! The self.selected_embedder variable had the wrong component name. Thanks a lot for debugging! It will be fixed and pushed ASAP.

thomashacker commented 2 weeks ago

Fixed and pushed, let me know if the error still persists

shanumas commented 2 weeks ago

@thomashacker Still the same. But this one works:

[screenshot attached]

mamscience commented 1 week ago

@thomashacker The root cause of the error is a faulty configuration file stored in Weaviate Embedded.

In util.py, the manager sets the embedder (line 127), retrieving it from the config file stored in the Weaviate database. In this config file, TokenChunker is accidentally set as the 'selected' value, which util.py retrieves and the manager rejects, because TokenChunker is not an Embedder.
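The failure path described above can be sketched as follows. This is a minimal illustration, not Verba's actual code; the embedder names and the function are assumptions made for the example:

```python
# Hypothetical registry of embedder components; names are illustrative.
AVAILABLE_EMBEDDERS = {"ADAEmbedder", "CohereEmbedder", "OllamaEmbedder"}

def set_embedder_from_config(config: dict) -> str:
    # The "selected" embedder name is read out of the stored config.
    selected = config.get("Embedder", {}).get("selected", "")
    if selected not in AVAILABLE_EMBEDDERS:
        # A chunker name leaked into the embedder slot, so the lookup fails
        # with the message seen in the server logs.
        raise ValueError(f"Embedder {selected} not found.")
    return selected

# A faulty saved config carrying a chunker name in the embedder slot:
faulty_config = {"Embedder": {"selected": "TokenChunker"}}
try:
    set_embedder_from_config(faulty_config)
except ValueError as err:
    print(err)  # Embedder TokenChunker not found.
```

With a valid name such as "CohereEmbedder" in the embedder slot, the same lookup succeeds.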

I hard-coded a fix by setting CohereEmbedder instead, in util.py at lines 126/127:

#manager.embedder_manager.set_embedder(config.get("Embedder", {}).get("selected", "")) #bug here
manager.embedder_manager.set_embedder("CohereEmbedder")
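A less brittle variant of the same workaround is to honour the stored selection when it is valid and only fall back to a known embedder otherwise. This is a sketch with hypothetical names, not a change from the maintainers:

```python
# Hypothetical registry; names are illustrative assumptions.
AVAILABLE_EMBEDDERS = {"ADAEmbedder", "CohereEmbedder", "OllamaEmbedder"}

def resolve_embedder(config: dict, default: str = "CohereEmbedder") -> str:
    # Keep the stored selection when it names a real embedder;
    # otherwise fall back to the default instead of crashing.
    selected = config.get("Embedder", {}).get("selected", "")
    return selected if selected in AVAILABLE_EMBEDDERS else default

print(resolve_embedder({"Embedder": {"selected": "TokenChunker"}}))   # CohereEmbedder
print(resolve_embedder({"Embedder": {"selected": "OllamaEmbedder"}}))  # OllamaEmbedder
```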

My suggestion would be to add a JSON file with the standard config, from which the application loads into the database. It took some time to retrieve the config file (which also was not valid JSON, by the way).
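The suggestion above could look something like this sketch: shipped defaults act as the baseline, and a stored file that is missing or not valid JSON never reaches the component managers. The file layout and component names are assumptions, not Verba's actual config schema:

```python
import json

# Hypothetical shipped defaults; the component names are illustrative.
DEFAULT_CONFIG = {
    "Embedder": {"selected": "CohereEmbedder"},
    "Chunker": {"selected": "TokenChunker"},
}

def load_config(path: str) -> dict:
    # Fall back to the shipped defaults when the stored file is missing
    # or not valid JSON, so startup never sees a broken config.
    try:
        with open(path) as fh:
            stored = json.load(fh)
    except (OSError, json.JSONDecodeError):
        return dict(DEFAULT_CONFIG)
    # Keep defaults for any section the stored file omits.
    return {**DEFAULT_CONFIG, **stored}
```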

Thanks for your hard work.

thomashacker commented 1 week ago

I agree, there's a lot of room for improvement in the configuration settings. When an invalid config is saved, the errors will most likely still persist. As a workaround, you can reset the configuration file via the frontend.

I'll add config validation in the coming update to avoid these issues in the future. Thanks a lot for the feedback 🚀
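Such validation might check each component type's 'selected' value against its registry before the config is accepted. A minimal sketch, assuming an illustrative registry that is not Verba's actual component list:

```python
# Hypothetical component registry; contents are illustrative assumptions.
REGISTRY = {
    "Embedder": {"ADAEmbedder", "CohereEmbedder", "OllamaEmbedder"},
    "Chunker": {"TokenChunker", "SentenceChunker"},
}

def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means the config is valid."""
    problems = []
    for component, options in REGISTRY.items():
        selected = config.get(component, {}).get("selected", "")
        if selected not in options:
            problems.append(f"{component}: '{selected}' is not one of {sorted(options)}")
    return problems
```

Run against the faulty config from this thread, this would flag "TokenChunker" in the Embedder slot before it ever reaches the manager.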

shanumas commented 1 week ago

No, all works well now.