logancyang / obsidian-copilot

THE Copilot in Obsidian
https://www.obsidiancopilot.com/
GNU Affero General Public License v3.0

Embedding database not saved, "Invalid string length" #806

Closed: upnix closed this issue 5 hours ago

upnix commented 1 week ago

Copilot version: 2.7.1

(Bug report without the above will be closed)

Describe how to reproduce
I can't say whether this will reproduce the error, since it costs me about $1 per attempt and I'm hesitant to just keep trying. I have 4,222 notes indexed, with exclusions of *.jpg, *.png, *.pdf, _templates, Copilot, *_resources.

When indexed with nomic-embed-text through Ollama, things work fine. I switched to text-embedding-3-large through OpenAI and forced a reindex. That also seemed to work fine, and I could use vault QA. But I noticed that my copilot-index-...json was only 4 KB. I checked the developer console and saw the message "Error saving Orama database to ... RangeError: Invalid string length". [Screenshot: Screenshot_20241112_085307]
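For context, that error comes from V8's hard cap on string length: JSON.stringify has to build the entire serialized index as one string in memory before anything is written to disk, and it throws this exact RangeError when the result would exceed the cap. A minimal sketch of the mechanism (assuming a 64-bit Node/Electron runtime, where the cap is roughly 2^29 to 2^30 characters depending on the V8 version; the fakeIndex object below is just a stand-in for a vector index, not Copilot's actual data structure):

```ts
// Two ways to surface V8's "Invalid string length" RangeError.
// Assumption: 64-bit Node/Electron, max string length roughly 2**29..2**30 chars.

try {
  // Asking for a ~2 GiB string always exceeds the cap and fails immediately.
  "x".repeat(2 ** 31);
} catch (e) {
  console.log((e as RangeError).message); // "Invalid string length"
}

// JSON.stringify throws the same error when the serialized result would be
// longer than the cap -- the whole string is built before any file I/O happens.
// Stand-in index: 1,000 docs with 3,072-dimensional float vectors.
const fakeIndex = {
  docs: Array.from({ length: 1_000 }, () => ({
    embedding: new Array(3072).fill(0.123456789),
  })),
};
const json = JSON.stringify(fakeIndex); // fine at this size; length grows linearly with doc count
console.log(`serialized length: ${(json.length / 1e6).toFixed(1)} M characters`);
```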

Expected behavior
The .json index file should be saved so that my indexing survives an Obsidian restart.

What else?
I haven't tried to reproduce this yet. I can turn on "Debug mode" in Copilot, for whatever that's worth, but is there anything else I can do to get as much information as possible out of the next attempt? Like I said above, it's about $1 per index.

Is there some manual intervention I can take to save this database when it clearly exists in memory and just isn't on disk?
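If the goal is just to get what's already in memory onto disk by hand, one generic option is to serialize it in slices, so that no single JSON.stringify call has to produce one enormous string. This is only a sketch under assumptions: the records array, chunk size, and file names are hypothetical stand-ins, not Copilot's or Orama's actual persistence path.

```ts
import { promises as fs } from "fs";

// Generic chunked-save sketch: write a large array of records as several
// smaller JSON files so no single JSON.stringify result can hit V8's
// string-length cap. `records` and the file names are hypothetical.
async function saveInChunks(
  records: unknown[],
  dir: string,
  chunkSize = 500
): Promise<void> {
  await fs.mkdir(dir, { recursive: true });
  for (let i = 0; i < records.length; i += chunkSize) {
    const part = records.slice(i, i + chunkSize);
    const file = `${dir}/index-part-${i / chunkSize}.json`;
    // Each part is small enough to stringify safely on its own.
    await fs.writeFile(file, JSON.stringify(part), "utf8");
  }
}
```

Restoring would then mean reading the parts back, concatenating the arrays, and handing the result to whatever rebuilds the in-memory store.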

logancyang commented 1 week ago

This shouldn't happen in theory. When you switch to a new embedding model, it should clear the previous index and rebuild, so there shouldn't be any existing-doc error. Do you see this message in your console: "Detected change in embedding model. Rebuilding vector store from scratch"?

Oh, I see your debug mode is off. I need the debug-mode messages, though. You can start indexing, and once you see the errors, pause it to avoid the high cost.

What's your DB size with nomic-embed-text? It could be too large for Orama to save.

upnix commented 1 week ago

I can recreate this - I just did it again but using the OpenAI text-embedding-3-small model (so, ~$0.17). I'll have to try a few more times.

The nomic DB (which does save to disk) was 419 MB, and the error only shows up after indexing finishes, so I can't cut the indexing short.
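For a rough sense of scale (assumptions: the vector payload dominates the serialized index, nomic-embed-text produces 768-dimensional vectors, and text-embedding-3-large produces 3,072-dimensional ones), the same vault would serialize to roughly four times that 419 MB:

```ts
// Back-of-envelope estimate of the serialized index size after switching models.
// Assumptions: vectors dominate the JSON payload; 768 dims for nomic-embed-text
// vs 3072 dims for text-embedding-3-large.
const observedNomicMB = 419;                  // on-disk size reported above
const scale = 3072 / 768;                     // 4x more floats per chunk
const estimatedMB = observedNomicMB * scale;  // ~1676 MB
console.log(`~${estimatedMB} MB serialized`);
// V8 caps a single string at roughly 0.5-1 GB of characters (version dependent),
// so a JSON.stringify result of this size would throw
// "RangeError: Invalid string length" before the file is ever written.
```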

Do you see this message in your console: "Detected change in embedding model. Rebuilding vector store from scratch"?

I don't see this, but I'm forcing the reindex before Copilot has a chance to do anything else. I just did this the first time because it seemed like a sure way to get a clean start, then I did it again the second time to try to recreate the problem.

Here are the logs (without debug mode) at the start of the re-indexing: [Screenshot: Screenshot_20241113_083215]

Then, after many "Error storing vectors in VectorDB" messages, here are the last few console messages: [Screenshot: Screenshot_20241113_083358]

logancyang commented 1 week ago

@upnix That's interesting. I see "Local vector store cleared successfully", so it shouldn't have upsert issues. And your vault is smaller than mine, so it shouldn't be too big to cause the last error. I haven't been able to repro, but I don't have your vault to test with, so I'm not sure what exactly is causing it.

If you are willing to dive into the code, you are the best person to spot the exact problem; if not, no worries. I merged a change and will release 2.7.2 soon. Could you test again when it's released? Just to confirm, do you only see this with OpenAI embedding models? If you can repro with local embeddings, that's better because you don't have to pay for the testing (only a few cents, but still better if it's free).

logancyang commented 5 hours ago

This is related to #834. Please continue the conversation there; I'm closing this one.