your-papa / obsidian-Smart2Brain

An Obsidian plugin to interact with your privacy-focused AI assistant, making your second brain even smarter!

Too slow within obsidian #109

Open · SmokeShine opened 4 months ago

SmokeShine commented 4 months ago

What happened?

I tried using llama 3 and phi-3. Performance is good for both models in Jan UI and in Ollama directly. Within Obsidian, however, retrieval takes 3-4 minutes with creativity at 0% and similarity at 20%.

Error Statement

No response

Steps to Reproduce

  1. Change the provider to llama-3/phi-3
  2. Open the chat window
  3. Compare inference time with Jan UI/Ollama (a timing sketch follows this list)
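To reproduce the comparison in step 3, it helps to time Ollama outside Obsidian so the plugin's retrieval and summarization overhead is excluded. Here is a minimal sketch (not part of the plugin) that calls Ollama's REST API, assuming Node 18+ with built-in `fetch`, Ollama on its default port 11434, and a pulled `llama3` model tag:

```typescript
// Minimal timing harness: calls Ollama's /api/generate endpoint
// directly, bypassing the plugin entirely.
async function timeOllama(model: string, prompt: string): Promise<void> {
  const start = Date.now();
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  const data = await res.json();
  console.log(`wall time: ${((Date.now() - start) / 1000).toFixed(1)}s`);
  // Ollama reports its durations in nanoseconds.
  console.log(`total_duration: ${(data.total_duration / 1e9).toFixed(1)}s`);
}

timeOllama("llama3", "Summarize the idea of a second brain in one sentence.")
  .catch(console.error);
```

If this direct call returns in seconds while the plugin takes minutes, the gap is in retrieval/summarization rather than the model itself.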

Smart Second Brain Version

1.0.2

Debug Info

SYSTEM INFO:
Obsidian version: v1.5.12
Installer version: v1.4.16
Operating system: Darwin Kernel Version 23.0.0: Fri Sep 15 14:41:34 PDT 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T8103 23.0.0
Login status: not logged in
Insider build toggle: off
Live preview: on
Base theme: dark
Community theme: Atom v0.0.0
Snippets enabled: 1
Restricted mode: off
Plugins installed: 4
Plugins enabled: 3
1: Supercharged Links v0.12.1
2: Style Settings v1.0.8
3: Smart Second Brain v1.0.2

nicobrauchtgit commented 4 months ago

Sorry the inference is taking so long. I couldn't quite figure this out: does Jan.ai use Ollama to run the models, or are you talking about two different setups?

SmokeShine commented 4 months ago

> Sorry the inference is taking so long. I couldn't quite figure this out: does Jan.ai use Ollama to run the models, or are you talking about two different setups?

No, they're separate. Jan can pull llama 3 on its own; I used it to double-check whether the issue was with my hardware.

For the plugin, I am using Ollama.

Leo310 commented 4 months ago

The lower the similarity score, the more notes are retrieved. If those notes exceed the LLM's maximum context size, we summarize them hierarchically to make them fit, which can take some time, as described here.
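For illustration, the hierarchical step looks roughly like this (a minimal sketch, not the plugin's actual code; `summarize` stands in for a single LLM call, and a character count stands in for the token limit):

```typescript
// Illustrative sketch of hierarchical summarization.
type Summarize = (text: string) => Promise<string>;

async function fitToContext(
  notes: string[],
  maxChars: number, // crude stand-in for the model's context limit
  summarize: Summarize
): Promise<string> {
  let docs = notes;
  // Each round merges neighbouring documents pairwise and summarizes
  // them, roughly halving the count, until everything fits (capped at
  // a few rounds to avoid looping forever).
  for (let round = 0; round < 5 && docs.join("\n\n").length > maxChars; round++) {
    const next: string[] = [];
    for (let i = 0; i < docs.length; i += 2) {
      next.push(await summarize(docs.slice(i, i + 2).join("\n\n")));
    }
    docs = next; // ~ceil(n/2) LLM calls spent this round
  }
  return docs.join("\n\n");
}
```

Each round costs on the order of half as many LLM calls as there are documents, so a low similarity threshold that retrieves many notes multiplies the number of local-model invocations, which is where the minutes go.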

SmokeShine commented 4 months ago

But this will always bottleneck exactly when the assistant is needed most. If I had only 10 notes, I could manage them without the plugin. And if I increase the base model's context length, it no longer fits in VRAM.

zeigerpuppy commented 2 months ago

@SmokeShine, that's not quite what @Leo310 meant. You can index thousands of notes, but when you actually run a query (Smart Second Brain chat with RAG enabled), the plugin has to find the relevant notes and fit them into the model's context window, or else summarize them, which takes time. So you'll get a faster response by increasing the "similarity" slider, so that fewer notes are retrieved for the RAG query.
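Concretely, the slider acts as a cutoff on the retrieval score. A minimal sketch of the idea (names like `IndexedNote` and `retrieve` are illustrative, not the plugin's actual API):

```typescript
// Raising the threshold shrinks the set of notes that must fit into,
// or be summarized into, the model's context window.
interface IndexedNote {
  path: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(query: number[], notes: IndexedNote[], threshold: number): IndexedNote[] {
  // Only notes scoring at or above the threshold reach the LLM, so a
  // slider at 0.6 passes far fewer notes than one at 0.2.
  return notes.filter((n) => cosine(query, n.embedding) >= threshold);
}
```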