your-papa / obsidian-Smart2Brain

An Obsidian plugin to interact with your privacy focused AI-Assistant making your second brain even smarter!
GNU Affero General Public License v3.0
645 stars 43 forks source link

Why is indexing so time-consuming? Why is it always frequently required to index? #85

Open wwjCMP opened 7 months ago

wwjCMP commented 7 months ago

Snipaste_2024-04-10_03-55-31

wwjCMP commented 7 months ago

Snipaste_2024-04-10_04-03-30

dougkeiller commented 7 months ago

+1 this question

Screenshot from 2024-04-09 13-50-37

wwjCMP commented 7 months ago

+1 this question

Screenshot from 2024-04-09 13-50-37

I have already completed the steps in this image, but still need an index.

wwjCMP commented 7 months ago

Switching models or modifying settings, the new index cannot be completed.

Leo310 commented 7 months ago

Do you use local models? If yes, it can be time-consuming if you don't have much computing power. So it would be interesting to know which specs your computer has.

Regarding your second question, it only indexes your whole vault once at plugin startup. After the initial indexing, subsequent indexing cycles should be way faster as notes that are indexed once won't be indexed again except if their content has changed.

wwjCMP commented 7 months ago

Do you use local models? If yes, it can be time-consuming if you don't have much computing power. So it would be interesting to know which specs your computer has.

Regarding your second question, it only indexes your whole vault once at plugin startup. After the initial index, subsequent indexing cycles should be way faster as notes that are indexed once won't be indexed again except if their content has changed.

Regarding the first point, yes, I am using a local model. The initial process is indeed time-consuming, but that is not my concern here. The issue I encountered is that after the initial indexing, it index the entire database again when obsidian reboot, and even after changing settings, it falls into a loop of continuous index.

wwjCMP commented 7 months ago

I am also using the obsidian-copilot plugin, I wonder if this is the cause of this issue?

Leo310 commented 7 months ago

We need to reindex again after reloading obsidian to ensure that the content of your notes is in sync with the vectorstore embeddings. But as I said reindexing should happen really fast. Did the initial indexing or reindexing took 113min?

And after changing the embedding model or provider settings, we must index the vault again once because different embedding models generate different vectors.

Leo310 commented 7 months ago

I am also using the obsidian-copilot plugin, I wonder if this is the cause of the confusion?

This shouldn't be an issue.

wwjCMP commented 7 months ago

What do you mean by "initial search"?

And after changing the embedding model or provider settings, we must index the vault again once because different embedding models generate different vectors.

initial indexing

Leo310 commented 7 months ago

Not sure if you saw it, but I also updated my message.

We need to reindex again after reloading obsidian to ensure that the content of your notes is in sync with the vectorstore embeddings. But as I said reindexing should happen really fast. Did the initial indexing or reindexing took 113min?

And after changing the embedding model or provider settings, we must index the vault again once because different embedding models generate different vectors.

RedemptionC commented 7 months ago

hum, it takes several hours on my machine image

BTW, I'm using M1Pro 16GB

dougkeiller commented 7 months ago

Why don't you offer alternative: embedding providers? chat providers?

Embedding: OpenAI is not the best. Lots of other options:

image

Chat, why not give access to Openrouter, so we can choose chat from there? It uses OpenAI API so should be VERY easy to add....

https://openrouter.ai/models

dougkeiller commented 7 months ago

Do you use local models? If yes, it can be time-consuming if you don't have much computing power. So it would be interesting to know which specs your computer has.

Regarding your second question, it only indexes your whole vault once at plugin startup. After the initial indexing, subsequent indexing cycles should be way faster as notes that are indexed once won't be indexed again except if their content has changed.

I'm not using a local model...they are too slow. 113 minutes (first index) with OpenAI Small.

Leo310 commented 7 months ago

We are working on the Openrouter support https://github.com/your-papa/obsidian-Smart2Brain/issues/78, but unfortunately we are really busy with university right now so it may take a few weeks.

I have a 3080 and for me indexing my 350 notes takes 5-10min.

cubeDHS2017 commented 6 months ago

I have over 4,000 files in my vault. It says it is going to take roughly 80 hours to finish indexing my vault. How would I decrease that time? image

richardstevenhack commented 5 months ago

I have 3,600 files in my vault, so that will be too long for me to use this. I think you need to reconsider indexing the full vault on startup. Other Obsidian plugins such as MYST can index the entire vault but also allow for just indexing files and folders separately. That would be a better approach, so people can incrementally index their entire vault.

a198h commented 5 months ago

@Leo310 It's an incredible plugin ! I'll wait as long as necessary 😉 Screen_20240617_200215

a198h commented 5 months ago

Does Smart Second Brain index only md files or all files ?

SyndicatedPillbug commented 4 months ago

The reindexing upon open is definitely not ideal, it'd be nice if there were a way around that (aside from never closing Obisidian, lol). But I'm REALLY excited about this plugin. I have a 2nd generation Zephyrus G14 with a 30-series GPU, and it indexed my whole, massive, database in about two hours.

However, reindexing is taking between 20-minutes and an hour (it's bouncing between those two ETAs) (the day after I initially installed and ran this plugin), which seems way off to me.

SyndicatedPillbug commented 4 months ago

Does Smart Second Brain index only md files or all files ?

This would be good to know - and nice to adjust in the settings if it does, because I have a LOT of non-.md files.

Leo310 commented 4 months ago

Does Smart Second Brain index only md files or all files ?

For now only md files.

On reducing the indexing time

Thanks to the new Ollama concurrency feature we can significantly reduce indexing time. We will implement this improvement as soon as we resume work on the S2B.

Further, we will investigate why reindexing takes so long, as it should not re-embed anything but simply compare hashes to ensure the embeddings are synchronized with the vault's content.

fdominguezr commented 3 months ago

It seems it will take me my whole life to index :-D My PC is not weak: i7 @2,6GHz, 32 GB RAM... And my vault is not huge, probably 250 notes of one page size average...

Any suggestion? Thanks! image

Savemech commented 1 week ago

Gray screen appears at the completion of indexing instead of normal Obsidian interface, any ideas?