nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License

LocalDoc indexing takes forever and won't store status #1757

Open taylorren opened 11 months ago

taylorren commented 11 months ago

System Info

Windows 11 with the latest patches; GeForce 3060 with 8 GB VRAM

Reproduction

  1. Add a LocalDocs collection by selecting a folder that contains a few sub-folders and many documents.
  2. Indexing starts (as the dialog says).
  3. After a while, it may show a "small" amount of progress on the progress bar.
  4. After quitting GPT4All and restarting it, all progress is lost.

Expected behavior

I would expect the indexing progress on the local documents to be preserved, so that I can eventually finish indexing.
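To illustrate the expected behavior, here is a minimal Python sketch of resumable indexing. It is not GPT4All's implementation, which this thread does not describe. The names `index_folder`, `index_one`, and `state_path` are hypothetical, and the state file is assumed to live outside the folder being indexed.

```python
import json
import os


def load_done(state_path):
    """Return the set of file paths recorded as indexed in an earlier run."""
    if not os.path.exists(state_path):
        return set()
    with open(state_path, encoding="utf-8") as f:
        return set(json.load(f))


def save_done(state_path, done):
    """Write the state to a temp file, then swap it in, so a crash cannot leave a half-written file."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp_path, state_path)


def index_folder(root, state_path, index_one):
    """Index every file under root that is not yet recorded; return the newly indexed paths.

    index_one is a hypothetical callback that does the real work for one file.
    """
    done = load_done(state_path)
    newly_indexed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if path in done:
                continue  # already handled before the last shutdown
            index_one(path)
            done.add(path)
            save_done(state_path, done)  # checkpoint after every file
            newly_indexed.append(path)
    return newly_indexed
```

With a checkpoint after each file, quitting and restarting only repeats the file that was in flight. A real indexer would probably also record modification times so that edited files get re-indexed.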

ConradVe commented 11 months ago

Same for me: indexing takes a long time and the green index bar always stays small. Ubuntu 22.04, 32 GB memory, NVIDIA GeForce GTX 1650 graphics card. Folder data: 8 PDF files, 28 MB.

hmingo commented 11 months ago

I have the same problem: it will not index, not even after letting it try for 8 hours.

AMD Ryzen 7 3700U 2.30 GHz, 20 GB RAM, Radeon Vega Mobile Gfx, Windows 11 Home 22631.2861, 64-bit operating system

taylorren commented 11 months ago

@hmingo @ConradVe

Let me elaborate a bit more: indexing does actually work. I tried it on a very small document set (~300 docs, 120 MB in size, pure Markdown files) and it worked.

My concern now is that I have a much bigger set to index: 10K+ docs, 5 GB in size, in multiple formats. I believe the indexing process is working, but:

  1. Indexing is slow.
  2. The indexing status is not saved, so it seems to start from 0 every time.

ConradVe commented 11 months ago

I may have figured out the problem, and the solution worked for me. Ubuntu 22.04, 32 GB memory, NVIDIA GeForce GTX 1650 graphics card. Folder data: 8 PDF files, 28 MB, in a folder on the Desktop.

1) I had problems choosing the folder for LocalDocs. My folder, named "Docs_for_GPT4all", was on my Desktop and held all my PDF docs. It seems that the GPT4All interface could not use this folder and instead started indexing all the folders on my Desktop, so it was very slow.

2) So inside "Docs_for_GPT4all" I created another sub-folder (e.g. Research1_docs) and put all my related docs in it.

3) I relaunched GPT4All and chose that sub-folder as the local folder to index, and it worked! Within a few minutes I had the full index, and I was able to ask questions and receive correct answers related to the docs in my folder.

Hope it helps.
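If indexing seems far slower than the folder size suggests, it may help to check how much data actually sits under the folder that was picked. The sketch below is a stand-alone Python helper, not part of GPT4All. `summarize_folder` is a hypothetical name, and the helper only counts files; it does not know which file types GPT4All indexes.

```python
import os
from collections import Counter


def summarize_folder(root):
    """Walk root recursively and return (file_count, total_bytes, files_per_extension)."""
    file_count = 0
    total_bytes = 0
    by_extension = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                continue  # skip files that vanished or cannot be read
            file_count += 1
            by_extension[os.path.splitext(name)[1].lower()] += 1
    return file_count, total_bytes, by_extension
```

Running it on the intended docs folder and on its parent (here, the Desktop) shows the difference: a folder expected to hold 8 PDFs and 28 MB should report roughly that, while a much larger count suggests the wrong directory is being scanned.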

ipodjupiter commented 10 months ago

Hi

I had the same issue when trying to install on a new machine. I actually forgot to follow the instructions 😅

After following the steps above, everything works and indexing runs, even if slowly because of the number of PDFs in my repository. I keep the GPT4All folder under AppData open, and I can see the size of the dat file growing and the green bar slowly moving. I hope this helps.
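Watching the file size by hand can also be scripted. This is a minimal Python sketch under stated assumptions: `snapshot_sizes` and `growth` are hypothetical helper names, and the data folder path must be supplied by the user, since the thread only says it is the GPT4All folder under AppData.

```python
import os


def snapshot_sizes(directory):
    """Return {file name: size in bytes} for the regular files directly inside directory."""
    sizes = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                sizes[entry.name] = entry.stat().st_size
    return sizes


def growth(before, after):
    """Return {file name: byte difference} for files whose size changed between two snapshots."""
    changes = {}
    for name, size in after.items():
        delta = size - before.get(name, 0)
        if delta != 0:
            changes[name] = delta
    return changes
```

Taking one snapshot, waiting a minute, and taking another gives a rough signal: if `growth` reports a positive change for the dat file, indexing is still writing data, even when the progress bar barely moves.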

hhamud commented 6 months ago

Has anyone found a workaround?