jonfairbanks / local-rag

Ingest files for retrieval augmented generation (RAG) with open-source Large Language Models (LLMs), all without 3rd parties or sensitive data leaving your network.
GNU General Public License v3.0
448 stars 52 forks

Inconsistent Document Recognition and Indexing in Application #61

Open casualcomputer opened 1 week ago

casualcomputer commented 1 week ago

Issue

The application fails to recognize newly uploaded documents, even though the logs show that they were indexed.

Troubleshooting

I started the UI and raised the retrieval setting from its default of 3 documents per process to 10. I then uploaded two separate batches of documents. The first batch contained four documents. After the upload, I asked the LLM (Llama 3 8B via Ollama) how many documents had been uploaded, and it correctly identified all four. That matched my expectations and the text pre-processing output visible in the logs. I validated this further by requesting summaries of the four documents, which the LLM provided accurately.

Next, I uploaded a second batch of 10 documents. This time the logs showed the documents being indexed quickly, but the pre-processing progress bar from the first upload never appeared. When I asked the LLM how many documents had been uploaded, it still answered four. To check whether the new documents were available, I referred to them by name in my queries (the files are present in the "data" folder), but the LLM did not recognize any of them.

How To Reproduce

1. Launch the application UI.
2. Change the setting to retrieve 10 documents per process (default is 3).
3. Upload the first batch of 4 documents. Confirm via an LLM query and the logs that all 4 are processed and indexed.
4. Upload a second batch of 10 documents. Notice that the pre-processing progress bar does not appear.
5. Ask the LLM how many documents have been uploaded. It incorrectly reports only the initial 4.

Expected Behavior

The application should index each new batch of documents and update the document count accordingly. The pre-processing progress bar should appear for each batch, indicating that processing is occurring. Queries about the document count should reflect the total number of documents successfully uploaded and processed.

Desktop (please complete the following information):

jonfairbanks commented 5 days ago

Please check out the Troubleshooting Guide and see whether all of the documents are making it into the documents state.
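One way to run that check by hand is to compare the files in the upload folder with the file names the app holds in state. The helper below is a stand-alone sketch, not part of local-rag: the name `find_unindexed`, and the assumption that the state can be reduced to a plain list of file names, are illustrative only.

```python
from pathlib import Path


def find_unindexed(data_dir, indexed_names):
    """Return names of files in data_dir that are absent from indexed_names.

    indexed_names is assumed to be a list of file names taken from the app's
    documents state (hypothetical; adapt to the real structure).
    """
    on_disk = {p.name for p in Path(data_dir).iterdir() if p.is_file()}
    return sorted(on_disk - set(indexed_names))
```

Inside the app, the second argument would come from session state; the exact key is not given in this thread. An empty result means every file on disk is also present in state, so the problem would lie further downstream.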

This may be similar to #48. Streamlit really wants things to run from top to bottom and doesn't like partial changes. The Streamlit cache is a major pain point in the current setup.
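Whether caching is the actual cause here is not established in this thread. The sketch below only illustrates the general failure mode: a cached function keeps returning its first result when nothing in its cache key changes. It uses `functools.lru_cache` as a stand-in for Streamlit's caching decorators, and every name and file in it is hypothetical.

```python
from functools import lru_cache

# Stand-in for files saved to the upload folder (hypothetical data).
uploaded = ["a.pdf", "b.pdf", "c.pdf", "d.pdf"]


@lru_cache(maxsize=None)
def build_index_stale():
    # No arguments, so the cache key never changes: the first result is
    # returned on every later call, even after more files are uploaded.
    return tuple(uploaded)


@lru_cache(maxsize=None)
def build_index(file_names):
    # The tuple of file names is the cache key, so a new batch is a cache
    # miss and the index is rebuilt.
    return tuple(file_names)


build_index_stale()              # first batch: result is cached
build_index(tuple(uploaded))     # first batch: cached under this file list

uploaded += [f"new_{i}.pdf" for i in range(10)]  # second batch of 10

print(len(build_index_stale()))           # 4: stale, first batch only
print(len(build_index(tuple(uploaded))))  # 14: both batches
```

The second variant works because its inputs describe the current set of files. In an app that reruns from top to bottom, anything cached without such an input survives the rerun unchanged.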