Closed wolfspyre closed 8 months ago
Not sure if you want me to create an issue for the overlap, as you've already got a TODO about it in llama_index.py, but the chunk overlap entry in service_context is commented out... maybe this is intentional?
service_context = ServiceContext.from_defaults(
    llm=llm,
    system_prompt=system_prompt,
    embed_model=embedding_model,
    chunk_size=int(chunk_size),
    # chunk_overlap=int(chunk_overlap),
)
There is a relationship between chunk_size and chunk_overlap: the overlap has to be smaller than the chunk size, so tweaking chunk sizes currently needs to be done with some care. Additional logic is needed on the UI side to better help users know when their settings may cause errors.
As for the error about reaching out to OpenAI, this is somewhat of a bad error message coming out of Llama Index. Setting up the embed model with the provided values failed, so it starts trying to use OpenAI (its default).
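For illustration, here is a minimal sketch of the kind of check involved, reusing the names from the snippet above: validate that the overlap is smaller than the chunk size before building the service context, and pass the embed model explicitly so Llama Index never falls back to its OpenAI default. This is a sketch, not the exact local-rag code.
chunk_size = int(chunk_size)
chunk_overlap = int(chunk_overlap)
# Llama Index's splitter rejects overlaps that are not smaller than the chunk size.
if chunk_overlap >= chunk_size:
    raise ValueError(
        f"chunk_overlap ({chunk_overlap}) must be smaller than chunk_size ({chunk_size})"
    )
service_context = ServiceContext.from_defaults(
    llm=llm,
    system_prompt=system_prompt,
    embed_model=embedding_model,  # explicit, so the OpenAI default is never used
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
)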
My intent was to have a large overlap and lots of small files to help with a small context... but after seeing the 200 hardcoded there, I switched back to 1024 as the chunk size... which didn't fix the problem I'm seeing:
- model: jaigouk/nous-capybara-34b-q3:latest
- endpoint set to http://myotherhost:11434
- embedding reset to bge-large-en-v1.5 (or salesforce/sfr-embedding-mistral), with chunk size 1024 / overlap 200
- adding a 1 MB markdown file (the Hugo documentation)
2024-03-18 17:47:39,244 - helpers - INFO - Directory ~/ML_AI/local-rag/data did not exist so creating it
2024-03-18 17:47:39,250 - helpers - INFO - Upload README.md saved to disk
2024-03-18 17:47:39,253 - llama_index - INFO - Service Context created successfully
2024-03-18 17:47:39,255 - rag_pipeline - INFO - Documents are already available; skipping document loading
and attempting to chat with the info just results in a response of
'Please confirm settings and upload files before proceeding'
I know that local-rag has connected, as I see GETs of '/api/tags', but I never see a POST to '/api/chat' from local-rag...
nothing interesting is logged in local-rag.log
suggestions?
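FWIW, here's the kind of quick check I'd use to confirm that /api/chat is actually reachable from the box running local-rag (same endpoint and model as above; just a manual sanity check, not something local-rag runs for you):
import requests

# Manual sanity check against the Ollama chat endpoint configured above.
resp = requests.post(
    "http://myotherhost:11434/api/chat",
    json={
        "model": "jaigouk/nous-capybara-34b-q3:latest",
        "messages": [{"role": "user", "content": "ping"}],
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])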
A possible contributor could be the lack of CUDA... as I'm running on Macs, I have no NVIDIA gear, so with the included lockfile, pipenv install doesn't succeed... however it does if you remove the lockfile and let pipenv re-sort its deps.
I'm trying the same process now on the machine that's running Ollama locally (MBP M2, 96 GB) to see if it behaves differently (though I doubt it'll change anything).
Under Settings > Advanced, at the bottom you should be able to view the Application State.
Here you will need a few elements to successfully chat:
documents - uploaded files/repos/websites are processed into documents
query_engine - a vectorized index of uploaded documents; will be used to chat with directly
service_context - holds configuration details regarding the embedding model, system prompt, etc.
Initially these will be NULL, but should have values after document processing. If any of these items are still NULL after processing, it may indicate that an error was encountered when loading documents, setting up the embedding model, or transforming the documents using the embedding model.
Sample View: (screenshot of the Application State panel)
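Roughly speaking, chat is only attempted once all three of those are populated; the guard behind that message looks something like this (a simplified sketch, not the exact code):
# Simplified sketch: the chat flow refuses to proceed until documents,
# the query engine, and the service context all have values.
REQUIRED_KEYS = ("documents", "query_engine", "service_context")

def can_chat(state: dict) -> bool:
    return all(state.get(key) is not None for key in REQUIRED_KEYS)

def chat(state: dict, prompt: str) -> str:
    if not can_chat(state):
        return "Please confirm settings and upload files before proceeding"
    return str(state["query_engine"].query(prompt))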
Not having CUDA should not cause issues here, as the default is to use CPU unless CUDA is available. I'll work to update the Pipfile, however, to ensure smoother installation for Mac users. Thanks for the note.
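For context, the CPU fallback is just the standard PyTorch device check; assuming the embedding model runs through PyTorch, it amounts to something like this (MPS shown as an option on Apple Silicon, not necessarily what local-rag does today):
import torch

# Prefer CUDA when present, otherwise fall back to CPU.
# On Apple Silicon, MPS can be used instead when available.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Embeddings will run on: {device}")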
Well... I'm getting a response (or at least it's not behaving the same) on the MBP M2 with Ollama running locally... so I'm going to try again once this host finishes indexing the document (4 hours to index a 1.5 MB md file... I find it hard to understand why it would take so long, but that's orthogonal to this issue, so I'll drop the 'please confirm settings' thing until I can narrow down a why).
Do you have a link to the MD file you are using? I can use it in some testing.
The next release will migrate the way settings are applied in the backend and should prevent odd OpenAI-related errors.
I'll create another issue for the chunk overlap not being propagated from the UI... but my scenario is: I have Ollama running on an adjacent host and am running local-rag on my desktop. Upon opening it, I added a website to be indexed, after validating that local-rag saw my Ollama instance, selecting the embedding model, and changing chunk size and overlap to 128 and 16 respectively, for shiggles.
That... doesn't seem like it's supposed to do that, right? :) (What info would be helpful here?)