Open nasirus opened 1 year ago
Thank you for bringing this issue to our attention. We have identified the issue and have a solution for you.
The issue is that the service_context parameter is missing from the GPTSimpleVectorIndex.from_documents() call. This parameter is required to use the LlamaCpp model locally.
The code should be updated to the following:
```python
from llama_index import (
    GPTSimpleVectorIndex,
    LLMPredictor,
    ServiceContext,
    download_loader,
)
from langchain.llms import LlamaCpp

# Wrap the local llama.cpp model in an LLMPredictor and build a
# service context so the index uses it instead of OpenAI.
llm_predictor = LLMPredictor(
    llm=LlamaCpp(
        model_path="~/Code/llama.cpp/models/30B/ggml-model-q4_0.bin",
        n_threads=10,
    )
)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# Load documents from an Obsidian vault.
ObsidianReader = download_loader('ObsidianReader')
documents = ObsidianReader('~/Documents/Obsidian').load_data()

# Pass the service context here; without it, the index falls back
# to the default OpenAI-backed configuration.
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
print(index.query("Any query here"))
```
Additionally, you may need to adjust the internal prompts to get good performance. A list of all default internal prompts is available here, and chat-specific prompts are listed here. You can also implement your own custom prompts, as described here.
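As a rough sketch of what customizing a prompt involves: the QA-style templates in llama_index of that era were plain format strings with `{context_str}` and `{query_str}` placeholders. The template text below is illustrative only (not the library's actual default), and here it is filled with `str.format` just to show the shape:

```python
# Illustrative custom QA template (placeholder names {context_str} and
# {query_str} match the llama_index convention; the wording is made up).
TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given only the context above, answer the question: {query_str}\n"
)

# Fill the template the same way the library would before sending it
# to the LLM.
filled = TEMPLATE.format(
    context_str="Notes from my vault",
    query_str="Any query here",
)
print(filled)
```

In the llama_index versions of that era, such a template was typically wrapped in a prompt class (e.g. `QuestionAnswerPrompt`) and passed as `text_qa_template` to `index.query()` — check the prompt documentation linked above for the exact class and parameter names.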
We hope this helps! If you have any further questions or need more assistance, please let us know.
Best, The LlamaIndex Team
The following code appears to load the llama.cpp model properly, but it just ramps up the CPU load and hangs for hours if left running. If `service_context=service_context` is removed from `GPTSimpleVectorIndex.from_documents()`, it uses OpenAI's API and works fine. It prints all the debug text that loading llama.cpp normally produces, so the model is definitely being loaded. What step is missing here to run LLaMA locally?