Open nasirus opened 1 year ago
Thank you for bringing this issue to our attention. We have identified the issue and have a solution for you.
The issue is that the service_context parameter is missing from the GPTSimpleVectorIndex.from_documents() call. This parameter is required to use the LlamaCpp model locally.
The code should be updated to the following:
```python
from llama_index import (
    GPTSimpleVectorIndex,
    LLMPredictor,
    ServiceContext,
    download_loader,
)
from langchain.llms import LlamaCpp

# Wrap the local llama.cpp model in an LLMPredictor and build a
# service context so the index uses it instead of OpenAI.
llm_predictor = LLMPredictor(
    llm=LlamaCpp(
        model_path="~/Code/llama.cpp/models/30B/ggml-model-q4_0.bin",
        n_threads=10,
    )
)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

# Load documents from an Obsidian vault.
ObsidianReader = download_loader('ObsidianReader')
documents = ObsidianReader('~/Documents/Obsidian').load_data()

# Pass the service context here; without it, the index falls back
# to the default OpenAI-backed configuration.
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
print(index.query("Any query here"))
```
Additionally, you may need to adjust the internal prompts to get good performance. A list of all default internal prompts is available here, and chat-specific prompts are listed here. You can also implement your own custom prompts, as described here.
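As a rough sketch of what customizing a prompt involves: the QA-style templates in llama_index of that era were plain format strings with `{context_str}` and `{query_str}` placeholders. The template text below is illustrative only (not the library's actual default), and here it is filled with `str.format` just to show the shape:

```python
# Illustrative custom QA template (placeholder names {context_str} and
# {query_str} match the llama_index convention; the wording is made up).
TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given only the context above, answer the question: {query_str}\n"
)

# Fill the template the same way the library would before sending it
# to the LLM.
filled = TEMPLATE.format(
    context_str="Notes from my vault",
    query_str="Any query here",
)
print(filled)
```

In the llama_index versions of that era, such a template was typically wrapped in a prompt class (e.g. `QuestionAnswerPrompt`) and passed as `text_qa_template` to `index.query()` — check the prompt documentation linked above for the exact class and parameter names.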
We hope this helps! If you have any further questions or need more assistance, please let us know.
Best, The LlamaIndex Team
The following code appears to load the llama.cpp model properly, but it just ramps up the CPU load and hangs for hours if left running. If `service_context=service_context` is removed from `GPTSimpleVectorIndex.from_documents()`, it uses OpenAI's API and works fine. It prints all the debug text that loading llama.cpp normally produces, so the model is definitely being loaded. What step is missing here to run LLaMA locally?