tomsmoker opened this issue 1 month ago · Status: Open
This is a long-standing issue with Ollama: they do not meaningfully support serious structured generation. We've asked them about it and offered to provide implementation support. I'd recommend upstreaming this to the Ollama repo; we can't really do much about it currently without support from the Ollama team.
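For reference, here's a minimal sketch of the JSON mode Ollama does expose via its REST API: the `format: "json"` option only guarantees syntactically valid JSON, not output that matches a caller-supplied schema. The model name and prompt below are placeholders.

```python
import requests

# Ask a local Ollama server for JSON output. `format: "json"` only forces
# syntactically valid JSON; it does not enforce any particular schema.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",  # placeholder: any model pulled locally
        "messages": [
            {"role": "user", "content": "Return a JSON object with keys 'name' and 'age'."}
        ],
        "format": "json",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])  # valid JSON, but keys/types are not guaranteed
```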
An alternative that is moving much faster on structured generation for local runtimes is LM Studio, which provides a local server with an OpenAI-compatible endpoint for structured generation. They support Outlines directly on Mac via MLX, and should support everything else in the next few weeks.
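As a rough sketch of that route: point the standard OpenAI Python client at the local LM Studio server and pass a JSON-schema `response_format`. The base URL is LM Studio's usual default, the model name and schema are placeholders, and this assumes the server accepts OpenAI-style structured-output requests.

```python
from openai import OpenAI

# Point the standard OpenAI client at a local LM Studio server
# (base URL and model name are assumptions; adjust to your setup).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

person_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

response = client.chat.completions.create(
    model="local-model",  # placeholder: whichever model is loaded in LM Studio
    messages=[{"role": "user", "content": "Extract the person from: 'Ada Lovelace, 36.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "person", "schema": person_schema},
    },
)
print(response.choices[0].message.content)
```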
The options you have:
@cpfiffer When you mention llamacpp, do you mean llama-cpp-python? I don't see why it would be a pain.
In this case I believe I was referring to the underlying C++ code, not the Python library. As of now, I believe there's some discussion about building Outlines into llama-cpp-python, which should make this less of an issue.
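For anyone who wants to experiment with that route now, here's a rough sketch of constrained JSON generation through llama-cpp-python using Outlines. It assumes the Outlines 0.x API (`models.llamacpp`, `generate.json`) with llama-cpp-python installed; the repo and filename are just example GGUF weights.

```python
from pydantic import BaseModel
from outlines import models, generate

class Person(BaseModel):
    name: str
    age: int

# Load a GGUF model through llama-cpp-python (repo and filename are examples).
model = models.llamacpp(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    "mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)

# Constrain decoding so the output always parses into the Person schema.
generator = generate.json(model, Person)
person = generator("Extract the person from: 'Ada Lovelace, 36 years old.'")
print(person)  # Person(name='Ada Lovelace', age=36)
```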
What
Add Ollama as an LLM option.
Why
Many people run local LLMs; with Ollama support, Knowledge Table could run without an internet connection.
Implementation guidance
The FastAPI backend is now modular, and there are docs on extending the LLM services. Structured output with Pydantic models is currently used with OpenAI, so we'll probably need to add Outlines to get the same behavior with local models.
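To make that concrete, here's a hypothetical sketch of what a local-model service could look like alongside the OpenAI one: same Pydantic-model-in, Pydantic-instance-out contract, with Outlines doing the schema-constrained decoding. The class and method names are illustrative (not the repo's actual interface), and the calls assume the Outlines 0.x API with a transformers backend.

```python
from typing import Type, TypeVar
from pydantic import BaseModel
from outlines import generate, models

T = TypeVar("T", bound=BaseModel)

class OutlinesLLMService:
    """Hypothetical local counterpart to the OpenAI-backed service:
    same Pydantic-model-in, Pydantic-instance-out contract."""

    def __init__(self, model_name: str = "microsoft/Phi-3-mini-4k-instruct"):
        # Outlines 0.x: wrap a Hugging Face transformers model for local inference.
        self.model = models.transformers(model_name)

    def get_structured_response(self, prompt: str, schema: Type[T]) -> T:
        # Constrained decoding guarantees the output parses into `schema`.
        generator = generate.json(self.model, schema)
        return generator(prompt)

class Row(BaseModel):
    entity: str
    value: str

# service = OutlinesLLMService()
# row = service.get_structured_response("Extract the entity and its value from: ...", Row)
```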