tomsmoker opened this issue 1 month ago · Status: Open
This is a long-standing issue with Ollama: they do not meaningfully support serious structured generation. We've asked them about it and offered to provide implementation support. I'd recommend upstreaming this to the Ollama repo; we can't really do much about it currently without support from the Ollama team.
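For reference, here's a minimal sketch of the JSON mode Ollama does expose via its REST API: the `format: "json"` option only guarantees syntactically valid JSON, not output that matches a caller-supplied schema. The model name and prompt below are placeholders.

```python
import requests

# Ask a local Ollama server for JSON output. `format: "json"` only forces
# syntactically valid JSON; it does not enforce any particular schema.
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",  # placeholder: any model pulled locally
        "messages": [
            {"role": "user", "content": "Return a JSON object with keys 'name' and 'age'."}
        ],
        "format": "json",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])  # valid JSON, but keys/types are not guaranteed
```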
An alternative that is moving much faster on structured generation for local runtimes is LM Studio, which provides a local server with an OpenAI-compatible endpoint for structured generation. They support Outlines directly on Mac via MLX, and should support everything else in the next few weeks.
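As a rough sketch of that route: point the standard OpenAI Python client at the local LM Studio server and pass a JSON-schema `response_format`. The base URL is LM Studio's usual default, the model name and schema are placeholders, and this assumes the server accepts OpenAI-style structured-output requests.

```python
from openai import OpenAI

# Point the standard OpenAI client at a local LM Studio server
# (base URL and model name are assumptions; adjust to your setup).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

person_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

response = client.chat.completions.create(
    model="local-model",  # placeholder: whichever model is loaded in LM Studio
    messages=[{"role": "user", "content": "Extract the person from: 'Ada Lovelace, 36.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "person", "schema": person_schema},
    },
)
print(response.choices[0].message.content)
```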
The options you have:
@cpfiffer When you mention llamacpp, do you mean llama-cpp-python? I don't see why it would be a pain.
In this case I believe I was referring to the underlying C++ code, not the Python library. As of now, I believe there's some discussion about building Outlines into llama-cpp-python, which should make this less of an issue.
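For anyone who wants to experiment with that route now, here's a rough sketch of constrained JSON generation through llama-cpp-python using Outlines. It assumes the Outlines 0.x API (`models.llamacpp`, `generate.json`) with llama-cpp-python installed; the repo and filename are just example GGUF weights.

```python
from pydantic import BaseModel
from outlines import models, generate

class Person(BaseModel):
    name: str
    age: int

# Load a GGUF model through llama-cpp-python (repo and filename are examples).
model = models.llamacpp(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    "mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)

# Constrain decoding so the output always parses into the Person schema.
generator = generate.json(model, Person)
person = generator("Extract the person from: 'Ada Lovelace, 36 years old.'")
print(person)  # Person(name='Ada Lovelace', age=36)
```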
What
Add Ollama as an LLM option.
Why
Many people run local LLMs; with Ollama support, Knowledge Table could run without an internet connection.
Implementation guidance
The FastAPI backend is now modular, and there are docs on extending the LLM services. Structured output with Pydantic models is currently used with OpenAI, so we'll probably need to add Outlines to get the same behavior with local models.
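To make that concrete, here's a hypothetical sketch of what a local-model service could look like alongside the OpenAI one: same Pydantic-model-in, Pydantic-instance-out contract, with Outlines doing the schema-constrained decoding. The class and method names are illustrative (not the repo's actual interface), and the calls assume the Outlines 0.x API with a transformers backend.

```python
from typing import Type, TypeVar
from pydantic import BaseModel
from outlines import generate, models

T = TypeVar("T", bound=BaseModel)

class OutlinesLLMService:
    """Hypothetical local counterpart to the OpenAI-backed service:
    same Pydantic-model-in, Pydantic-instance-out contract."""

    def __init__(self, model_name: str = "microsoft/Phi-3-mini-4k-instruct"):
        # Outlines 0.x: wrap a Hugging Face transformers model for local inference.
        self.model = models.transformers(model_name)

    def get_structured_response(self, prompt: str, schema: Type[T]) -> T:
        # Constrained decoding guarantees the output parses into `schema`.
        generator = generate.json(self.model, schema)
        return generator(prompt)

class Row(BaseModel):
    entity: str
    value: str

# service = OutlinesLLMService()
# row = service.get_structured_response("Extract the entity and its value from: ...", Row)
```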