vlbosch closed this issue 2 weeks ago
I saw that Outlines supports guided decoding with MLX. I think the next step is to make it available in this server engine first (similar to how vLLM works), and then function calling should not be too hard to implement. In my opinion it is safer to do it with guided decoding than to let the model generate freely. I would be happy to collaborate on that.
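For illustration, a guided-decoding tool call along those lines might look like the sketch below. This assumes the pre-1.0 Outlines API with its mlx-lm backend; the model name, tool schema, and prompt are placeholders, not anything this project ships.

```python
# Sketch only: constrain generation to a tool-call JSON schema with
# Outlines guided decoding on MLX. Assumes `pip install outlines mlx-lm`
# and the pre-1.0 Outlines API; model and schema are illustrative.
from pydantic import BaseModel
import outlines

class WeatherArgs(BaseModel):
    city: str

class ToolCall(BaseModel):
    name: str               # which tool the model wants to call
    arguments: WeatherArgs  # arguments for that tool

model = outlines.models.mlxlm("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
generator = outlines.generate.json(model, ToolCall)

# The output is guaranteed to parse into ToolCall, so the server never
# has to recover a function call from free-form text.
call = generator("Call the weather tool for Amsterdam.")
print(call.name, call.arguments.city)
```

Because decoding is constrained by the schema, a malformed tool call is impossible by construction, which is the safety argument above.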
@nath1295 I was working on Outlines support, but saw you beat me to it and already merged it. Great work! I was wondering whether you're already busy with tool/function calling too, before I start working on that as well. I don't mind if you are, or are about to start on it, because my time is limited and I'm still getting to know your codebase.
@vlbosch I still haven’t started on tool calling, as I’m still thinking about how to make it work with different prompt templates. If you have any questions about the codebase, I’m happy to answer them! I would really appreciate it if you already have ideas on making tool calling work, and if you have started working on it, I would like to see your PR or code snippets as well.
I think the current problem with integrating tool calling is that the code might not support arbitrary chat-template formats. To solve that, I think we have two options:
Of course, if you have any better ideas, please let me know. Thanks for taking a deep dive into my code!
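One possible way around the chat-template problem, sketched under assumptions rather than taken from this project: recent Hugging Face tokenizer releases accept a `tools=` argument in `apply_chat_template`, so tool definitions are rendered by whatever template the model ships with instead of being hard-coded per format. The model name and tool definition below are placeholders.

```python
# Sketch only: let the model's own chat template render the tool
# definitions (requires a recent transformers release whose
# apply_chat_template supports the `tools` argument).
from transformers import AutoTokenizer

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Amsterdam?"}],
    tools=tools,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```

Models whose templates have no tool support would still need a fallback, so this alone does not remove the need to handle templates case by case.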
Tool calling is now supported with the latest update, v0.1.0, along with batch inference. Closing this issue.
Recently MLX got support for function calling. The output must still be parsed manually, so it is not OpenAI-compliant. Supporting the OpenAI HTTP server specification could be a feature that sets this project apart from the rest: a drop-in replacement for the OpenAI API backed by local models. I would be happy to help you implement it.
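To make that concrete, an OpenAI-compliant response for a parsed function call could be shaped like the sketch below. The `parse_tool_call` helper is a hypothetical stand-in, not an existing function in MLX or this project; only the response layout follows the OpenAI chat-completions format.

```python
# Sketch only: wrap a manually parsed function call in the OpenAI
# chat-completions response shape so existing OpenAI clients can talk
# to a local server unchanged. parse_tool_call is a naive stand-in.
import json
import time
import uuid

def parse_tool_call(raw_output: str) -> tuple[str, dict]:
    # Assumes the model emitted a JSON object like
    # {"name": ..., "arguments": {...}}; real parsing is model- and
    # template-specific.
    data = json.loads(raw_output)
    return data["name"], data["arguments"]

def to_openai_response(model_name: str, raw_output: str) -> dict:
    name, arguments = parse_tool_call(raw_output)
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model_name,
        "choices": [{
            "index": 0,
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [{
                    "id": f"call_{uuid.uuid4().hex[:24]}",
                    "type": "function",
                    "function": {
                        "name": name,
                        # OpenAI expects arguments as a JSON string
                        "arguments": json.dumps(arguments),
                    },
                }],
            },
            "finish_reason": "tool_calls",
        }],
    }
```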