iplayfast opened this issue 5 months ago
One possible solution for you, which works well for me, is to define a schema representing your function's arguments and pass it into the system message, then get a JSON-formatted response using the API's JSON mode (I don't recommend streaming the JSON). Treat the response you get as input data you can then match to the function you want to invoke.
It's not the same approach OpenAI uses, but it's similar and it does work.
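A rough sketch of that approach (the `TOOLS` schema, `dispatch`, and `get_weather` below are illustrative names, not part of any library; the commented-out `ollama.chat(..., format="json")` call is where the real API request would go):

```python
import json

# 1. Describe each callable function as a small JSON schema.
TOOLS = {
    "get_weather": {
        "description": "Return current weather for a city.",
        "arguments": {"city": {"type": "string"}},
    },
}

# 2. Put the schema into the system message and demand a JSON-only reply.
SYSTEM = (
    "To call a function, reply ONLY with JSON of the form "
    '{"name": ..., "arguments": {...}} using one of these functions:\n'
    + json.dumps(TOOLS, indent=2)
)

def dispatch(raw_reply: str, registry: dict):
    """Parse the model's JSON reply and route it to a Python function."""
    call = json.loads(raw_reply)
    fn = registry[call["name"]]
    return fn(**call["arguments"])

def get_weather(city: str) -> str:
    return f"(stub) weather for {city}"  # placeholder for a real lookup

# In practice raw_reply would come from JSON mode, e.g.:
#   import ollama
#   resp = ollama.chat(model="mistral", format="json",
#                      messages=[{"role": "system", "content": SYSTEM},
#                                {"role": "user", "content": "Weather in Toronto?"}])
#   raw_reply = resp["message"]["content"]
raw_reply = '{"name": "get_weather", "arguments": {"city": "Toronto"}}'
print(dispatch(raw_reply, {"get_weather": get_weather}))
```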
Function calling makes a big difference.
It would be perfect if ollama could support function calling
An approach to run multiple functions with Ollama: https://github.com/eliranwong/freegenius#approach-to-run-function-calling-equivalent-features-offline-with-common-hardwares I like Ollama and will use ollama as the default engine for the project.
I have been able to get function calling (OpenAI style) to work with dolphin-mistral:8x7b-v2.7-q8_0; it probably works with other models too, though likely not all of them. Here's the prompt I send:
To respond to the user's message, you have access to the following tools:
{
  "name": "duckduckgo_search",
  "description": "Use this function to search DuckDuckGo for a query.\n\nArgs:\n  query (str): The query to search for.\n  max_results (optional, default=5): The maximum number of results to return.\n\nReturns:\n  The result from DuckDuckGo.",
  "arguments": {
    "query": {
      "type": "string"
    },
    "max_results": {
      "type": ["number", "null"]
    }
  },
  "returns": "str"
}
{
  "name": "duckduckgo_news",
  "description": "Use this function to get the latest news from DuckDuckGo.\n\nArgs:\n  query (str): The query to search for.\n  max_results (optional, default=5): The maximum number of results to return.\n\nReturns:\n  The latest news from DuckDuckGo.",
  "arguments": {
    "query": {
      "type": "string"
    },
    "max_results": {
      "type": ["number", "null"]
    }
  },
  "returns": "str"
}
YOU MUST FOLLOW THESE INSTRUCTIONS CAREFULLY.
<instructions>
1. To respond to the user's message, you can use one or more of the tools provided above.
2. If you decide to use a tool, you must respond in JSON format matching the following schema:
{
  "tool_calls": [{
    "name": "<name of the selected tool>",
    "arguments": <parameters for the selected tool, matching the tool's JSON schema>
  }]
}
3. To use a tool, just respond with the JSON matching the schema. Nothing else. Do not add any additional notes or explanations.
4. After you use a tool, the next message you get will contain the result of the tool call.
5. REMEMBER: To use a tool, you must respond only in JSON format.
6. After you use a tool and receive the result back, respond regularly to answer the user's question.
7. Only use the tools you are provided.
8. Use markdown to format your answers.
</instructions>
============== user ==============
What's the weather like in Toronto?
which causes output from the model like this (for illustrative purposes):
Building tool calls from [{'name': 'duckduckgo_search', 'arguments': {'query': 'weather in Toronto'}}]
============== assistant ==============
{"tool_calls": [
{
"name": "duckduckgo_search",
"arguments": {
"query": "weather in Toronto"
}
}
]}
Hope that helps illustrate that at least with some models, you can do it directly with prompting and json mode.
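To close the loop on the client side, a reply matching that `tool_calls` schema can be parsed and dispatched back to real functions. A minimal sketch (`run_tool_calls` and the stubbed `duckduckgo_search` are illustrative, not from any library):

```python
import json

def run_tool_calls(assistant_reply: str, tools: dict) -> list:
    """Parse a reply matching the tool_calls schema and run each request."""
    payload = json.loads(assistant_reply)
    results = []
    for call in payload.get("tool_calls", []):
        fn = tools[call["name"]]
        results.append(fn(**call["arguments"]))
    return results

def duckduckgo_search(query: str, max_results: int = 5) -> str:
    # Hypothetical stub; a real version would query DuckDuckGo.
    return f"(stub) top {max_results} results for {query!r}"

reply = ('{"tool_calls": [{"name": "duckduckgo_search", '
         '"arguments": {"query": "weather in Toronto"}}]}')
print(run_tool_calls(reply, {"duckduckgo_search": duckduckgo_search}))
# The results would then be appended as the next message in the chat.
```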
It is helpful, much appreciated
This example works in some models. However, once I add more than about five functions, the waiting time becomes terrible.
I have a project, https://github.com/eliranwong/letmedoit, that uses ChatGPT by default for function calling among around 40 functions in one go, without an issue. I am working on using Ollama instead. So far, the waiting time for a single prompt listing 40 functions is not acceptable at all. I see some hope in splitting the process into steps across generations.
It seems this workaround works with better speed: https://github.com/eliranwong/freegenius#approach-to-run-function-calling-equivalent-features-offline-with-common-hardwares
I am testing with "phi", "mistral" and "llama2" via Ollama
... still in testing, though ...
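One way the process could be split into steps, as a rough sketch (all names below are illustrative and not part of any of the linked projects): first ask the model only to pick a tool name from a compact menu, then send just that tool's schema to generate arguments. Each generation then sees a much smaller prompt than one listing 40 full schemas:

```python
import json

# Illustrative registry: tool name -> one-line description (could hold ~40).
TOOL_NAMES = {
    "duckduckgo_search": "search the web for a query",
    "duckduckgo_news": "get the latest news for a query",
}

def selection_prompt(user_msg: str) -> str:
    """Step 1: ask only for a tool *name*, keeping the prompt tiny."""
    menu = "\n".join(f"- {name}: {desc}" for name, desc in TOOL_NAMES.items())
    return ("Pick the single best tool for the request below and reply only "
            'with JSON like {"name": "<tool name>"}.\n'
            f"Tools:\n{menu}\nRequest: {user_msg}")

def argument_prompt(tool_name: str, schema: dict, user_msg: str) -> str:
    """Step 2: send only the chosen tool's schema to generate arguments."""
    return (f"Generate arguments for the tool {tool_name} as JSON matching "
            f"this schema:\n{json.dumps(schema)}\nRequest: {user_msg}")
```

Each prompt would be sent through JSON mode as before; only the second step needs the full argument schema of the single chosen tool.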
I just used instructor. It's a bit of a pain to set up, but once you do, it works without issues every time. https://jxnl.github.io/instructor/
@jquintanilla4 I was inspired by instructor and tried to create a similar library just for Ollama. It is still missing some features, like async support, and I haven't tried whether it works with images, but I am working on that and on the documentation. Here is the link to the repo: ollama-instructor. It would be cool to have some feedback on whether it works for your projects as well.
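The core trick behind instructor-style libraries is re-asking the model until its JSON validates against a schema (instructor itself uses Pydantic models for this). The idea can be sketched with the stdlib alone; `validate` and `ask_with_retries` below are illustrative names, not instructor's actual API:

```python
import json

def validate(reply: str, required: dict) -> dict:
    """Minimal stand-in for Pydantic validation: parse the JSON, check
    that each required field exists and has the right type."""
    data = json.loads(reply)
    for field, typ in required.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"field {field!r} missing or not {typ.__name__}")
    return data

def ask_with_retries(ask, required: dict, attempts: int = 3) -> dict:
    """Re-prompt until the model's JSON validates (the core retry loop)."""
    last_err = None
    for _ in range(attempts):
        try:
            return validate(ask(), required)
        except (ValueError, json.JSONDecodeError) as err:
            last_err = err  # a real client would feed the error back to the model
    raise last_err

# Stub standing in for an actual model call: fails once, then succeeds.
fake_model = iter(['not json', '{"query": "weather in Toronto"}'])
result = ask_with_retries(lambda: next(fake_model), {"query": str})
print(result)
```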
I keep seeing that OpenAI has function calling (https://platform.openai.com/docs/api-reference/chat/create), now called tools, and some open-source LLMs also support function calling.
This is done by fine-tuning the models to understand when they need to call a function. As we don't have that ability (as far as I know), maybe we could emulate it by adding a layer between Ollama and the API, so the API can be extended.
So calling ollama.tools() would return the tools that are available, and ollama.addtool(name, prompt, function) would add an entry to a list of tools: the name, the prompt used to recognize when the tool applies, and the function to run. For instance, to do an internet search:

    ollama.addtool("BrowseWeb",
                   "If the answer needs to access html pages return 'BrowseWeb:url'",
                   BrowseWeb)

    def BrowseWeb(url):
        return httpx.get(url).text
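A minimal sketch of what such a layer could look like (entirely hypothetical: neither `ToolRegistry` nor `ollama.addtool` exists in the client today, and `browse_web` is a stub):

```python
class ToolRegistry:
    """Hypothetical layer between Ollama and the API, as proposed above."""

    def __init__(self):
        self._tools = {}

    def addtool(self, name: str, prompt: str, function):
        """Register a tool: its name, the instruction injected into the
        system prompt, and the Python callable to run."""
        self._tools[name] = (prompt, function)

    def tools(self):
        """Return the names of the available tools."""
        return list(self._tools)

    def system_prompt(self) -> str:
        """Concatenate the recognition prompts for injection into the chat."""
        return "\n".join(prompt for prompt, _ in self._tools.values())

    def invoke(self, name: str, *args, **kwargs):
        """Run a registered tool when the model's reply requests it."""
        return self._tools[name][1](*args, **kwargs)

def browse_web(url: str) -> str:
    # A real implementation might use httpx.get(url).text
    return f"(stub) fetched {url}"

reg = ToolRegistry()
reg.addtool("BrowseWeb",
            "If the answer needs to access html pages return 'BrowseWeb:url'",
            browse_web)
print(reg.tools())
print(reg.invoke("BrowseWeb", "https://example.com"))
```

The layer would scan model replies for patterns like 'BrowseWeb:url', call `invoke`, and feed the result back into the conversation.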
This is probably not the best solution, but I thought I would make it an issue to see if any discussion will come from it.