louis030195 closed this issue 2 months ago
checked your code quickly but it's not so easy to find where things are implemented; if you have any direction on where to implement execute and debug the toolCalls and toolResults, happy to send a PR
Hi, @louis030195. You have to take into account that not all models support tools and that this is not deterministic. I have experimented with the llama3.1:8b model and, depending on the prompt, sometimes the model would do function calling and sometimes not. I want to try renting a machine with bigger models to see if the model fails less when doing function calling.
What model are you using? If you can create a repository with the minimum code that fails, I can investigate further. But with so little information I can't help.
update: issue not happening on llama3.1 (I was using mistral-nemo or mistral, which according to Ollama support tools)
idk, llama3.1 just cannot generate JSON:
{"queries":"[{"content_type": "all", "start_time": "2024-03-15T11:48:00Z", "end_time": "2024-03-15T11:49:00Z"}]"}. Error message: [ { "code": "invalid_type", "expected": "array", "received": "string", "path": [ "queries" ], "message": "Expected array, received string" } ]
generateText uses the doGenerate method in the Ollama chat language model class.
If you can set a breakpoint after this line:
the response variable contains the raw response from the Ollama API.
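Roughly, that raw response takes one of two shapes (field names follow the Ollama chat API; these lines are illustrative, the real dumps appear further down in this thread):
{ "message": { "role": "assistant", "content": "Here are three characters...", ... } }
{ "message": { "role": "assistant", "content": "", "tool_calls": [ { "function": { "name": "...", "arguments": { ... } } } ] } }
The first is a regular assistant message; the second is a tool call.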
If the model recognized that it needs to use a tool call, you can see it in the response.
If the response is a regular assistant message, there is nothing we can do except experiment with the prompt.
If the response is a tool call, there are two cases:
Unfortunately, this is a limitation of these models. I guess it is because they are too small (I guess you are also using llama3.1:8b). Indeed, in my tests, I get worse results with mistral-nemo than with llama3.1.
You can check this example:
#! /usr/bin/env -S pnpm tsx
import { generateText, tool } from 'ai'
import { ollama } from 'ollama-ai-provider'
import { z } from 'zod'
import { buildProgram } from '../tools/command'
async function main(model: Parameters<typeof ollama>[0]) {
const result = await generateText({
maxTokens: 2000,
model: ollama(model),
prompt:
'Generate 3 character descriptions for a fantasy role playing game.',
tools: {
generateCharacters: tool({
parameters: z.object({
characters: z.array(
z.object({
class: z
.string()
.describe('Character class, e.g. warrior, mage, or thief.'),
description: z.string(),
name: z.string(),
}),
),
}),
}),
},
})
console.log(JSON.stringify(result, null, 2))
}
buildProgram('llama3.1', main).catch(console.error)
With llama3.1 I get this tool_call response:
"message": {
"content": "",
"role": "assistant",
"tool_calls": [
{
"function": {
"arguments": {
"characters": "[\"Warrior\", \"Mage\", \"Rogue\"]"
},
"name": "generateCharacters"
}
}
]
},
The arguments do not match the schema.
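To match the schema above, arguments would have to contain an actual array of objects, for example:
"arguments": {
  "characters": [
    { "class": "warrior", "description": "A brave warrior", "name": "Aragorn" }
  ]
}
Instead, the model returned the array serialized as a single string.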
With mistral-nemo it is even worse:
"message": {
"content": "Okay, I'll generate three characters for you. What are the names of your characters?",
"role": "assistant"
},
It refuses to use the tool. If I force it to use the tool in the prompt:
"message": {
"content": "[TOOL_CALLS][{\"name\": \"generateCharacters\", \"arguments\": {\"characters\": [{\"name\": \"Elara\"}, {\"name\": \"Thalion\"}, {\"name\": \"Gwendolyn\"}]}]",
"role": "assistant"
},
Seems like Ollama is not able to generate the right answer.
I discovered there are problems when you ask for an array of objects.
Something I usually do to check Ollama's behavior is to make the request directly to the Ollama API using an HTTP client.
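For example, something like this rough sketch (it targets Ollama's /api/chat endpoint on the default local port; the tool definition mirrors the generateCharacters schema from the script above):

// Call Ollama's chat endpoint directly, passing the tool as a JSON schema.
// stream is false because tool calls are not returned while streaming.
const response = await fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.1',
    stream: false,
    messages: [
      {
        role: 'user',
        content: 'Generate 3 character descriptions for a fantasy role playing game.',
      },
    ],
    tools: [
      {
        type: 'function',
        function: {
          name: 'generateCharacters',
          description: 'Generate character descriptions.',
          parameters: {
            type: 'object',
            properties: {
              characters: {
                type: 'array',
                items: {
                  type: 'object',
                  properties: {
                    class: { type: 'string' },
                    description: { type: 'string' },
                    name: { type: 'string' },
                  },
                },
              },
            },
            required: ['characters'],
          },
        },
      },
    ],
  }),
})
console.log(JSON.stringify(await response.json(), null, 2))

If message.tool_calls already comes back malformed here, the problem is upstream of the provider.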
is JSON generation prompt engineered? I thought there were more reliable ways to make this work, like this:
Yes, it is. You can see how it is injected in the template:
https://ollama.com/library/llama3.1/blobs/11ce4ee3e170
This template is used by Ollama to add the tools to the conversation context and to explain to the model how to perform the tool call. If, in addition to the JSON schema instructions that are automatically injected, further instructions are manually added to the prompt, it could harm the model, which may be trained to expect instructions in a particular format.
As far as I know, 4 things are necessary for function calling to work:
That the server (Ollama) supports it, which it already does at least for text generation, although not in streaming. In my case I have managed to handle it by detecting whether what arrives over the stream is JSON (a rough sketch of that check appears right after this list).
That the model supports it, as for example mistral-nemo and llama3.1; at least the first one should, although it doesn't work well for me.
That, given the tools, the model infers that it must request a function call. The way to do it comes from the model's own system prompt, and if it does not do it, or does not do it as the prompt says, Ollama will not be able to return in the response that the answer is a function call.
Finally, that the arguments that are received follow the schema indicated in the prompt.
It is enough for one of these things to fail for the provider not to work. And it is not something that the provider can fix; it is all a problem of the model and Ollama. As I said before, I suspect that smaller models are not able to do the inference as they should, but I have not yet had the opportunity to test the larger models to corroborate my theory.
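Regarding the first point, the stream detection mentioned above can be sketched roughly like this (illustrative only, not the provider's actual code):

function looksLikeJsonToolCall(chunk: string): boolean {
  // Tool calls arrive over the stream as a JSON payload, while regular
  // assistant output is plain text, so try to parse the chunk as JSON.
  const trimmed = chunk.trim()
  if (!trimmed.startsWith('{') && !trimmed.startsWith('[')) {
    return false
  }
  try {
    JSON.parse(trimmed)
    return true
  } catch {
    return false
  }
}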
I know there are some open issues in Ollama related to enforcing the JSON schema for tool arguments [see https://github.com/ollama/ollama/issues/6002]. It seems like something that needs more work before it works as well as commercial platforms do.
I'm sorry, but unless there is something concrete that indicates that there is a bug in the provider when recognizing the API responses, my experience tells me that most likely the model is not able to do the inference well.
basically llama3.1 keeps making this error of adding " around arrays (no errors otherwise)
#! /usr/bin/env -S pnpm tsx
import { generateText, tool } from "ai";
import { ollama } from "ollama-ai-provider";
import { z } from "zod";
async function main(model: Parameters<typeof ollama>[0]) {
const result = await generateText({
maxTokens: 2000,
model: ollama(model),
prompt:
"Generate 3 character descriptions for a fantasy role playing game. Make sure to respect the JSON schema (example: {class: 'warrior', description: 'A brave warrior', name: 'Aragorn'}).",
tools: {
generateCharacters: tool({
parameters: z.object({
characters: z.array(
z.object({
class: z
.string()
.describe("Character class, e.g. warrior, mage, or thief."),
description: z.string(),
name: z.string(),
})
),
}),
}),
},
});
console.log(JSON.stringify(result, null, 2));
}
main(process.argv[2] || "llama3.1");
value: {
characters: "[{class: 'wizard', description: 'A wise wizard', name: 'Gandalf'}, {class: 'rogue', description: 'A sneaky rogue', name: 'Legolas'}, {class: 'warrior', description: 'A fierce warrior', name: 'Arthas'}]"
},
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_TypeValidationError)]: true
},
toolArgs: `{"characters":"[{class: 'wizard', description: 'A wise wizard', name: 'Gandalf'}, {class: 'rogue', description: 'A sneaky rogue', name: 'Legolas'}, {class: 'warrior', description: 'A fierce warrior', name: 'Arthas'}]"}`,
toolName: 'generateCharacters',
[Symbol(vercel.ai.error)]: true,
[Symbol(vercel.ai.error.AI_InvalidToolArgumentsError)]: true
}
is there a way to fix the JSON before ollama-ai-provider crashes on it? e.g.
function fixJsonArray(input: string): any {
try {
// First, try to parse the input as-is
return JSON.parse(input);
} catch (e) {
// If parsing fails, attempt to fix the string
const fixedString = input.replace(/^\[|\]$/g, '').replace(/'/g, '"');
try {
// Try to parse the fixed string as an array
return JSON.parse(`[${fixedString}]`);
} catch (e) {
// If all attempts fail, return null or throw an error
console.error("Failed to parse JSON:", input);
return null;
}
}
}
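Though for the output above the keys are also unquoted ({class: 'wizard', ...}), so a repair step would also need to quote them before JSON.parse can succeed. A rough, untested sketch (a hypothetical helper, not part of the provider):

function repairToolArguments(input: string): unknown {
  // Quote bare object keys: {class: ...} -> {"class": ...}
  const withQuotedKeys = input.replace(/([{,]\s*)([A-Za-z_$][\w$]*)\s*:/g, '$1"$2":')
  // Convert single-quoted string values to double-quoted ones
  const normalized = withQuotedKeys.replace(/'([^']*)'/g, '"$1"')
  return JSON.parse(normalized)
}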
wow interesting learning, this works:
#! /usr/bin/env -S pnpm tsx
import { generateObject } from "ai";
import { ollama } from "ollama-ai-provider";
import { z } from "zod";
async function main(model: Parameters<typeof ollama>[0]) {
const result = await generateObject({
maxTokens: 2000,
model: ollama(model),
prompt:
"Generate 3 character descriptions for a fantasy role playing game.",
schema: z.object({
characters: z.array(
z.object({
class: z
.string()
.describe("Character class, e.g. warrior, mage, or thief."),
description: z.string(),
name: z.string(),
})
),
}),
});
console.log(JSON.stringify(result, null, 2));
}
main(process.argv[2] || "llama3.1");
(base) louisbeaumont@louisbeaumontme-macbook:~/Documents/ollama-test$ ./main.ts
{
"object": {
"characters": [
{
"class": "Warrior",
"description": "A skilled fighter from the mountains, known for their bravery and unwavering dedication to justice.",
"name": "Grimgold Ironfist"
},
{
"class": "Mage",
"description": "A mysterious sorceress with a talent for elemental magic, feared by her enemies for her ability to summon powerful storms.",
"name": "Lyra Moonwhisper"
},
{
"class": "Thief",
"description": "A cunning rogue from the city streets, adept at slipping in and out of shadows unnoticed, with a reputation for stealing valuable treasures.",
"name": "Arin Swiftfoot"
}
]
},
"finishReason": "stop",
"usage": {
"promptTokens": 138,
"completionTokens": 162,
"totalTokens": 300
},
"warnings": [],
"rawResponse": {
"headers": {
"content-length": "1020",
"content-type": "application/json; charset=utf-8",
"date": "Thu, 15 Aug 2024 11:55:11 GMT"
}
}
}
this works perfectly, WHY? lol
maybe there is an issue with generateText?
generateObject adds the "format": "json" parameter to the API call. See Advanced parameters.
Maybe that is the reason that generateObject works better than generateText. But this parameter cannot be set with generateText because, obviously, we want text and not JSON data.
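For comparison, the underlying Ollama requests differ roughly like this (shape only, illustrative):
{ "model": "llama3.1", "format": "json", "messages": [...] } for generateObject
{ "model": "llama3.1", "messages": [...], "tools": [...] } for generateText with tools
With format: "json", Ollama constrains the output to valid JSON; with tools, the expected tool-call format is only described to the model through its prompt template.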
I did this test with a larger model:
Using firefunction-v2, a model with 70.6B parameters, the agent was able to use the tool and answer the question. However, smaller models like llama3.1:8b and mistral-nemo:12b failed every time.
I have a 3060 Ti GPU and 32GB of RAM, and by disabling the graphics manager in Linux (running purely in the terminal), I was able to run the model using a combination of GPU and CPU. The inference wasn't the fastest, taking around 5 minutes, but it did work.
Therefore, the issue with tooling isn't related to Ollama or the provider. It's simply that there aren't any small models capable of effective tooling inference.
Describe the bug
To Reproduce
Expected behavior
properly fill the toolCalls and toolResults props; atm I have to JSON parse the text instead
on OpenAI these props are properly filled
also the execute is not called at all