sgomez / ollama-ai-provider

Vercel AI Provider for running LLMs locally using Ollama
https://www.npmjs.com/package/ollama-ai-provider

inconsistent responses in generateText with tools with openai #22

Closed louis030195 closed 2 months ago

louis030195 commented 3 months ago

Describe the bug

generateText({
  messages: [...],
  tools: {
    query_screenpipe: {
      description:
        "Query the local screenpipe instance for relevant information. You will return multiple queries under the key 'queries'.",
      // some zod schema
      parameters: screenpipeMultiQuery,
      // some function calling my api
      execute: queryScreenpipeNtimes,
    },
  },
  toolChoice: "required",
});

To Reproduce Steps to reproduce the behavior:

  1. Use generateText with a tool
  2. Console log the .toolCalls or .toolResults props (they are empty)

Expected behavior

The toolCalls and toolResults props should be properly filled.

Atm I have to JSON.parse the text instead.

On OpenAI these props are properly filled.

Also, execute is not called at all.
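
For reference, a minimal self-contained sketch of the repro (the schema, messages, and model here are stand-ins for my real screenpipe code, not the actual implementation):

#! /usr/bin/env -S pnpm tsx

import { generateText, tool } from "ai";
import { ollama } from "ollama-ai-provider";
import { z } from "zod";

async function main(model: Parameters<typeof ollama>[0]) {
  const result = await generateText({
    model: ollama(model),
    messages: [
      {
        role: "user",
        content: "What did I look at on my screen in the last minute?",
      },
    ],
    tools: {
      query_screenpipe: tool({
        description:
          "Query the local screenpipe instance for relevant information. You will return multiple queries under the key 'queries'.",
        // stand-in for the real screenpipeMultiQuery zod schema
        parameters: z.object({
          queries: z.array(
            z.object({
              content_type: z.string(),
              start_time: z.string(),
              end_time: z.string(),
            })
          ),
        }),
        // stand-in for queryScreenpipeNtimes
        execute: async (args) => args,
      }),
    },
    toolChoice: "required",
  });

  // both of these come back empty, and execute is never called
  console.log(result.toolCalls, result.toolResults);
}

main(process.argv[2] || "mistral-nemo");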

louis030195 commented 3 months ago

I quickly checked your code, but it's not easy to find where things are implemented. If you can point me in a direction for implementing execute and debugging the toolCalls and toolResults, I'm happy to send a PR.

sgomez commented 3 months ago

Hi, @louis030195. You have to take into account that not all models support tools and that this is not deterministic. I have experimented with the llama3.1:8b model, and depending on the prompt, sometimes the model would do function calling and sometimes not. I want to try renting a machine with bigger models to see if they fail less at function calling.

What model are you using? If you can create a repository with the minimum code that fails, I can investigate further. But with so little information I can't help.

louis030195 commented 3 months ago

Update: the issue is not happening on llama3.1 (I was using mistral-nemo or mistral, which according to Ollama support tools).

louis030195 commented 3 months ago

idk, llama3.1 just cannot generate valid JSON:

{"queries":"[{"content_type": "all", "start_time": "2024-03-15T11:48:00Z", "end_time": "2024-03-15T11:49:00Z"}]"}

Error message: [ { "code": "invalid_type", "expected": "array", "received": "string", "path": [ "queries" ], "message": "Expected array, received string" } ]

sgomez commented 3 months ago

generateText uses the doGenerate method in the Ollama chat language model class.

If you set a breakpoint after this line:

https://github.com/sgomez/ollama-ai-provider/blob/c19332e1da8fc4d6d410f4adabfe105571abf14c/packages/ollama/src/ollama-chat-language-model.ts#L169

the response variable contains the raw response from the Ollama API.

If the model recognized that it needs to use a tool call, you can see it in the response.

If the response is a regular assistant message, there is nothing we can do except experiment with the prompt.

If the response is a tool call, there are two cases:

  1. The arguments do not follow the JSON schema (e.g. the output of llama3.1). Again, this is not a bug in the provider but a limitation of the model.
  2. The arguments follow the JSON schema; in that case, there is a bug.

Unfortunately, this is a limitation of these models, I guess because they are too small (I assume you are also using llama3.1:8b). Indeed, in my tests I get worse results with mistral-nemo than with llama3.1.

You can check this example:

#! /usr/bin/env -S pnpm tsx

import { generateText, tool } from 'ai'
import { ollama } from 'ollama-ai-provider'
import { z } from 'zod'

import { buildProgram } from '../tools/command'

async function main(model: Parameters<typeof ollama>[0]) {
  const result = await generateText({
    maxTokens: 2000,
    model: ollama(model),
    prompt:
      'Generate 3 character descriptions for a fantasy role playing game.',
    tools: {
      generateCharacters: tool({
        parameters: z.object({
          characters: z.array(
            z.object({
              class: z
                .string()
                .describe('Character class, e.g. warrior, mage, or thief.'),
              description: z.string(),
              name: z.string(),
            }),
          ),
        }),
      }),
    },
  })

  console.log(JSON.stringify(result, null, 2))
}

buildProgram('llama3.1', main).catch(console.error)

With llama3.1 I get this tool_call response:

  "message": {
    "content": "",
    "role": "assistant",
    "tool_calls": [
      {
        "function": {
          "arguments": {
            "characters": "[\"Warrior\", \"Mage\", \"Rogue\"]"
          },
          "name": "generateCharacters"
        }
      }
    ]
  },

The arguments do not match the schema.

With mistral-nemo it is even worse:

  "message": {
    "content": "Okay, I'll generate three characters for you. What are the names of your characters?",
    "role": "assistant"
  },

It refuses to use the tool. If I force it to use the tool in the prompt:

  "message": {
    "content": "[TOOL_CALLS][{\"name\": \"generateCharacters\", \"arguments\": {\"characters\": [{\"name\": \"Elara\"}, {\"name\": \"Thalion\"}, {\"name\": \"Gwendolyn\"}]}]",
    "role": "assistant"
  },

It seems like Ollama is not able to generate the right answer.

I discovered there are problems when you ask for an array of objects.

Something I usually do to check Ollama's behavior is to make the request directly to the Ollama API using an HTTP client.
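
For example, a minimal sketch of such a direct request (this assumes Ollama running locally on the default port; the tool roughly mirrors the generateCharacters example above):

// call the Ollama chat API directly to inspect the raw tool_call response,
// bypassing the provider entirely
const response = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  body: JSON.stringify({
    model: "llama3.1",
    stream: false,
    messages: [
      {
        role: "user",
        content:
          "Generate 3 character descriptions for a fantasy role playing game.",
      },
    ],
    // the JSON schema the provider would derive from the zod parameters
    tools: [
      {
        type: "function",
        function: {
          name: "generateCharacters",
          description: "Generate fantasy RPG characters.",
          parameters: {
            type: "object",
            properties: {
              characters: {
                type: "array",
                items: {
                  type: "object",
                  properties: {
                    class: { type: "string" },
                    description: { type: "string" },
                    name: { type: "string" },
                  },
                },
              },
            },
            required: ["characters"],
          },
        },
      },
    ],
  }),
});

// inspect message.tool_calls (or the plain content) in the raw response
console.log(JSON.stringify(await response.json(), null, 2));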

louis030195 commented 3 months ago

Is JSON generation prompt-engineered? I thought there were more reliable ways to make this work, like this:

https://github.com/mlc-ai/web-llm/blob/main/src/grammar.ts

sgomez commented 3 months ago

Yes, it is. You can see how it is injected in the template:

https://ollama.com/library/llama3.1/blobs/11ce4ee3e170

This template is used by Ollama to add the tools to the conversation context and to explain to the model how to perform the tool call. If, in addition to the JSON schema instructions that are automatically inserted, further instructions are manually added to the prompt, it could hurt the model, which may be trained to expect instructions in a particular format.

As far as I know, 4 things are necessary for function calling to work:

  1. That the server (Ollama) supports it, which at least for text generation it already does, although not for streaming. In my case I have managed to handle it by detecting whether what arrives over the stream is JSON.

  2. That the model supports it, as for example mistral-nemo and llama3.1 do; at least the first one should, although it doesn't work well for me.

  3. That, given the tools, the model infers it must request a function call. The way to do this comes from the model's own system prompt, and if the model does not do it, or does not do it in the format the prompt specifies, Ollama will not be able to report in the response that the answer is a function call.

  4. Finally, that the arguments received follow the schema indicated in the prompt.

It is enough for one of these things to fail for the provider not to work, and it is not something the provider can fix; it is all a problem of the model and Ollama. As I said before, I suspect that smaller models are not able to do the inference as they should, but I have not yet had the opportunity to test the larger models to corroborate my theory.

I know there are some open issues in Ollama related to enforcing a JSON schema for tool arguments [see https://github.com/ollama/ollama/issues/6002]. It seems like something that still needs more work before it matches what the commercial platforms do.

I'm sorry, but unless there is something concrete indicating a bug in the provider when recognizing the API responses, my experience tells me that most likely the model is not able to do the inference well.

louis030195 commented 3 months ago

Basically, llama3.1 keeps making this error of adding quotes around arrays (it doesn't make errors otherwise):

#! /usr/bin/env -S pnpm tsx

import { generateText, tool } from "ai";
import { ollama } from "ollama-ai-provider";
import { z } from "zod";

async function main(model: Parameters<typeof ollama>[0]) {
  const result = await generateText({
    maxTokens: 2000,
    model: ollama(model),
    prompt:
      "Generate 3 character descriptions for a fantasy role playing game. Make sure to respect the JSON schema (example: {class: 'warrior', description: 'A brave warrior', name: 'Aragorn'}).",
    tools: {
      generateCharacters: tool({
        parameters: z.object({
          characters: z.array(
            z.object({
              class: z
                .string()
                .describe("Character class, e.g. warrior, mage, or thief."),
              description: z.string(),
              name: z.string(),
            })
          ),
        }),
      }),
    },
  });

  console.log(JSON.stringify(result, null, 2));
}

main(process.argv[2] || "llama3.1");

   value: {
      characters: "[{class: 'wizard', description: 'A wise wizard', name: 'Gandalf'}, {class: 'rogue', description: 'A sneaky rogue', name: 'Legolas'}, {class: 'warrior', description: 'A fierce warrior', name: 'Arthas'}]"
    },
    [Symbol(vercel.ai.error)]: true,
    [Symbol(vercel.ai.error.AI_TypeValidationError)]: true
  },
  toolArgs: `{"characters":"[{class: 'wizard', description: 'A wise wizard', name: 'Gandalf'}, {class: 'rogue', description: 'A sneaky rogue', name: 'Legolas'}, {class: 'warrior', description: 'A fierce warrior', name: 'Arthas'}]"}`,
  toolName: 'generateCharacters',
  [Symbol(vercel.ai.error)]: true,
  [Symbol(vercel.ai.error.AI_InvalidToolArgumentsError)]: true
}

Is there a way to fix the JSON before it makes ollama-ai-provider throw? E.g.:

function fixJsonArray(input: string): any {
  try {
    // First, try to parse the input as-is
    return JSON.parse(input);
  } catch (e) {
    // If parsing fails, attempt to fix the string
    const fixedString = input.replace(/^\[|\]$/g, '').replace(/'/g, '"');
    try {
      // Try to parse the fixed string as an array
      return JSON.parse(`[${fixedString}]`);
    } catch (e) {
      // If all attempts fail, return null or throw an error
      console.error("Failed to parse JSON:", input);
      return null;
    }
  }
}
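
For the specific output above (single quotes plus unquoted keys), the single-quote replacement alone wouldn't be enough; a rough sketch of a hypothetical helper that also quotes bare keys:

function fixLooseJson(input: string): unknown {
  // hypothetical helper, not part of ollama-ai-provider: turn the loose,
  // JS-object-literal-style string into parseable JSON
  const quoted = input
    .replace(/'/g, '"') // single quotes -> double quotes
    .replace(/([{,]\s*)([A-Za-z_]\w*)\s*:/g, '$1"$2":'); // quote bare keys
  try {
    return JSON.parse(quoted);
  } catch {
    return null;
  }
}

// applied to the broken arguments string from the error above:
const broken =
  "[{class: 'wizard', description: 'A wise wizard', name: 'Gandalf'}]";
console.log(fixLooseJson(broken));
// -> [ { class: 'wizard', description: 'A wise wizard', name: 'Gandalf' } ]
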
louis030195 commented 3 months ago

Wow, interesting learning: this works:

#! /usr/bin/env -S pnpm tsx

import { generateObject } from "ai";
import { ollama } from "ollama-ai-provider";
import { z } from "zod";

async function main(model: Parameters<typeof ollama>[0]) {
  const result = await generateObject({
    maxTokens: 2000,
    model: ollama(model),
    prompt:
      "Generate 3 character descriptions for a fantasy role playing game.",
    schema: z.object({
      characters: z.array(
        z.object({
          class: z
            .string()
            .describe("Character class, e.g. warrior, mage, or thief."),
          description: z.string(),
          name: z.string(),
        })
      ),
    }),
  });

  console.log(JSON.stringify(result, null, 2));
}

main(process.argv[2] || "llama3.1");

(base) louisbeaumont@louisbeaumontme-macbook:~/Documents/ollama-test$ ./main.ts
{
  "object": {
    "characters": [
      {
        "class": "Warrior",
        "description": "A skilled fighter from the mountains, known for their bravery and unwavering dedication to justice.",
        "name": "Grimgold Ironfist"
      },
      {
        "class": "Mage",
        "description": "A mysterious sorceress with a talent for elemental magic, feared by her enemies for her ability to summon powerful storms.",
        "name": "Lyra Moonwhisper"
      },
      {
        "class": "Thief",
        "description": "A cunning rogue from the city streets, adept at slipping in and out of shadows unnoticed, with a reputation for stealing valuable treasures.",
        "name": "Arin Swiftfoot"
      }
    ]
  },
  "finishReason": "stop",
  "usage": {
    "promptTokens": 138,
    "completionTokens": 162,
    "totalTokens": 300
  },
  "warnings": [],
  "rawResponse": {
    "headers": {
      "content-length": "1020",
      "content-type": "application/json; charset=utf-8",
      "date": "Thu, 15 Aug 2024 11:55:11 GMT"
    }
  }
}

This works perfectly. WHY? lol

Maybe there is an issue with generateText?

sgomez commented 2 months ago

generateObject adds the "format": "json" parameter to the API call. See Advanced parameters.

Maybe that is the reason generateObject works better than generateText. But this parameter cannot be set with generateText, because obviously we want text, not JSON data.
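
For reference, a minimal sketch of what that parameter looks like when calling the Ollama API directly (local server on the default port assumed):

const res = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  body: JSON.stringify({
    model: "llama3.1",
    format: "json", // the advanced parameter that generateObject relies on
    stream: false,
    messages: [
      {
        role: "user",
        content: "Generate 3 fantasy RPG character descriptions as JSON.",
      },
    ],
  }),
});

// message.content should now be a valid JSON string
console.log((await res.json()).message.content);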

sgomez commented 2 months ago

I did this test with a larger model:

https://github.com/sgomez/ollama-ai-provider/blob/main/examples/ai-core/src/complex/math-agent/agent.ts

Using firefunction-v2, a model with 70.6B parameters, the agent was able to use the tool and answer the question. However, smaller models like llama3.1:8b and mistral-nemo:12b failed every time.

I have a 3060 Ti GPU and 32GB of RAM, and by disabling the graphics manager in Linux (running purely in the terminal), I was able to run the model using a combination of GPU and CPU. The inference wasn't the fastest, taking around 5 minutes, but it did work.

Therefore, the issue with tooling isn't related to Ollama or the provider. It's simply that there aren't any small models capable of effective tooling inference.