liusida / gym-codecraft

An gym-like environment with a docker container sandbox for the agent (including LLMs) to learn to code.
MIT License
2 stars 0 forks source link

function_call is not reliable. #1

Closed liusida closed 1 year ago

liusida commented 1 year ago

If we call chat completion like this:

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        function_call="auto",  # auto is default, but we'll be explicit
    )

It might sometimes reply without a function call, e.g.:

2023-06-17 15:45:13,193 - INFO - Task Messages: [{'role': 'assistant', 'content': '{"action": "reset", "task_id": "1"}'}, {'role': 'user', 'content': "Task 1:\n {'category': 'Python', 'docker': 'python:alpine3.18', 'shell': '/bin/sh', 'working_dir': '/tmp', 'title': 'Read and comprehend', 'description': 'Install git, clone the repo github.com/liusida/gym-codecraft, read all the files in the project and write a commentary for the project. The commentary should be in the project root directory, named `commentary.md`.', 'test': 'Check the commentary is in the project root directory.'}\n\nWhat's your next action? (respond with a function call)"}, {'role': 'assistant', 'content': '{"action": "command", "command": "apk add git && git clone https://github.com/liusida/gym-codecraft && cd gym-codecraft && ls -R > /tmp/commentary.md"}'}, {'role': 'user', 'content': "fetch https://dl-cdn.alpinelinux.org/alpine/v3.18/main/x86_64/APKINDEX.tar.gz\nfetch https://dl-cdn.alpinelinux.org/alpine/v3.18/community/x86_64/APKINDEX.tar.gz\n(1/7) Installing brotli-libs (1.0.9-r14)\n(2/7) Installing libunistring (1.1-r1)\n(3/7) Installing libidn2 (2.3.4-r1)\n(4/7) Installing nghttp2-libs (1.53.0-r0)\n(5/7) Installing libcurl (8.1.2-r0)\n(6/7) Installing pcre2 (10.42-r1)\n(7/7) Installing git (2.40.1-r0)\nExecuting busybox-1.36.1-r0.trigger\nOK: 25 MiB in 45 packages\nCloning into 'gym-codecraft'...\n\nWhat's your next action? (respond with a function call)"}, {'role': 'assistant', 'content': '{"action": "command", "command": "cd gym-codecraft && ls -R > /tmp/commentary.md"}'}, {'role': 'user', 'content': "\nWhat's your next action? (respond with a function call)"}, {'role': 'assistant', 'content': '{"action": "write_file", "content": "I have installed git and cloned the gym-codecraft repository. The project files are located in the gym-codecraft directory.\\n", "path": "/tmp/commentary.md"}'}, {'role': 'user', 'content': "File /tmp/commentary.md written.\nWhat's your next action? (respond with a function call)"}, {'role': 'assistant', 'content': '{"action": "submit"}'}, {'role': 'user', 'content': "Code submitted.\nWhat's your next action? (respond with a function call)"}]
2023-06-17 15:45:14,381 - INFO - Chat completion: {
  "id": "chatcmpl-7SKuQQLuAMZdMBESvxNicTntzXNCr",
  "object": "chat.completion",
  "created": 1686987914,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"action\": \"exit\"}\n{\"action\": \"reset\", \"task_id\": \"2\"}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1132,
    "completion_tokens": 21,
    "total_tokens": 1153
  }
}
2023-06-17 15:45:14,381 - ERROR - Error: function_call not found. {
  "role": "assistant",
  "content": "{\"action\": \"exit\"}\n{\"action\": \"reset\", \"task_id\": \"2\"}"
}. Exit.

However, there's no way to say that I want the model to choose one function over a set of functions without speaking directly. I can only tell the model to only use one specific function, like this:

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=messages,
        functions=functions,
        function_call={"name": "reset"},
    )

I guess the function call is done by generating some prompt for the model, so maybe I'll just generate prompt myself?

liusida commented 1 year ago

0be177740bb4681460751504fa618b8fca654c7c solves this.