microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License
163k stars 28.78k forks source link

Copilot Chat `lmTools` API does not reliably use tool responses #225737

Closed benmcmorran closed 1 month ago

benmcmorran commented 1 month ago

Does this issue occur when all extensions are disabled?: No, depends on Copilot Chat and another extension to use the lmTools API proposal

Steps to Reproduce:

  1. Run npx --package yo --package generator-code -- yo code and accept all prompts to create a new TypeScript extension.
  2. Open the new extension folder in VS Code.
  3. Edit package.json to add these blocks:
  "enabledApiProposals": [
    "lmTools"
  ],
  "contributes": {
    <existing contribution points>,
    "languageModelTools": [
      {
          "id": "cpptools-lmtool-configuration",
          "name": "cpp",
          "displayName": "C/C++ configuration",
          "canBeInvokedManually": true,
          "userDescription": "Configuration of the active C or C++ file, like language standard version and target platform.",
          "modelDescription": "For the active C or C++ file, this tool provides: the language (C, C++, or CUDA), the language standard version (such as C++11, C++14, C++17, or C++20), the compiler (such as GCC, Clang, or MSVC), the target platform (such as x86, x64, or ARM), and the target architecture (such as 32-bit or 64-bit).",
          "icon": "$(file-code)",
          "parametersSchema": {}
      }
    ]
  },
  1. Run npx vscode-dts dev.
  2. Edit src/extension.ts to add the following:
    context.subscriptions.push(vscode.lm.registerTool('cpptools-lmtool-configuration', {
        async invoke(parameters, token) {
            return {
                toString() {
                    return "This C++ project uses the GCC compiler.";
                },
            }
        },
    }));
  1. Start debugging the extension.
  2. Open a C++ file. I used a file from z3, but I don't think the exact file will matter, and it shouldn't matter if the C/C++ extension is installed or correctly configured.
  3. In Copilot Chat, type "What compiler does this C++ project use? #cpp". This is designed to be a very direct question that should be exactly answered by the hardcoded tool response.
  4. GitHub Copilot responds with a verbose non-committal answer.

Image

Note that with verbose logging enabled, it's clear that GitHub Copilot did run the registered tool and include the tool response in the request.

[
    {
        "role": "system",
        <system prompt content>
    },
    {
        "role": "user",
        "content": "Excerpt from active file maxcore.cpp, lines 1 to 56:\n```cpp\n/*++\r\nCopyright (c) 2014 Microsoft Corporation\r\n\r\nModule Name:\r\n\r\n    maxcore.cpp\r\n\r\nAbstract:\r\n\r\n    Core based (weighted) max-sat algorithms:\r\n\r\n    - mu:         max-sat algorithm by Nina and Bacchus, AAAI 2014.\r\n    - mus-mss:    based on dual refinement of bounds.\r\n    - binary:     binary version of maxres\r\n    - rc2:        implementation of rc2 heuristic using cardinality constraints\r\n    - rc2t:       implementation of rc2 heuristic using totalizerx\r\n    - rc2-binary: hybrid of rc2 and binary maxres. Perform one step of binary maxres. \r\n                  If there are more than 16 soft constraints create a cardinality constraint.\r\n\r\n\r\n    MaxRes is a core-guided approach to maxsat.\r\n    MusMssMaxRes extends the core-guided approach by\r\n    leveraging both cores and satisfying assignments\r\n    to make progress towards a maximal satisfying assignment.\r\n\r\n    Given a (minimal) unsatisfiable core for the soft\r\n    constraints the approach works like max-res.\r\n    Given a (maximal) satisfying subset of the soft constraints\r\n    the approach updates the upper bound if the current assignment\r\n    improves the current best assignment.\r\n    Furthermore, take the soft constraints that are complements\r\n    to the current satisfying subset.\r\n    E.g, if F are the hard constraints and\r\n    s1,...,sn, t1,..., tm are the soft clauses and\r\n    F & s1 & ... & sn is satisfiable, then the complement\r\n    of of the current satisfying subset is t1, .., tm.\r\n    Update the hard constraint:\r\n         F := F & (t1 or ... or tm)\r\n    Replace t1, .., tm by m-1 new soft clauses:\r\n         t1 & t2, t3 & (t1 or t2), t4 & (t1 or t2 or t3), ..., tn & (t1 or ... t_{n-1})\r\n    Claim:\r\n       If k of these soft clauses are satisfied, then k+1 of\r\n       the previous soft clauses are satisfied.\r\n       If k of these soft clauses are false in the satisfying assignment\r\n       for the updated F, then k of the original soft clauses are also false\r\n       under the assignment.\r\n       In summary: any assignment to the new clauses that satisfies F has the\r\n       same cost.\r\n    Claim:\r\n       If there are no satisfying assignments to F, then the current best assignment\r\n       is the optimum.\r\n\r\nAuthor:\r\n\r\n    Nikolaj Bjorner (nbjorner) 2014-20-7\n```\n\n"
    },
    {
        "role": "user",
        "content": "What compiler does this C++ project use? #cpp "
    },
    {
        "role": "function",
        "content": "This C++ project uses the GCC compiler.",
        "name": "cpptools-lmtool-configuration"
    },
    {
        "role": "user",
        "content": "Above are the results of calling some functions. The user cannot see these results, so you should explain them to the user if referencing them in your answer."
    }
]

cc @isidorn @lukka @spebl @sinemakinci1 @esweet431

roblourens commented 1 month ago

I've seen similar behavior, and have been experimenting with this. I think there are two things that I need to do

I've been looking at the API part but I just realized this morning that the second point seems pretty impactful, so I will include the list of functions that have been used in the conversation thread, with function_call: 'none', that seems to give better results.

And yes internally we're also still on OpenAI's deprecated "functions" API, we will move to the "tools" API soon, maybe that will help too.

roblourens commented 1 month ago

I tried using a user message instead of a role=function message

Request: [
  {
    "role": "user",
    "content": "What compiler does this C++ project use?"
  },
  {
    "role": "assistant",
    "content": "This C++ project uses the GCC compiler."
  },
  {
    "role": "user",
    "content": "This C++ project uses the GCC compiler."
  },
  {
    "role": "user",
    "content": "What compiler does this C++ project use?"
  }
]
RESPONSE:
[
  {
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "I can find out the compiler being used for this project. Please hold on for a moment while I retrieve the information.",
      "refusal": null
    },
    "logprobs": null,
    "finish_reason": "stop"
  }
]

?????

roblourens commented 1 month ago

Anyway, we already label context messages for the old variables API, I had just hoped I could get away from that, but something like this is still needed.

Request: [
  {
    "role": "user",
    "content": "What compiler does this C++ project use?"
  },
  {
    "role": "assistant",
    "content": "This C++ project uses the GCC compiler."
  },
  {
    "role": "user",
    "content": "C/C++ Configuration Context:\nThis C++ project uses the GCC compiler."
  },
  {
    "role": "user",
    "content": "What compiler does this C++ project use?"
  }
]
RESPONSE:
[
  {
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Based on the provided configuration context, this C++ project uses the GCC compiler.",
      "refusal": null
    },
    "logprobs": null,
    "finish_reason": "stop"
  }
]

And another reason I won't go the function_call: 'none' route with function messages is that this is leading to the responses like I can find out the compiler being used for this project. Please hold on for a moment while I retrieve the information. claiming that it will call a tool when that's not allowed.