jataware / archycoder

demo of interactive programming with GPT-4
change the expected format for LLM to specify code changes #7

Open david-andrew opened 1 year ago

david-andrew commented 1 year ago

Currently, the LLM is instructed to use a JSON object to specify code changes, e.g.

{
    "code": "\ndef divide(a, b):\n    return a / b",
    "start": 6,
    "end": 6
}

However, GPT-4 doesn't do a very good job with this format.

So I propose a new format for code inserts that takes advantage of this:

insert_at: 5
overwrite_lines: 2
```language
<code>
```

And then parsing should also be easier. If anywhere in the LLM output we see:

insert_at: <number>
overwrite_lines: <number>
```<language> | ϵ
<code>
```

Then we know that is an edit (note that the whitespace before the opening and closing fences should match, so that code edits may themselves contain ```). Having the LLM specify how many lines to overwrite, rather than where the overwriting should end (which is confusing because of inclusive vs. exclusive range bounds), should be much less error-prone.
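
As a rough illustration of the parsing idea above, here is a minimal Python sketch. Everything in it is an assumption rather than code from the repo: the names `Edit`, `parse_edits`, and `apply_edit` are hypothetical, `insert_at` is taken to be a 1-indexed line number, and `overwrite_lines: 0` is taken to mean a pure insert before that line. The closing fence must have the same leading whitespace as the opening fence, which is one way to read the whitespace-matching rule.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    insert_at: int            # assumed 1-indexed line where the new code goes
    overwrite_lines: int      # number of existing lines to replace (0 = insert only)
    language: Optional[str]   # None when the fence has no language tag
    lines: List[str]          # the replacement code, one entry per line


# Hypothetical pattern for the proposed block. The indentation of the opening
# fence is captured so the closing fence has to repeat it exactly; a fence
# nested at a different indentation therefore stays part of the code body.
EDIT_PATTERN = re.compile(
    r"^[ \t]*insert_at:[ \t]*(?P<insert_at>\d+)[ \t]*\n"
    r"[ \t]*overwrite_lines:[ \t]*(?P<overwrite>\d+)[ \t]*\n"
    r"(?P<indent>[ \t]*)```(?P<language>[^\n`]*)\n"
    r"(?P<code>.*?)"
    r"^(?P=indent)```[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def parse_edits(llm_output: str) -> List[Edit]:
    """Find every edit block anywhere in the raw LLM output."""
    edits = []
    for match in EDIT_PATTERN.finditer(llm_output):
        # The body is either empty or ends with a newline, so dropping the
        # last split element yields exactly the code lines.
        code_lines = match.group("code").split("\n")[:-1]
        language = match.group("language").strip() or None
        edits.append(
            Edit(
                insert_at=int(match.group("insert_at")),
                overwrite_lines=int(match.group("overwrite")),
                language=language,
                lines=code_lines,
            )
        )
    return edits


def apply_edit(source: str, edit: Edit) -> str:
    """Replace `overwrite_lines` lines starting at `insert_at` with the new code."""
    lines = source.split("\n")
    start = edit.insert_at - 1  # convert the assumed 1-indexed line to a list index
    lines[start:start + edit.overwrite_lines] = edit.lines
    return "\n".join(lines)
```

Because the edit carries a line count instead of an end position, the slice bounds follow directly from the two numbers and there is no inclusive-versus-exclusive decision to get wrong.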

david-andrew commented 1 year ago

Actually, now that the new function-calling interface is available, it makes a lot of sense to use that directly: https://openai.com/blog/function-calling-and-other-api-updates
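
For reference, a code edit might be described as a function along these lines. This is only a sketch: the function name `edit_code` and its fields are hypothetical, chosen to mirror the insert_at / overwrite_lines proposal above, and nothing here is taken from the repo. The definition uses the JSON Schema shape described in the linked announcement, and the helper decodes the JSON-encoded arguments string that a function call carries.

```python
import json

# Hypothetical function definition; the name and parameters are illustrative only.
EDIT_CODE_FUNCTION = {
    "name": "edit_code",
    "description": "Insert code at a line, optionally overwriting existing lines.",
    "parameters": {
        "type": "object",
        "properties": {
            "insert_at": {
                "type": "integer",
                "description": "Line number at which to insert the new code.",
            },
            "overwrite_lines": {
                "type": "integer",
                "description": "How many existing lines the new code replaces.",
            },
            "code": {
                "type": "string",
                "description": "The new code to insert.",
            },
        },
        "required": ["insert_at", "overwrite_lines", "code"],
    },
}


def parse_function_arguments(arguments: str) -> dict:
    """Decode the arguments string and check that the required fields are present."""
    args = json.loads(arguments)  # raises ValueError if the model emitted invalid JSON
    required = EDIT_CODE_FUNCTION["parameters"]["required"]
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing fields in function call: {missing}")
    return args
```

The arguments still need validating on receipt, since the model is not guaranteed to produce valid JSON or to include every required field.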