change the expected format for LLM to specify code changes

Currently the LLM is instructed to use a JSON object to specify code changes, e.g.:

{
    "code": "\ndef divide(a, b):\n    return a / b",
    "start": 6,
    "end": 6
}
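To make the end-index problem concrete, here is a minimal sketch of how such a JSON edit might be applied. The function name `apply_json_edit` is hypothetical, and it assumes `start` and `end` are 1-indexed with an inclusive `end`; the format itself does not say which convention is meant.

```python
import json


def apply_json_edit(source: str, edit_json: str) -> str:
    """Apply a {"code", "start", "end"} edit to `source`.

    Assumption: `start` and `end` are 1-indexed and `end` is inclusive.
    The format itself does not pin this down.
    """
    edit = json.loads(edit_json)
    lines = source.split("\n")
    new_lines = edit["code"].split("\n")
    # Inclusive end: lines start..end are replaced, i.e. the slice [start-1, end).
    lines[edit["start"] - 1 : edit["end"]] = new_lines
    return "\n".join(lines)
```

With `"start": 6, "end": 6` this replaces line 6. Under an exclusive-end reading the same object would replace nothing and only insert, and if the model reports an `end` one line too small, the last line it meant to replace (such as a closing brace) is left behind.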

However, GPT-4 doesn't handle this format well:

- it usually messes up the ending index, often leaving out the last line that should be replaced (e.g. a closing brace)
- it strongly prefers the markdown format for writing code:
  ```language
  <code>
  ```

So I propose a new format for code inserts that takes advantage of this:

insert_at: 5
overwrite_lines: 2
```language
<code>
```

Parsing should then also be easier. If anywhere in the LLM output we see:

insert_at: <int>
overwrite_lines: <int>
```<language> | ϵ
<code>
```
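A minimal parsing sketch, assuming a regex-based approach. `EDIT_PATTERN`, `CodeEdit`, and `parse_edits` are hypothetical names, and the rule that the closing fence must carry exactly the opening fence's indentation (and nothing else on its line) is one possible reading of "whitespace should match", not something the proposal spells out.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for the proposed edit format. Assumptions:
# - line numbers are plain non-negative integers,
# - the language tag after the opening fence may be empty (ϵ),
# - the closing fence must have exactly the same indentation as the opening
#   fence and nothing else on its line, so ``` lines inside the code with
#   different indentation (or trailing text) do not end the edit.
EDIT_PATTERN = re.compile(
    r"^[ \t]*insert_at:[ \t]*(\d+)[ \t]*\n"
    r"[ \t]*overwrite_lines:[ \t]*(\d+)[ \t]*\n"
    r"([ \t]*)```([^\n]*)\n"   # opening fence: indentation + optional language
    r"(.*?)"                   # code body, matched lazily
    r"^\3```[ \t]*$",          # closing fence with the same indentation
    re.MULTILINE | re.DOTALL,
)


@dataclass
class CodeEdit:
    insert_at: int
    overwrite_lines: int
    language: str  # empty string when no language tag was given
    code: str


def parse_edits(llm_output: str) -> list[CodeEdit]:
    """Return every edit found anywhere in the LLM output, in order."""
    edits = []
    for m in EDIT_PATTERN.finditer(llm_output):
        code = m.group(5)
        # The body includes the newline before the closing fence; drop it.
        if code.endswith("\n"):
            code = code[:-1]
        edits.append(
            CodeEdit(
                insert_at=int(m.group(1)),
                overwrite_lines=int(m.group(2)),
                language=m.group(4).strip(),
                code=code,
            )
        )
    return edits
```

Surrounding prose in the LLM output is simply skipped, and the code body is returned as written (it is not dedented relative to the fence).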

Then we know that this is an edit (noting that the whitespace of the fences should match, so that the code being inserted may itself contain ```). Having the LLM specify how many lines to overwrite, rather than where the overwriting should end (which is confusing because of inclusive vs. exclusive range bounds), should be much less error-prone.
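A minimal sketch of applying one parsed edit, to show why a line count sidesteps the range-bound question. `apply_edit` is a hypothetical helper; it assumes `insert_at` is 1-indexed and treats `overwrite_lines: 0` as a pure insertion.

```python
def apply_edit(source: str, insert_at: int, overwrite_lines: int, code: str) -> str:
    """Replace `overwrite_lines` lines, starting at line `insert_at`, with `code`.

    Assumptions: `insert_at` is 1-indexed, and `overwrite_lines == 0` means a
    pure insertion before line `insert_at` (or at the end of the file when
    `insert_at` is one past the last line).
    """
    lines = source.split("\n")
    start = insert_at - 1
    if not 0 <= start <= len(lines):
        raise ValueError(f"insert_at out of range: {insert_at}")
    if overwrite_lines < 0 or start + overwrite_lines > len(lines):
        raise ValueError(f"cannot overwrite {overwrite_lines} lines at line {insert_at}")
    # An empty code string deletes lines instead of inserting a blank one.
    new_lines = code.split("\n") if code else []
    # The replaced range is always [start, start + overwrite_lines):
    # there is no end bound for the model to get wrong.
    lines[start : start + overwrite_lines] = new_lines
    return "\n".join(lines)
```

Under these assumptions, the example `insert_at: 5` / `overwrite_lines: 2` corresponds to `apply_edit(source, 5, 2, code)`, which replaces lines 5 and 6.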

jataware / archycoder #7