zed-industries / zed

Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
https://zed.dev

inline code completion frequently leaves a `{{INSERTED_CODE}}` artifact at the top of the inserted code #19471

Open kurtbuilds opened 4 weeks ago

kurtbuilds commented 4 weeks ago

Describe the bug / provide steps to reproduce it

When doing code completions with the inline assistant, I frequently get results that contain the `{{INSERTED_CODE}}` token as an artifact. See the example inline assistant output below.

I asked inline assistant to write a short shell script. The result is this:

{{INSERTED_CODE}}
#!/bin/bash

# URL to fetch crates data
BASE_URL="https://crates.io/api/v1/crates?sort=recent-downloads"
OUTPUT_FILE="crates_list.txt"
... (rest of the script)

Neither the `{{INSERTED_CODE}}` token nor the markdown backtick block should be in the results.

This is using GitHub Copilot.

Environment

Zed: v0.157.5 (Zed)
OS: macOS 14.5.0
Memory: 64 GiB
Architecture: aarch64

If applicable, attach your Zed.log file to this issue.

Zed.log


g0t4 commented 1 week ago

TLDR

We might need to modify the prompts to remove {{REWRITTEN_CODE}} and the triple backticks, and likewise {{INSERTED_CODE}}.

Also, I can't find any code in the zed repo that strips REWRITTEN_CODE or INSERTED_CODE. Those strings don't appear anywhere except the template. Is it possible that the stripping step is missing from handle_stream or somewhere nearby?
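To make the idea of a stripping step concrete, here is a minimal sketch of what such post-processing could look like on a complete (non-streamed) response. `strip_prompt_artifacts` and `is_artifact` are hypothetical names, not functions from the zed repo, and the only placeholder strings assumed are the two from the template. It also assumes the real code never legitimately starts or ends with a fence line, which would not hold when editing markdown.

```rust
/// Placeholder strings as they appear in the prompt template.
const PLACEHOLDERS: [&str; 2] = ["{{INSERTED_CODE}}", "{{REWRITTEN_CODE}}"];

/// Returns true for a line that is only a markdown fence (``` or ```bash)
/// or only an echoed placeholder.
fn is_artifact(line: &str) -> bool {
    let t = line.trim();
    t.starts_with("```") || PLACEHOLDERS.contains(&t)
}

/// Hypothetical helper (not Zed's implementation): drops fence and
/// placeholder lines from the start and end of a model response and
/// leaves everything in between untouched.
pub fn strip_prompt_artifacts(response: &str) -> String {
    let lines: Vec<&str> = response.lines().collect();
    // Index of the first line that is real content.
    let start = lines
        .iter()
        .position(|l| !is_artifact(l))
        .unwrap_or(lines.len());
    // One past the index of the last line that is real content.
    let end = lines
        .iter()
        .rposition(|l| !is_artifact(l))
        .map_or(start, |i| i + 1);
    lines[start..end].join("\n")
}
```

Because it only looks at the first and last lines, a streaming version would need to buffer the head of the response before emitting anything. Note that the final newline of the response is not preserved by this sketch.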

prelim testing

I made some changes in ~/.config/zed/prompt_overrides/content_prompt.hbs and even the small models (e.g. llama3.2:3b) stopped inserting bogus start/stop chars when I nuked {{REWRITTEN_CODE}} and used this last instruction instead:

Immediately start your response with no remarks before nor after, only the rewritten code:

Also, this has worked well too:

Immediately start your response in a single markdown code block (triple backticks) with no remarks before and none after.

Original notes

Likewise with {{REWRITTEN_CODE}}

I have a hunch that most models trip up b/c this is confusing (end of the prompt):

Immediately start with the following format with no remarks:

```
{{INSERTED_CODE}}
```

Here is the equivalent for rewrite:

Immediately start with the following format with no remarks:

```
{{REWRITTEN_CODE}}
```

Is the model supposed to add the triple backticks? Or just {{INSERTED_CODE}}? Too much confusion IMO!

I've also seen models add {{INSERTED_CODE}} at the end of the response too. Even worse, I find the smaller models often start with {{INSERTED_CODE} (only one trailing }) and add an extra } at the very end.

How about changing it to just:

Immediately start with the following format with no remarks:

INSERTED_CODE

OR, come up with something more unique, but drop all the problematic characters and don't leave a hint that markdown might be involved.

Also maybe change to CODE_TO_INSERT because INSERTED_CODE is past tense for a future action.

OR, for the insert case, why do you need it to start with anything?

I also often see models add explanations AFTER the inserted/replaced code. The prompt should make clear that no remarks are allowed anywhere (not before, not after, not in the middle).

Another thing, I often find blank lines are removed before/after a selection. There should be some mechanism to preserve those. Either edge case code, OR, tell the model not to remove leading/trailing blank lines.
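For the "edge case code" option, a rough sketch of one way to do it: take the blank lines at the edges of the original selection and re-apply them around whatever the model returns. `split_edges` and `preserve_edge_blank_lines` are hypothetical helpers written for illustration, not anything that exists in Zed, and the sketch assumes `\n` line endings.

```rust
/// Splits `text` into (leading blank lines, body, trailing newline run).
/// Indentation on the first content line stays with the body.
fn split_edges(text: &str) -> (&str, &str, &str) {
    if text.trim().is_empty() {
        // Nothing but whitespace: treat it all as leading blank lines.
        return (text, "", "");
    }
    // Leading whitespace, cut back to the last newline so that the
    // indentation of the first content line is not included.
    let lead_ws_len = text.len() - text.trim_start().len();
    let lead_len = text[..lead_ws_len].rfind('\n').map_or(0, |i| i + 1);
    // The tail starts at the first newline after the last content character.
    let content_end = text.trim_end().len();
    let tail_start = text[content_end..]
        .find('\n')
        .map_or(text.len(), |i| content_end + i);
    (&text[..lead_len], &text[lead_len..tail_start], &text[tail_start..])
}

/// Hypothetical helper: keeps the original selection's leading and trailing
/// blank lines and swaps in the body of the model's replacement.
pub fn preserve_edge_blank_lines(original: &str, replacement: &str) -> String {
    let (lead, _, tail) = split_edges(original);
    let (_, body, _) = split_edges(replacement);
    [lead, body, tail].concat()
}
```

The other option mentioned above, telling the model not to remove blank lines, needs no code but depends on the model following the instruction.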

g0t4 commented 1 week ago

FYI here is what it looks like when REWRITTEN_CODE is inserted (using llama3.2:3b):

[Screenshot: CleanShot 2024-11-08 at 12 43 37@2x]

Notice that it includes {{REWRITTEN_CODE} with only one trailing curly brace and then at the bottom it includes double }} that are also not desired.

And, it just dawned on me that the intent of the template is not for the model to start with {{REWRITTEN_CODE}} but rather that is a placeholder... wow was that not at all obvious to me, no wonder the models are confused too!
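If stripping were done in code, the malformed variants described above (a leading {{REWRITTEN_CODE} with one closing brace, plus a stray }} on the last line) would need tolerant matching rather than an exact string compare. A minimal sketch under those assumptions follows; `strip_malformed_placeholder` and `strip_trailing_braces` are hypothetical names, not part of Zed. The trailing }} is only removed when a leading placeholder was found, since a line consisting of }} can be legitimate code.

```rust
/// Hypothetical helper: removes a leading placeholder even when its braces
/// are unbalanced, e.g. `{{REWRITTEN_CODE}` or `{INSERTED_CODE}}`.
pub fn strip_malformed_placeholder(response: &str) -> &str {
    const NAMES: [&str; 2] = ["INSERTED_CODE", "REWRITTEN_CODE"];

    let trimmed = response.trim_start();
    let after_open = trimmed.trim_start_matches('{');
    // Require at least one opening brace so ordinary code is left alone.
    if after_open.len() == trimmed.len() {
        return response;
    }
    for name in NAMES {
        if let Some(rest) = after_open.strip_prefix(name) {
            // Skip however many closing braces the model produced,
            // then the newline that ends the placeholder line.
            let rest = rest.trim_start_matches('}');
            let rest = rest.strip_prefix('\n').unwrap_or(rest);
            return strip_trailing_braces(rest);
        }
    }
    response
}

/// Drops an unindented `}}` that sits alone on the final line.
fn strip_trailing_braces(body: &str) -> &str {
    match body.trim_end().strip_suffix("\n}}") {
        Some(without) => without,
        None => body,
    }
}
```

This is a heuristic: it would still misfire on a rewrite whose real last line is an unindented }}, which is one more argument for fixing the prompt instead.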

g0t4 commented 1 week ago

More examples

Prepends {{REWRITTEN_CODE}}: [Screenshot: CleanShot 2024-11-08 at 12 49 48@2x]

Surrounds with {{ and }}: [Screenshot: CleanShot 2024-11-08 at 12 51 00@2x]

g0t4 commented 1 week ago

Here are some insert examples (using llama3.2:1b, which is highly prone to being confused):

INSERTED_CODE before & after: [Screenshot: CleanShot 2024-11-08 at 13 19 09@2x]

INSERTED_CODE before only: [Screenshot: CleanShot 2024-11-08 at 13 19 46@2x]

g0t4 commented 1 week ago

I opened a PR with changes to the default prompt. I have not tested it extensively, so I am not married to it. I just wanted to start the conversation there. It does seem to work well in my testing with smaller llama models.