saoudrizwan / claude-dev

Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, and more with your permission every step of the way.
https://marketplace.visualstudio.com/items?itemName=saoudrizwan.claude-dev
MIT License
3.48k stars, 334 forks

Code is truncated #14

Open vblues opened 1 month ago

vblues commented 1 month ago

When the code file is long, it makes changes to the file and in places inserts comments like "// Rest of the code remains the same". This essentially renders the code file useless.

ErikMeinders commented 1 month ago

+1

saoudrizwan commented 1 month ago

Apply edits is something I've realized is a pretty hard problem to solve. Currently Claude Dev asks Sonnet 3.5 for the whole file in its response, even if it only has to change one line of code. Not efficient I know, and I planned on optimizing this after the hackathon.

LLMs like Claude have a maximum # of output tokens they can respond with each API request, and so you'll likely see it get "lazy" when writing large amounts of code i.e. with "rest of the code remains the same".
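A hard cut-off (as opposed to a lazy placeholder) can be detected from the response itself. Below is a minimal sketch, assuming the Anthropic Messages API response has already been parsed into a dict. The `stop_reason` field and its `max_tokens` value come from that API; the helper name `hit_output_limit` and the sample payloads are made up for illustration.

```python
def hit_output_limit(response: dict) -> bool:
    """Return True when the reply stopped because it ran out of output tokens."""
    return response.get("stop_reason") == "max_tokens"


# Hypothetical response payloads, trimmed to the one field checked here.
finished = {"stop_reason": "end_turn"}
cut_off = {"stop_reason": "max_tokens"}

print(hit_output_limit(finished))  # False
print(hit_output_limit(cut_off))   # True
```

Note that this only catches responses that were cut off mid-output. A model that writes "rest of the code remains the same" and then ends its turn normally would not be flagged.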

Cursor uses a fine tuned model trained on applying edits from a partial response to a whole file change, along with “speculative decoding” to speed up its responses (instead of inferring the next single token, it infers multiple tokens in parallel). This way it can take a lazy suggestion from a foundational (expensive) model like Sonnet 3.5 and use their lighter ft model to apply those suggestions to a large file much more affordably. (They also use a pretty interesting concept of 'Shadow Workspace' where they essentially use a hidden second instance of vscode to catch and fix any linting errors before applying the edits.)

Aider uses a diff strategy where it asks the LLM to respond with a unified diff format (like how you see with patches), which works much better than other diff strategies considering these models were likely trained on unified diffs at some point. However asking for a whole file results in higher quality responses because 1) their training data likely includes way more whole files than unified diffs, and 2) it is forced to think with more tokens.
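For anyone unfamiliar with the format, here is a small sketch of what a unified diff looks like, generated with Python's standard `difflib` module. The file name `greet.py` and its contents are invented for the example; this is not how Aider itself produces or applies diffs.

```python
import difflib

original = """def greet(name):
    print("Hello " + name)

greet("world")
""".splitlines(keepends=True)

modified = """def greet(name):
    print(f"Hello, {name}!")

greet("world")
""".splitlines(keepends=True)

# unified_diff yields the '---'/'+++' file headers, then '@@' hunks in which
# '-' marks removed lines, '+' marks added lines, and ' ' marks context.
diff_text = "".join(
    difflib.unified_diff(original, modified, fromfile="a/greet.py", tofile="b/greet.py")
)
print(diff_text)
```

The model only has to emit the changed line plus a few lines of context, which is where the token savings come from. The trade-off is that every line number and every space in the hunk has to be exactly right.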

I want to build Claude Dev with the assumption that scaling laws will persist and LLMs will get better and cheaper, so I'd prefer not to incorporate hacks that would reduce its quality. Fortunately, Anthropic recently doubled the max output token limit for Claude 3.5 Sonnet from 4096 to 8192 in their API, and I will be updating Claude Dev with this change once the hackathon is over. This means way less lazy coding and we can maintain the quality of Claude's edits without compromise.

bzimbelman commented 1 month ago

I agree with your assessment to not go with the unified diff approach aider uses. I get way more diff errors messing up my code with aider than I get code errors occurring with this. One thing I do like with aider is that I can /undo changes and rollback as many changes as I want to. Its use of git for this is helpful as well. I think that kind of ability either utilizing git or some other mechanism would be a good feature for the roadmap.

saoudrizwan commented 1 month ago

v1.0.4 now uses 8192 output tokens, so there should be way less lazy coding. Please feel free to re-open if you continue to see this issue.

saoudrizwan commented 3 weeks ago

Re-opening since similar issues keep getting created, and I'd like anyone interested in updates or troubleshooting to be able to refer to this thread in the future.

The Problem

Language models are limited in how many tokens they can output in a single request. When Claude Dev needs to edit a file, it outputs the entire contents of that file instead of just the chunks that changed. This is by design, since LLMs tend to perform significantly worse when constrained to a specific output format (e.g. "just give me the lines that need to be changed with line numbers" or "replace all unmodified sections with a specifier"). Having the model output the entire file tends to produce more correct, higher-quality changes. Of course this has its drawbacks:

  1. More tokens = higher costs (although this should be mitigated with cheaper models in the future)
  2. When editing large files, if the model runs into max output token limitations it will either truncate unmodified sections (//this code remains unchanged) or outright fail and not provide values for parameters specified as required in the tool schema.
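To get a feel for when the second drawback kicks in, here is a rough back-of-the-envelope sketch. It assumes about 4 characters per token, which is only a rule of thumb (the real tokenizer will give different numbers), and uses the 8192-token output limit mentioned earlier in this thread. The function names are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4        # assumption: rough average, not the real tokenizer
MAX_OUTPUT_TOKENS = 8192   # output limit quoted earlier in this thread


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_one_response(file_text: str, limit: int = MAX_OUTPUT_TOKENS) -> bool:
    """Guess whether the whole file could be written back in a single response."""
    return estimate_tokens(file_text) <= limit


small_file = "x = 1\n" * 100      # 600 characters
large_file = "x = 1\n" * 10_000   # 60,000 characters

print(estimate_tokens(small_file), fits_in_one_response(small_file))  # 150 True
print(estimate_tokens(large_file), fits_in_one_response(large_file))  # 15000 False
```

A check like this could be used to warn before an edit is attempted on a file that is unlikely to fit, though the estimate is too crude to rely on near the limit.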

Solutions

I just released v1.1.13, which updates the system prompt to be stricter against lazy coding. You can now also specify custom instructions in settings, which get appended to the system prompt sent with every request and take precedence over prior instructions. I suggest playing around with various prompts here to see if you can get the model to avoid lazy coding. I'd love to hear if someone is able to come up with something that works reliably.

Update: v1.5.6 now lets you edit Claude's changes before accepting! When he edits or creates a file, you can modify his changes directly in the right side of the diff view. You can even hover over a 'Revert Block' arrow button in the center to undo "// rest of code here" shenanigans when he deletes a bunch of code. This is currently the best solution to this issue; however, Anthropic is going to release Fast Edit Mode soon, which should make editing files much more reliable (and hopefully cheaper).

CiberNin commented 3 weeks ago

Could we automate this with some blacklisted phrases that automatically trigger a reply to Claude like "provide full updated code"? Also, I've read that adding to the system prompt is sometimes less effective than appending to the user prompt.
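Something along these lines, as a rough sketch. The phrase list, the follow-up message, and the function name `follow_up_for` are all illustrative guesses, not anything Claude Dev actually ships.

```python
import re
from typing import Optional

# Assumption: a small, non-exhaustive set of "lazy" phrases. Only lines that
# start with a comment marker (//, #, /*, <!--) are considered.
LAZY_COMMENT = re.compile(
    r"^\s*(?://|#|/\*|<!--).*\b"
    r"(?:rest of (?:the )?code|remains? (?:the same|unchanged)|code here)\b",
    re.IGNORECASE | re.MULTILINE,
)

FOLLOW_UP = "Provide the full updated code without placeholder comments."


def follow_up_for(file_text: str) -> Optional[str]:
    """Return a follow-up message if the output contains a lazy placeholder."""
    return FOLLOW_UP if LAZY_COMMENT.search(file_text) else None


lazy = "function a() {}\n// Rest of the code remains the same\n"
complete = "function a() {}\nfunction b() {}\n"

print(follow_up_for(lazy))      # the follow-up message
print(follow_up_for(complete))  # None
```

A phrase list like this will miss new wordings and could misfire on a genuine comment, so it would probably need a way for the user to skip the automatic reply.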

bigben3333 commented 2 weeks ago

Wouldn't it be simpler to ask Claude to always generate diff files? It would probably cost much less in terms of output tokens.

vtempest commented 1 week ago

I am having this issue as well. The main reason to use this in the VS Code sidebar is to integrate changes directly into the code with a diff and not have to copy and paste. Some ideas:

  1. If the file is truncated with "rest of the code remains unchanged", look for that message and don't go into diff mode; only show those lines as diffed.
  2. Go into diff mode but automatically add the truncated parts of the file. It should be easy to match on the starts and ends.
  3. Always put into the original prompt: Always print all functions in the file and do not ever print a message that rest of the code remains unchanged to replace functions.
  4. Let us customize the system prompt.
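Idea 2 could be sketched roughly as follows. This is a heuristic under strong assumptions: the placeholder is only expanded when the lines directly before and after it appear unchanged in the original file, and it is left untouched otherwise. The regex and the name `expand_placeholders` are hypothetical.

```python
import re
from typing import List

# Assumption: placeholders are comment lines containing one of these phrases.
PLACEHOLDER = re.compile(
    r"^\s*(?://|#).*(?:rest of|remains? (?:the same|unchanged))", re.IGNORECASE
)


def expand_placeholders(original: List[str], edited: List[str]) -> List[str]:
    """Replace placeholder lines with the original lines they stand for."""
    result: List[str] = []
    search_from = 0  # never look for anchors before an already expanded region
    for i, line in enumerate(edited):
        if not PLACEHOLDER.match(line):
            result.append(line)
            continue
        prev_line = edited[i - 1] if i > 0 else None
        next_line = edited[i + 1] if i + 1 < len(edited) else None
        try:
            start = 0 if prev_line is None else original.index(prev_line, search_from) + 1
            end = len(original) if next_line is None else original.index(next_line, start)
        except ValueError:
            result.append(line)  # anchors not found: leave it for a human to fix
            continue
        result.extend(original[start:end])
        search_from = end
    return result


original = ["import os", "", "def a():", "    return 1", "",
            "def b():", "    return 2", "", "def c():", "    return 3"]
edited = ["import os", "import sys", "", "def a():", "    return 1",
          "# ... rest of the code remains the same ...",
          "def c():", "    return 30"]

print("\n".join(expand_placeholders(original, edited)))
```

The weak spot is repeated anchor lines (a lone `}` or a blank line), which resolve to their first occurrence and can pull in the wrong region. That ambiguity is a large part of why this is harder than it looks.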

monotykamary commented 1 week ago

> Wouldn't it be simpler to ask Claude to always generate diff files? It would probably cost much less in terms of output tokens.

You can write a custom instruction to achieve this with unified diffing (YMMV, especially for Ruby-like languages):

When editing files, please use the unified diff format to specify your changes. Follow these guidelines:

1. Start each diff with the file name, using '--- a/filename' for the original file and '+++ b/filename' for the modified file.
2. Use the standard unified diff syntax, including line numbers at the start of each hunk. Begin each hunk with @@ -start,count +start,count @@, where:
   - 'start' is the starting line number in the original file
   - 'count' is the number of lines in the hunk (including context lines)
   Use '1' for the count if you're adding new lines at the beginning or end of the file.

3. Provide coherent diffs that show the entire function or logical block of code being modified, including sufficient context.
4. Use '-' to indicate lines being removed, '+' to indicate lines being added, and ' ' (space) for unchanged context lines.
5. Include at least 3 lines of unchanged context before and after the changes to clearly locate the modification in the file.
6. For multiple changes in a file, use separate hunks divided by the @@ ... @@ marker.
7. Ensure that your diffs are complete and don't elide code with comments.
8. When editing a function or block, replace the entire block if possible.
9. To move code within a file, use 2 hunks: 1 to delete it from its current location, 1 to insert it in the new location.
10. Pay close attention to indentation and whitespace. They matter in the diffs!

Here's an example of a correctly formatted hunk:

\```diff
@@ -1,6 +1,10 @@
 defmodule Memo.WatchRun do
+  @moduledoc """
+  A GenServer that watches for file changes in a specified vault path and triggers
+  export scripts when changes are detected.
+  """
+
   use GenServer
   require Logger

-  # delay in milliseconds
-  @unlock_delay 2000
+  @unlock_delay Application.get_env(:memo, :unlock_delay, 2000)
\```

After providing the diff, explicitly state the commands to write the diff to a file in the .claude.dev folder and apply the changes:

Run the `write_to_file` tool for the diff file:

\```
write_to_file .claude.dev/watch_run.patch "
--- a/lib/memo/watch_run.ex
+++ b/lib/memo/watch_run.ex
@@ -1,6 +1,10 @@
 defmodule Memo.WatchRun do
+  @moduledoc """
+  A GenServer that watches for file changes in a specified vault path and triggers
+  export scripts when changes are detected.
+  """
+
   use GenServer
   require Logger

-  # delay in milliseconds
-  @unlock_delay 2000
+  @unlock_delay Application.get_env(:memo, :unlock_delay, 2000)
"
\```

Then run the following command with `execute_command`:

\```bash
patch -p1 < .claude.dev/watch_run.patch
\```

Ensure that the patch applies cleanly against the current contents of the file. Double-check line numbers, indentation, and whitespace.

Some parts are taken from aider's unified diff prompts. YMMV: files of 700+ LOC tend to produce malformed patches, so feel free to tune the prompt or use it as a reference point for Claude to refer back to. In my experience it does work 100% of the time for small diffs.

For these cases, it is better to rely on aider, so I'm hoping for a unified_diff_write tool that we can deprecate once limits on output tokens disappear.
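Since malformed patches are the main failure mode, one cheap pre-check before running `patch` is to compare the counts declared in a hunk header against the lines actually present in the hunk. This is a minimal sketch for a single hunk; the function name `check_hunk` and the sample hunk are made up, and it does not replace letting `patch` itself validate the file.

```python
import re
from typing import Tuple

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def check_hunk(hunk: str) -> Tuple[int, int, int, int]:
    """Return (declared_old, declared_new, actual_old, actual_new) line counts."""
    lines = hunk.splitlines()
    match = HUNK_HEADER.match(lines[0])
    if match is None:
        raise ValueError("not a unified diff hunk header: " + lines[0])
    # An omitted count in a hunk header means 1.
    declared_old = int(match.group(2) or 1)
    declared_new = int(match.group(4) or 1)
    actual_old = actual_new = 0
    for line in lines[1:]:
        if line.startswith("-"):
            actual_old += 1
        elif line.startswith("+"):
            actual_new += 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line; a bare empty line is treated as context here.
            actual_old += 1
            actual_new += 1
    return declared_old, declared_new, actual_old, actual_new


body = [" use GenServer", " require Logger", "-@unlock_delay 2000",
        "+# delay in milliseconds", "+@unlock_delay 3000"]
good = "\n".join(["@@ -1,3 +1,4 @@"] + body)
bad = "\n".join(["@@ -1,7 +1,9 @@"] + body)

print(check_hunk(good))  # (3, 4, 3, 4)
print(check_hunk(bad))   # (7, 9, 3, 4)
```

When the declared and actual counts disagree, the hunk could be sent back to the model for correction instead of being applied.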

vtempest commented 6 days ago

https://github.com/saoudrizwan/claude-dev/releases/tag/v1.5.6 Does this attempt to fix the issue?

saoudrizwan commented 5 days ago

@vtempest yes! Please see the v1.5.6 update at the bottom of my comment above.

DrugsAreMyLife commented 3 days ago

Rather than asking Claude to provide me with the entire contents of the file back or just the parts of the code that changed, I'll specifically ask for the entire function to be returned for any modified functions. This helped substantially with token usage and my ability to quickly know where code is supposed to go when I'm doing it manually. I feel like it's one zoom level out from "just the code that's changed".
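If you wanted to apply those function-level replies automatically rather than by hand, here is a minimal sketch of the idea. It is Python-only, relies on the standard `ast` module (Python 3.8+ for `end_lineno`), handles only top-level functions, and ignores decorators. The name `replace_function` and the sample code are hypothetical.

```python
import ast


def replace_function(source: str, new_function: str) -> str:
    """Swap the top-level function in `source` named like `new_function`."""
    new_node = ast.parse(new_function).body[0]
    if not isinstance(new_node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        raise ValueError("new_function must start with a function definition")
    lines = source.splitlines()
    for node in ast.parse(source).body:
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == new_node.name:
            # lineno is 1-based and end_lineno is inclusive.
            start = node.lineno - 1
            lines[start:node.end_lineno] = new_function.strip("\n").splitlines()
            return "\n".join(lines) + "\n"
    raise ValueError(f"function {new_node.name!r} not found")


source = (
    "import math\n"
    "\n"
    "def area(r):\n"
    "    return 3.14 * r * r\n"
    "\n"
    "def perimeter(r):\n"
    "    return 2 * 3.14 * r\n"
)
new_area = "def area(r):\n    return math.pi * r ** 2\n"

print(replace_function(source, new_area))
```

Because the function name locates the edit, no line numbers are needed. Other languages would need their own parser to find function boundaries.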

joegiglio commented 1 day ago

How is everyone feeling about the published fix? Are we back in business? Any gotchas?

I had moved over to Cursor but that has its own set of problems! :)