Enhancement: Consider fuzzy matching of the SEARCH pattern.

nevercast commented 11 months ago

Hi!

Aider is an excellent tool, and as agentic software becomes more prevalent, I find that its shortfalls become more and more irritating, in much the same way that every computer feels like a slow, horrible mess once you've used something better.

With Aider, the shortfall is that the SEARCH block sometimes fails to match, and I might have waited minutes for the output of the conversation only to end up with no changes. I work around this by doing small, focused increments as much as possible, but there is always a time when the change is just a little bit bigger, or GPT decides that the diff should be lengthy. I can sometimes apply the change myself, commit it, and continue the conversation, but it would be nice if I had to do this less often.

My thought was fuzzy matching on SEARCH patterns. I can't say that you aren't already doing this, because I have not looked at how Aider handles file updates. But I suspect that the SEARCH pattern must be exact, down to the whitespace.

I'd like to open the discussion for ways Aider could, without harming the accuracy of the updates too much, improve this. Handling whitespace differently comes to mind, as does fuzzy matching (Levenshtein distance?).

Has thought already been put into this? Is this something that is desired?

paul-gauthier commented 11 months ago

Aider has implementations of both of these ideas.

Fuzzy matching for leading whitespace is critical, as GPT-4 is very prone to omitting leading whitespace when it is doing SEARCH/REPLACE on a deeply nested chunk of code. It preserves the indentation relevant to the chunk it is working with, but often leaves out the first few indents that are common to every line in the chunk. Aider recognizes the omitted leading whitespace and restores the correct amount.
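To illustrate the idea (this is a minimal sketch, not aider's actual code), the following assumes the SEARCH and REPLACE blocks are both missing the same uniform leading indent. The function name `replace_ignoring_outer_indent` and its arguments are hypothetical.

```python
def replace_ignoring_outer_indent(file_text, search, replace):
    """Hypothetical helper: apply SEARCH/REPLACE when every SEARCH line
    is missing the same leading whitespace. Returns None if no match."""
    file_lines = file_text.splitlines()
    search_lines = search.splitlines()
    replace_lines = replace.splitlines()
    n = len(search_lines)
    if n == 0:
        return None

    for start in range(len(file_lines) - n + 1):
        window = file_lines[start:start + n]

        # Infer the omitted indent from the first non-blank SEARCH line.
        prefix = None
        for f_line, s_line in zip(window, search_lines):
            if s_line.strip():
                if f_line.endswith(s_line):
                    candidate = f_line[:len(f_line) - len(s_line)]
                    if candidate.strip() == "":
                        prefix = candidate
                break
        if prefix is None:
            continue

        # Every non-blank line must match once the same indent is restored.
        matches = all(
            (prefix + s == f) if s.strip() else (f.strip() == "")
            for f, s in zip(window, search_lines)
        )
        if matches:
            # Restore the omitted indent on the REPLACE lines as well.
            new_lines = [prefix + r if r.strip() else r for r in replace_lines]
            result = file_lines[:start] + new_lines + file_lines[start + n:]
            trailing = "\n" if file_text.endswith("\n") else ""
            return "\n".join(result) + trailing
    return None


source = "class A:\n    def f(self):\n        x = 1\n        return x\n"
# The SEARCH/REPLACE blocks keep relative indentation but drop the outer 8 spaces.
patched = replace_ignoring_outer_indent(source, "x = 1\nreturn x\n", "x = 2\nreturn x\n")
print(patched)
```

Because the indent is only accepted when it is pure whitespace and identical on every line, this stays much stricter than a general fuzzy match.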

Aider also has support for a Levenshtein-style fuzzy match of the entire SEARCH block, though it has been disabled in recent releases. Often GPT produces a SEARCH block that leaves out some semantically significant lines, and those lines are also missing from the REPLACE block. The result is that fuzzy matching allows GPT to silently delete important lines from the code, introducing unintended changes and bugs.
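The failure mode can be reproduced with a small sketch. This is not aider's implementation: it swaps in a line-level similarity ratio from Python's `difflib.SequenceMatcher` in place of a true Levenshtein distance, and the function `fuzzy_replace`, the `threshold` value, and the sample code are all made up for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_replace(file_text, search, replace, threshold=0.8):
    """Hypothetical fuzzy SEARCH/REPLACE: replace the window of file lines
    most similar to SEARCH, if its similarity reaches the threshold."""
    file_lines = file_text.splitlines(keepends=True)
    search_lines = search.splitlines(keepends=True)
    n = len(search_lines)
    best_ratio, best_span = 0.0, None

    # Consider windows one line shorter or longer than the SEARCH block.
    for length in range(max(1, n - 1), n + 2):
        for start in range(len(file_lines) - length + 1):
            window = file_lines[start:start + length]
            ratio = SequenceMatcher(None, window, search_lines).ratio()
            if ratio > best_ratio:
                best_ratio, best_span = ratio, (start, start + length)

    if best_span is None or best_ratio < threshold:
        return None
    start, end = best_span
    return "".join(file_lines[:start]) + replace + "".join(file_lines[end:])


original = (
    "def save(user):\n"
    "    validate(user)\n"
    "    audit_log(user)\n"
    "    db.write(user)\n"
)
# The model "forgets" the audit_log line in both blocks.
search = "def save(user):\n    validate(user)\n    db.write(user)\n"
replace = "def save(user):\n    validate(user)\n    db.insert(user)\n"

result = fuzzy_replace(original, search, replace)
print(result)
```

Here the four-line window is the closest match, so the edit is accepted and the `audit_log(user)` call disappears without any warning, which is exactly the kind of silent deletion described above.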

Instead, aider reflects back an error to GPT if the SEARCH block fails to match the existing file contents. It points out exactly where the SEARCH block deviated from the actual file contents. GPT is usually then able to notice the mistake and produce a new SEARCH/REPLACE block which is both a correct match and a semantically correct edit to the file.
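A rough sketch of this kind of error report, assuming the closest same-sized window in the file is a good enough place to point at. The function `describe_search_failure` and the message wording are hypothetical and do not reflect aider's real output.

```python
import difflib


def describe_search_failure(file_text, search):
    """Hypothetical helper: build an error message showing how a failed
    SEARCH block differs from the most similar lines in the file."""
    file_lines = file_text.splitlines()
    search_lines = search.splitlines()
    n = len(search_lines)

    # Find the window of n file lines most similar to the SEARCH block.
    best_ratio, best_start = -1.0, 0
    for start in range(max(1, len(file_lines) - n + 1)):
        window = file_lines[start:start + n]
        ratio = difflib.SequenceMatcher(None, window, search_lines).ratio()
        if ratio > best_ratio:
            best_ratio, best_start = ratio, start

    closest = file_lines[best_start:best_start + n]
    diff = difflib.unified_diff(
        search_lines, closest,
        fromfile="SEARCH block", tofile="actual file", lineterm="",
    )
    return (
        "The SEARCH block did not match the file. "
        "Differences from the closest lines:\n" + "\n".join(diff)
    )


report = describe_search_failure("a = 1\nb = 2\nc = 3\n", "a = 1\nb = 22\n")
print(report)
```

The message could then be sent back to the model as the next turn so it can retry with a corrected SEARCH block.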

nevercast commented 11 months ago

Thanks for answering. Seems like this has been explored and the most reliable solution is already in place. I shall close this issue.

A thought though, and perhaps better suited to another issue: it would be nice if Aider bailed as soon as the SEARCH pattern fails to match anything while streaming from the agent. It could run that check in the background after each new line or something. This would also save me tokens :)

paul-gauthier commented 11 months ago

Yup. Notifying GPT in real time about a bad partial SEARCH block is on my list. I almost implemented it last week when I was doing related work on the "diff" coder backend where all this lives.
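As a sketch of what such a real-time check might look like (an assumption about one possible approach, not the planned implementation), the code below re-checks the partial SEARCH block after each streamed line and reports the first line at which no exact match is possible anymore. The names `partial_search_can_match` and `first_bad_line` are hypothetical.

```python
def partial_search_can_match(file_lines, partial_search_lines):
    """True if the lines seen so far appear contiguously somewhere in the file."""
    n = len(partial_search_lines)
    return any(
        file_lines[i:i + n] == partial_search_lines
        for i in range(len(file_lines) - n + 1)
    )


def first_bad_line(file_text, streamed_lines):
    """Hypothetical helper: consume SEARCH lines as they stream in and return
    the index of the first line that rules out an exact match, else None."""
    file_lines = file_text.splitlines()
    seen = []
    for index, line in enumerate(streamed_lines):
        seen.append(line.rstrip("\n"))
        if not partial_search_can_match(file_lines, seen):
            return index  # the caller could abort the stream here
    return None


file_text = "a\nb\nc\n"
print(first_bad_line(file_text, ["a", "b", "x"]))
print(first_bad_line(file_text, ["b", "c"]))
```

This version only handles exact line matches. Combining it with the leading-whitespace tolerance would need the same indent inference applied to the partial block.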

paul-gauthier / aider

Enhancement: Consider fuzzy matching of the SEARCH pattern. #306