Agent apparently rewrites entire code file and loses a bunch of work.

myshell-ai / AIlice

AIlice is a fully autonomous, general-purpose AI agent.

MIT License

868 stars 134 forks source link

Agent apparently rewrites entire code file and loses a bunch of work. #44

Open FellowTraveler opened 4 months ago

FellowTraveler commented 4 months ago

I noticed the agent was working on the code and the file got bigger, and bigger, say 3k, then 6k, then 9k. ...Then at some point the file was 2 or 3k again. Basically the agent just deleted everything it had done and rewrite the file. I think that code files should not be thought of as "files" to the agent. Instead, whether the agent writes the code, or ingests it, the agent should determine how to split up the file (into a series of functions for example) and should have a corresponding test(s) for each function, and should only make changes to that specific function. The entire file if possible (or maybe the header in the case of C++) should still be in the context, but the only part actually being edited should be the 1 specific function, which should be stored in its own place. Then when the agent writes the file to disk, it "compiles" it from those chunks into the final text file that is written. The agent should not have the power to rewrite the entire file at once and re-save it. Otherwise that situation prevents it from systematically working through the problem set, and instead causes it to loop endlessly starting over at the beginning with a rewrite and losing work that it had already done and even debugged. This happened tonight with Claude-Sonnet-3.5 BTW. But I think I've also seen it with GPT-4o, deepseek-coder and/or qwen2 72b instruct. BTW -- you may already be doing this, I saw hints of something like this in the log. But I still saw the problem, so maybe I just need to understand the code better so I can help prevent this sort of issue from happening.

stevenlu137 commented 4 months ago

Yes, I’m also working on improving AIlice’s ability to modify and edit code in complex software projects. It often overwrites previous work, and this problem is largely because AI lacks a suitable text editor. If there were a convenient text editor available, the LLM would likely prefer modify existing code than generating a new code file. Currently, there are a few solutions:

Use BASH functions and text editing commands like sed.
Provide a REPLACE function for text file editing.
Parse the code into functions/classes and allow the LLM to replace the elements that need modification, which is the method you proposed.

The first method is natively supported but has the drawback that the LLM might need to use many escape characters in the text replacement expressions, which it is not very good at, leading to potential errors.

I have already implemented the second method but haven't yet submitted the corresponding changes in the prompt (so you can’t use it yet). This method works in tests but is not ideal yet.

The third method is more complex. Leveraging existing knowledge when providing tools for LLM is an effective technique. Teaching it to do complex things with prompts incurs significant costs and is less effective than using tools it has learned during pre-training. If we had a well-known command-line tool for structured code editing that the LLM is already familiar with, it would be an ideal choice—we would only need to give it a slight prompt. On the other hand, building a completely new tool and teaching it in the prompt could result in an overly complex prompt and potentially poor outcomes. However, if we could create a very simple tool, that would also be feasible.

This is still an ongoing experiment. I can upload the complete code to dev soon, and if you're interested, you can give it a try. I’m also interested in solutions for directly editing functions or classes in the code. Do we have a command-line tool for such code editing functions?

FellowTraveler commented 4 months ago

I am working on ingesting CPP methods, classes, functions. I have untested code also for ingesting Python, Rust, and Javascript. Check out R2R framework it may be just what is needed for ingesting for autocoding. Doc store, graph db (neo4j) and vector DB and postgres with agent RAG pipelines etc. I might add my code ingestion stuff in there. I think it's possible to give the agent a "map" of a project and let it edit individual classes or methods without having to rewrite everything else. As long as it is grounded by unit tests along the way I think the flow from alphacodium will be successful for autocoding. This could be the perfect long-term memory for AIlice. Also, an agent like AIlice needs to be able to learn procedures for doing things. I should be able to tell it procedural instructions the same way I would tell an employee: "Here's how XYZ is done...do it like this..." "First create a new branch, and the tester agent should confirm that new changes are passing all existing unit tests, plus the new unit test, before committing the branch and submitting a PR for review..." There are certain procedures like this where we already know what we want the agents to do. And through experience they codify new procedures and improve existing ones. The process of improving existing procedures is probably the same process that comes up with new procedures in the first place. But I should also be able to give it a short-cut by just telling the bot various effective procedures up front to give it a head start. Then those can evolve over time from there.

stevenlu137 commented 4 months ago

I am very much looking forward to seeing the progress of this work! I have recently realized the importance of extracting entities based on classes or functions, which is exactly what you are working towards. Once we have a better long-term memory, we can enable AIlice to accumulate experience automatically into long-term memory, just as you mentioned with procedures. This will be a controllable and interpretable learning mechanism, which is very promising!

FellowTraveler commented 3 months ago

I have basic ingestion working but I got diverted temporarily. I would rather use an existing solution for this sort of thing than have to write it myself. But if you want to see a working (basic) ingestion of C++ classes/methods/functions see my ngest repo. There's Python, Rust, and Javascript ingestion in there as well but that part is untested: https://github.com/FellowTraveler/ngest

Preferred: I found a great project that runs in docker and has a clean API I think you will really like for RAG and memory purposes. It supports GraphRAG, HyDE, hybrid search, all kinds of ingestion, etc and works great from my testing so far. Probably I will just add my code ingestion to it instead of having to reinvent the wheel. Check it out, it's called R2R:

https://r2r-docs.sciphi.ai/api-reference/introduction

https://github.com/SciPhi-AI/R2R

Also, aider has a repo map and uses tree-sitter to build an abstract syntax tree. Worth checking out when I have the time.

stevenlu137 commented 3 months ago

Last time you mentioned r2r, and I did some research on it. I wrote a long-term memory module to test its performance, but due to recent personal matters, I haven't completed this work yet. I will test your code and r2r and consider addressing the code editing issues. Thank you for the clues and information you provided; they have been very helpful.

Additionally, I found that the current storage module implicitly requires the queried text to be the original text stored. This makes it difficult to apply more advanced memory mechanisms, such as knowledge graphs. Therefore, I will consider doing some refactoring and cleanup work to pave the way for the long-term memory module.

FellowTraveler commented 3 months ago

So far all my code does is a basic ingestion of C++ code into a knowledge graph, with relationships created between the classes and methods. I haven't tested the ingestion for other languages yet (though code is there for python, rust, javascript). I also added PDF ingestion but that's where I start looking at projects like R2R because why reinvent the wheel?

R2R isn't the only one. There are many others and I haven't had time to investigate as much as I'd like. At least with R2R though I have it up and running and it appears to be good stuff. I like how clean the API is, and I like the functionality it provides. I will probably end up putting my C++ ingestion code into R2R but we'll see, I still have more research to do.

Others similar to it that need more investigation are:

https://github.com/infiniflow/ragflow

https://github.com/mem0ai/mem0

https://github.com/D-Star-AI/dsRAG

https://github.com/neuml/rag

https://github.com/neuml/txtai

https://github.com/EdwardDali/erag

https://github.com/iliab1/Neo4j-RAG-Chatbot

https://github.com/jayavibhavnk/GraphRetrieval

https://github.com/kingjulio8238/memary

https://github.com/bradAGI/GraphMemory

https://github.com/AI-Commandos/RAGMeUp

https://github.com/GoodAI/charlie-mnemonic

https://github.com/fiatrete/OpenDAN-Personal-AI-OS

NOTE: some of these are agent projects, but I put them here if they seemed to incorporate memory features that may prove useful. The rest are specific to agentic memory and/or RAG.