myshell-ai / AIlice

AIlice is a fully autonomous, general-purpose AI agent.
MIT License

Agent apparently rewrites entire code file and loses a bunch of work. #44

Open FellowTraveler opened 2 weeks ago

FellowTraveler commented 2 weeks ago

I noticed the agent was working on the code and the file kept getting bigger: say 3k, then 6k, then 9k. ...Then at some point the file was back down to 2 or 3k. Basically the agent deleted everything it had done and rewrote the file.

I think code files should not be presented to the agent as "files". Instead, whether the agent writes the code or ingests it, it should determine how to split the file up (into a series of functions, for example), should have corresponding tests for each function, and should only ever make changes to one specific function at a time. The entire file (or maybe just the header, in the case of C++) can still be in the context, but the only part actually being edited should be that one function, which should be stored in its own place. Then when the agent writes the file to disk, it "compiles" it from those chunks into the final text file. The agent should not have the power to rewrite the entire file at once and re-save it. Otherwise that prevents it from working through the problem set systematically, and instead causes it to loop endlessly, starting over at the beginning with a rewrite and losing work it had already done and even debugged.

This happened tonight with Claude 3.5 Sonnet, BTW, but I think I've also seen it with GPT-4o, deepseek-coder, and/or Qwen2 72B Instruct. You may already be doing something like this (I saw hints of it in the log), but I still saw the problem, so maybe I just need to understand the code better so I can help prevent this sort of issue from happening.
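To make the idea concrete, here is a minimal Python sketch of that chunk-and-recompose approach. The class and method names are hypothetical, not part of AIlice; it only handles top-level Python definitions, using the stdlib `ast` module:

```python
import ast

class ChunkedSourceFile:
    """Hypothetical helper: store a source file as an ordered list of chunks
    (one per top-level function/class, plus the code in between) so that an
    agent edits one chunk at a time and never rewrites the whole file."""

    def __init__(self, source: str):
        self.lines = source.splitlines(keepends=True)
        self.chunks = self._split()

    def _split(self):
        chunks, cursor = [], 0
        for node in ast.parse("".join(self.lines)).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                start, end = node.lineno - 1, node.end_lineno  # 1-based; end inclusive
                if start > cursor:  # keep imports/comments between defs as their own chunk
                    chunks.append((None, "".join(self.lines[cursor:start])))
                chunks.append((node.name, "".join(self.lines[start:end])))
                cursor = end
        chunks.append((None, "".join(self.lines[cursor:])))  # trailing code, if any
        return chunks

    def replace(self, name: str, new_code: str) -> None:
        """Edit exactly one named definition; every other chunk is untouched."""
        for i, (chunk_name, _) in enumerate(self.chunks):
            if chunk_name == name:
                self.chunks[i] = (name, new_code)
                return
        raise KeyError(f"no top-level definition named {name!r}")

    def compile(self) -> str:
        """Recompose ('compile') the full file from its chunks before saving."""
        return "".join(code for _, code in self.chunks)
```

Writing to disk then becomes `open(path, "w").write(f.compile())`, and an agent whose only editing primitive is `replace()` physically cannot blow away the rest of the file.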

stevenlu137 commented 2 weeks ago

Yes, I’m also working on improving AIlice’s ability to modify and edit code in complex software projects. It often overwrites previous work, and this problem is largely because the AI lacks a suitable text editor. If a convenient text editor were available, the LLM would likely prefer modifying existing code to generating a new code file. Currently, there are a few candidate solutions:

The first method, plain text replacement, is natively supported, but it has the drawback that the LLM may need many escape characters in the replacement expressions. It is not very good at escaping, which leads to errors.
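To illustrate the escaping problem, here is a hypothetical bare-bones replace tool (not AIlice's actual tool) and the failure mode it invites:

```python
import re

def replace_in_file(path: str, pattern: str, repl: str) -> None:
    """A minimal REPLACE tool of the kind an LLM might be handed.
    `pattern` is a regular expression, which is exactly where the trouble starts."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(re.sub(pattern, repl, text))

# The model wants to replace the literal call `foo(x[0], *args)`.
# Unescaped, the `(`, `[`, and `*` are regex metacharacters, so this pattern
# compiles but matches the wrong text, and the edit silently misfires:
#
#     replace_in_file("demo.py", "foo(x[0], *args)", "bar(x[0], *args)")
#
# The model has to remember to escape every metacharacter itself:
#
#     replace_in_file("demo.py", re.escape("foo(x[0], *args)"), "bar(x[0], *args)")
```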

I have already implemented the second method, but I haven't yet submitted the corresponding prompt changes (so you can’t use it yet). It works in tests but is not ideal yet.

The third method is more complex. Leveraging knowledge the LLM already has when giving it tools is an effective technique: teaching it to do complex things through prompts incurs significant cost and is less effective than letting it use tools it learned during pre-training. If there were a well-known command-line tool for structured code editing that the LLM is already familiar with, it would be the ideal choice; we would only need a slight hint in the prompt. On the other hand, building a completely new tool and teaching it in the prompt could result in an overly complex prompt and potentially poor outcomes. However, if we could create a very simple tool, that would also be feasible.

This is still an ongoing experiment. I can upload the complete code to the dev branch soon, and if you're interested you can give it a try. I’m also interested in solutions for directly editing functions or classes in the code. Is there a command-line tool for that kind of structured code editing?
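For Python at least, the standard library's `ast` module records each definition's start and end line, so a very simple tool of this kind is feasible. A minimal sketch, assuming we invent a tiny CLI called `pyedit.py` (not an existing tool):

```python
#!/usr/bin/env python3
"""pyedit.py (hypothetical): replace one top-level function in a Python file,
reading the new definition from stdin:

    python pyedit.py target.py my_func < new_def.py
"""
import ast
import sys

def main() -> None:
    path, func = sys.argv[1], sys.argv[2]
    new_code = sys.stdin.read()
    source = open(path).read()
    lines = source.splitlines(keepends=True)
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func:
            start, end = node.lineno - 1, node.end_lineno  # 1-based; end inclusive
            with open(path, "w") as f:
                f.write("".join(lines[:start]) + new_code + "".join(lines[end:]))
            return
    sys.exit(f"no top-level function named {func!r} in {path}")

if __name__ == "__main__":
    main()
```

For languages other than Python, a multi-language parser such as tree-sitter could play the same role of locating a definition's exact line span.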

FellowTraveler commented 1 day ago

I am working on ingesting C++ methods, classes, and functions, and I have untested code for ingesting Python, Rust, and JavaScript as well. Check out the R2R framework; it may be just what is needed for ingestion for autocoding: a doc store, a graph DB (Neo4j), a vector DB, and Postgres, with agentic RAG pipelines, etc. I might add my code-ingestion stuff in there.

I think it's possible to give the agent a "map" of a project and let it edit individual classes or methods without having to rewrite everything else (see the sketch below). As long as it is grounded by unit tests along the way, I think the flow from AlphaCodium will be successful for autocoding. This could be the perfect long-term memory for AIlice.

Also, an agent like AIlice needs to be able to learn procedures for doing things. I should be able to give it procedural instructions the same way I would tell an employee: "Here's how XYZ is done... do it like this..." "First create a new branch, and the tester agent should confirm that the new changes pass all existing unit tests, plus the new unit test, before committing the branch and submitting a PR for review..." There are certain procedures like this where we already know what we want the agents to do, and through experience they can codify new procedures and improve existing ones. The process of improving existing procedures is probably the same process that comes up with new procedures in the first place. But I should also be able to give the bot a head start by telling it various effective procedures up front. Those can then evolve over time.
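Here is a minimal sketch of such a project "map" for the Python case (for C++, Rust, or JavaScript a real parser like tree-sitter would be needed; `build_project_map` is an invented name):

```python
import ast
import pathlib

def build_project_map(root: str) -> dict:
    """Index every top-level function/class in a Python project as
    name -> (file, first_line, last_line). With this map, an agent can be
    pointed at one definition and edit it without touching the rest of the file."""
    index = {}
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text())
        except SyntaxError:
            continue  # skip files that don't parse
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                index[f"{path}::{node.name}"] = (str(path), node.lineno, node.end_lineno)
    return index
```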

stevenlu137 commented 3 hours ago

I am very much looking forward to seeing the progress of this work! I have recently realized the importance of extracting entities at the level of classes and functions, which is exactly what you are working towards. Once we have a better long-term memory, we can let AIlice automatically accumulate experience into it, just as you described with procedures. That would be a controllable and interpretable learning mechanism, which is very promising!