myshell-ai / AIlice

AIlice is a fully autonomous, general-purpose AI agent.
MIT License
825 stars, 129 forks

Agent use git? #48

Open FellowTraveler opened 3 months ago

FellowTraveler commented 3 months ago

Feature suggestion, borrowed from Aider: whenever the agent makes a change, it should create a new git branch, make and debug the change on that branch, and merge back to the dev branch only when the change is ready. It should probably go further and write the test first on a new branch (against a stub function), merge that, and then write the function that must pass the test, and merge that. Then write the next test, and so on.

It seems the agent could handle C++ this way. Write the header first, with a comment at the top describing what the class is meant to do, including the interface if it is available or specified. An LLM summary of the header can then be generated, and a graph node inserted with properties for the header text, the summary of the header, the embedding of the summary, the fully-qualified scope and name of the class, and the filename. And, I suppose, the commit hash. What do you think?

Each function should then be ingested or written as its own separate piece of text, on its own graph node, with properties for its dependencies. Whenever a function is changed, the agent should write the entire CPP file out on the fly, with all the other functions unchanged (they are stored separately up to that point). All the member functions of the class should still be visible to the coding agent, but IMO the agent should only be changing one of them at a time, or at least a small set of them, discretely and with purpose. (Git can be useful for this.)

Each function should have its own LLM-generated summary, created with the header in context, and each summary should have an embedding used for finding that function (and its subgraph) later. Semantic search is not that useful on raw code, but I think summaries of code can be semantically useful, as can fully-qualified names and scope.
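To make the storage idea concrete, here is a minimal sketch of the node layout described above, assuming a plain in-memory structure instead of a real graph database. `ClassNode`, `FunctionNode`, and `render_source` are hypothetical names made up for this illustration; they are not part of AIlice or Aider. The summaries are left as empty strings where an LLM would fill them in, and embeddings are omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionNode:
    """One function stored as its own node (hypothetical schema)."""
    qualified_name: str                 # e.g. "geometry::Circle::area"
    text: str                           # full source text of the function
    summary: str = ""                   # an LLM-written summary would go here
    dependencies: list[str] = field(default_factory=list)


@dataclass
class ClassNode:
    """Header-level node for a class; its functions are kept separately."""
    qualified_name: str
    filename: str
    header_text: str
    summary: str = ""
    commit_hash: str = ""
    functions: dict[str, FunctionNode] = field(default_factory=dict)

    def add_function(self, node: FunctionNode) -> None:
        self.functions[node.qualified_name] = node

    def replace_function(self, qualified_name: str, new_text: str) -> None:
        """Change exactly one function; every other function stays untouched."""
        if qualified_name not in self.functions:
            raise KeyError(qualified_name)
        self.functions[qualified_name].text = new_text

    def render_source(self, header_include: str) -> str:
        """Write the whole .cpp file out on the fly from the stored pieces."""
        parts = [f'#include "{header_include}"']
        parts += [fn.text for fn in self.functions.values()]
        return "\n\n".join(parts) + "\n"
```

The agent would call `replace_function` for the one function it is working on and then `render_source` to regenerate the file, so the unchanged functions are never re-emitted by the LLM.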

Of course, for context the agent doesn't just need to know the interface and implementation of the class it is working on; it also needs to know the interfaces that class inherits from its base classes. It needs to read the summaries of the base class headers and methods, so that it knows whether it is supposed to override certain methods or use certain methods from higher up in the class hierarchy. This is why I think a graph is critical, as you have also pointed out. Each time a change on a git branch passes all the existing tests, introduces a new test that passes critical inspection, and contains a new function that passes the new test as well, a caller agent can consider merging it back into the dev branch.
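As a rough sketch of why the graph helps here: collecting the inherited context is just a walk up the base-class edges. The code below assumes the graph is a simple dict keyed by class name; `ClassSummary`, `inherited_context`, and `overridable_methods` are hypothetical names for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class ClassSummary:
    """Per-class node: summary text plus edges to base classes (hypothetical)."""
    name: str
    summary: str
    bases: list[str] = field(default_factory=list)
    virtual_methods: list[str] = field(default_factory=list)


def inherited_context(graph: dict[str, ClassSummary], class_name: str) -> list[ClassSummary]:
    """Return the ancestors of class_name, nearest first, each visited once."""
    seen: set[str] = set()
    ordered: list[ClassSummary] = []
    queue = list(graph[class_name].bases)
    while queue:
        base = queue.pop(0)
        if base in seen or base not in graph:
            continue  # skip repeats (diamond inheritance) and unknown bases
        seen.add(base)
        ordered.append(graph[base])
        queue.extend(graph[base].bases)
    return ordered


def overridable_methods(graph: dict[str, ClassSummary], class_name: str) -> list[str]:
    """List virtual methods declared anywhere above class_name, without duplicates."""
    names: list[str] = []
    for ancestor in inherited_context(graph, class_name):
        for method in ancestor.virtual_methods:
            if method not in names:
                names.append(method)
    return names
```

The summaries returned by `inherited_context` are what would be placed in the coding agent's prompt alongside the class it is editing.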

^^^^Partially I wrote this only to get my own thoughts out. This should be general enough that it works for all languages; I'm only using C++ to "think my way" through it.

^^^^Partially I wrote this because I think you are doing some of this already, in terms of tracking the functions and file pieces separately, and I want to understand faster how I can help, or where I'm wrong, or what I'm missing, and what I need to contribute to get this part working, because I think this is critical. Again, much respect for this project; I think you have the right design philosophy. I have played with a lot of agents and I really like this one, so thank you for your hard work. It is appreciated.

stevenlu137 commented 3 months ago

The idea here can be divided into two parts: one is to prompt the agent to use git for version control, and the other is the code editor we discussed in another issue. There are many ways to achieve this. One way is to integrate git functionality directly into the editor, providing underlying support for version rollback, comparison, and merging. Another way is to treat git as a workflow specifically aimed at software development and to design a new agent for this purpose. (Modifying coder-proxy is not a good fit: it is already quite complex, and its role is not really that of a software developer, since it mostly executes simple commands and code snippets to complete actions rather than performing software engineering tasks.)

In the long run, the development of LLMs will inevitably reduce the importance of providing them with more complex and powerful tools, as theoretically, an intelligent enough LLM with bash execution permissions alone can complete various complex tasks. This is one of the reasons why AIlice has always adopted a very rudimentary code editing method. Therefore, I am more inclined to treat git as a workflow rather than encapsulating it in the editor. The advantage is that the LLM can directly utilize all its knowledge about git and execute git commands through the BASH function to complete tasks.
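For illustration, the branch-per-change flow needs nothing beyond ordinary git commands, which is what makes the workflow approach attractive. The sketch below only builds the command lists; `start_change`, `finish_change`, and `run_all` are hypothetical helpers written for this comment, not AIlice functions, and in practice the agent would simply type these commands through BASH itself.

```python
import subprocess


def start_change(branch: str, base: str = "dev") -> list[list[str]]:
    """Commands that open a fresh branch off the base branch."""
    return [["git", "checkout", "-b", branch, base]]


def finish_change(branch: str, message: str, base: str = "dev",
                  tests_passed: bool = False) -> list[list[str]]:
    """Commands that commit the work and, only if tests passed, merge it back."""
    commands = [
        ["git", "add", "-A"],
        ["git", "commit", "-m", message],
    ]
    if tests_passed:
        commands += [
            ["git", "checkout", base],
            ["git", "merge", "--no-ff", branch],
        ]
    return commands


def run_all(commands: list[list[str]], cwd: str) -> None:
    """Execute the commands in order inside an existing repository at cwd."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

The agent would edit files between `start_change` and `finish_change`; with `tests_passed=False` the work stays on its branch for further debugging.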

Building an editor capable of directly manipulating functions or classes is very practical. My personal understanding is that it is similar to providing a Visual Assist for AI, allowing it to modify and refactor code more conveniently (and, through the text descriptions of functions, to quickly understand what the whole project does).

Although we discussed the importance of dynamically constructing prompts in another issue, I would like to add another way of looking at it. Constructing dynamic prompts in the tool modules is like turning the tool into a state machine. In each state, it appends a set of functions and their descriptions to the text returned to the LLM. When the LLM calls one of these functions, the state machine transitions to a new state and provides a new list of functions. This way, we build an effective interaction between the module and the LLM, realizing the goal of providing more complex tools to the LLM. The design of this state machine determines whether the LLM can use the tools smoothly and freely, and it is the key to the entire work. I hope this helps.
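A toy version of this state machine, as a sketch only: the states, function names, and descriptions below are invented for the example and do not correspond to any existing AIlice module. Each call returns the text that would be sent back to the LLM, including the functions available in the new state.

```python
class EditorTool:
    """Toy tool whose advertised functions depend on its current state."""

    # state -> {function name: (description, next state)}
    TRANSITIONS = {
        "closed": {
            "OPEN": ("Open a file for editing.", "open"),
        },
        "open": {
            "EDIT": ("Replace part of the open file.", "dirty"),
            "CLOSE": ("Close the file.", "closed"),
        },
        "dirty": {
            "EDIT": ("Replace part of the open file.", "dirty"),
            "SAVE": ("Write the pending changes to disk.", "open"),
            "DISCARD": ("Drop the pending changes and close the file.", "closed"),
        },
    }

    def __init__(self) -> None:
        self.state = "closed"

    def prompt(self) -> str:
        """Text appended to the tool's reply: the functions valid right now."""
        lines = [f"State: {self.state}. Available functions:"]
        for name, (description, _next_state) in self.TRANSITIONS[self.state].items():
            lines.append(f"- {name}: {description}")
        return "\n".join(lines)

    def call(self, name: str) -> str:
        """Apply one function call from the LLM and return the new prompt."""
        actions = self.TRANSITIONS[self.state]
        if name not in actions:
            return f"{name} is not available now.\n" + self.prompt()
        self.state = actions[name][1]
        return self.prompt()
```

Because the LLM only ever sees the functions valid in the current state, the prompt stays short, and invalid calls are answered with the list of what is actually possible.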

Regarding my current work: In recent days, I have been trying to enable AIlice to move beyond writing small programs and start modifying complex programs, eventually achieving complex software engineering. The methods are limited to providing a text editor and adjusting prompts (for example, trying to remind the LLM to generate code modification plans rather than a complete new version of the code). In the future, I will also attempt complex software engineering, likely using an iterative workflow. As the iterations progress and software complexity increases, a key challenge will be how to grow the agent calling tree and adapt to the new complexity.

On the other hand, in recent tests involving more complex tasks, I have already felt the limitations due to the lack of an excellent long-term memory mechanism. I am considering a new long-term memory mechanism, generally constructing a network structure of knowledge with the LLM to achieve more types of semantic associations (vector databases can only provide semantic similarity). This work is almost independent of AIlice because the long-term memory module defines very simple interfaces (STORE/QUERY). In most cases, agents do not consciously recall certain information (it's subconscious), so they do not need any knowledge of operating long-term memory themselves, allowing them to better focus on the problem at hand. This is similar to how the human brain usually works.

stevenlu137 commented 3 months ago

And thank you.

Although this project has been online for nine months, it is not widely known (perhaps because the name seems too quirky to native English speakers?). There aren't many developers who appreciate its design philosophy and get involved, so I'm very happy to see another serious developer interested in this project! This is the early stage of community building, and there are many great things for us to create.