ubiquibot / plugins-wishlist


"L2 Agent" #37

Open 0x4007 opened 3 weeks ago

0x4007 commented 3 weeks ago

I was reading my friend's blog post and was inspired to think about AI systems in a more structured way. The post defines a set of AI "level" designations.

It would be interesting to make an L2 agent according to the definition in the blog post:

L2 agents use LLMs selectively to decide how to handle key points in the program’s control flow. Today, this often boils down to deciding which tool to invoke based on a set of tools which have been carefully curated by a human programmer. The most common example of L2 agents today is invoking an LLM with access to tools in a while loop. The majority of the program’s control flow still resides outside of the LLM’s purview and is controlled by a human programmer.

This is a stepping stone to L3 according to the blog because L3 coordinates L2 and below.


We can make this a command interface where we tag the bot and make requests in plain language:

@ubiquity-os give me the wallet address of @0x4007


In the above example, we would pass the entire help menu to ChatGPT so that it can invoke the correct plugin based on the command descriptions.

I think this should be quite straightforward to implement, and is a useful stepping stone towards a more advanced AI powered system.

We can use GPT-4o mini because the task is simple: it just needs to read the help menu and pick the right plugin.
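As a rough illustration, here is a minimal sketch of that routing using the OpenAI SDK's function calling; the `HelpEntry` shape, the system prompt, and the plugin names are assumptions rather than the actual help menu format:

```typescript
// Minimal sketch: expose the help menu to the model as tools and let it
// pick which plugin command to invoke. The HelpEntry shape is assumed.
import OpenAI from "openai";

const client = new OpenAI();

type HelpEntry = {
  name: string;                        // e.g. "command-wallet" (hypothetical)
  description: string;                 // what the help menu says about the command
  parameters: Record<string, unknown>; // JSON-schema description of its arguments
};

async function routeComment(comment: string, helpMenu: HelpEntry[]) {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Map the user's request to one of the available plugin commands." },
      { role: "user", content: comment },
    ],
    tools: helpMenu.map((entry) => ({
      type: "function" as const,
      function: { name: entry.name, description: entry.description, parameters: entry.parameters },
    })),
  });

  const message = completion.choices[0].message;
  // Either a tool call naming the plugin to invoke, or a plain text reply.
  return message.tool_calls?.[0] ?? message.content;
}
```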

Advanced Version

As a more advanced version of this plugin, we can listen for every comment (no bot tag required), and the bot can jump in to help whenever it thinks it can. For example, if somebody asks to be assigned to a task, perhaps the bot can invoke /start on behalf of that user (which inherits all of the existing checks, like whether they are already assigned to too many other open tasks).

This makes the bot's presence much more pronounced, and it will truly feel like a helpful and proactive member of the team instead of "a tool" that must be specifically called upon for help.
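A rough sketch of that "no tag required" mode, assuming a `classifyIntent` helper backed by the LLM and a `dispatchPlugin` helper (both hypothetical names, not existing plugin APIs):

```typescript
// Assumed helpers, not real plugin APIs:
declare function classifyIntent(comment: string): Promise<string | null>;
declare function dispatchPlugin(intent: string, opts: { actor: string }): Promise<void>;

// Hypothetical handler for every issue_comment event: classify the comment,
// stay silent unless the bot is confident it can help, and act on behalf of
// the original commenter so the usual checks still apply to them.
async function onIssueComment(commentBody: string, commenter: string) {
  const intent = await classifyIntent(commentBody); // e.g. "start" | null
  if (!intent) return; // no confident match: do nothing

  // Equivalent of invoking /start for the commenter, not for the bot,
  // so limits like "too many open tasks" are evaluated against the user.
  await dispatchPlugin(intent, { actor: commenter });
}
```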

Remark

I suppose if it calls other plugins that themselves use LLMs (like conversation rewards, somehow), then technically this would be considered an L3-class system.

0x4007 commented 3 weeks ago

Seems that L4 requires the bot to write a custom plugin at runtime to handle a novel task.

This could be really interesting (and feasible) with automated CI checking.

Running CI on every commit might be quite slow, but it would be incredible to see it build, save, and install a new plugin by itself, which the L2 agent described in this specification would then be able to invoke in future runs.

If we could pull off L4, I'm sure we could go viral/trend on programmer news. Most of the infrastructure is in place, but making robust end-to-end CI tests seems like a month-long project.

Keyrxng commented 3 weeks ago

I previously experimented with building an L2 agent using V1.

> This could be really interesting (and feasible) with automated CI checking.

I agree, it's both very interesting and definitely feasible.

A "simple" V1 is possible if we map out safe commands and safe direct actions that will fire the intended plugins.
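For example, a minimal allowlist along these lines (the command names are illustrative, not the definitive safe set):

```typescript
// V1 safe mode sketch: only pre-approved slash commands are ever emitted;
// anything else is downgraded to an informative comment.
const SAFE_COMMANDS = new Set(["/start", "/stop", "/help", "/wallet", "/query"]);

function toSafeAction(suggestion: string): { kind: "command" | "comment"; body: string } {
  const command = suggestion.trim().split(/\s+/)[0];
  return SAFE_COMMANDS.has(command)
    ? { kind: "command", body: suggestion.trim() }
    : { kind: "comment", body: `I can't run that directly, but here is what I can tell you: ${suggestion}` };
}
```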


Concerns and Questions for V2: Does V2 involve this plugin posting a slash command to GitHub issues, or does it operate by dispatching plugins directly? If it uses slash commands, there's a limitation: plugins will identify the bot as the sender rather than the actual user, which will break most, if not all, plugins.

Implementation Strategy: To include non-slash-command capabilities, we'd need to:

  1. Provide the LLM with the manifest of each installed plugin.
  2. "Teach" the bot both our API and relevant parts of the GitHub API.
  3. Use GitHub's workflow and repo dispatch when feasible; for other cases, enable the bot to build and execute API calls (see the sketch below).
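A sketch of point 3 with Octokit; the owner/repo/workflow names are placeholders, and whether a given plugin exposes a workflow_dispatch trigger is an assumption:

```typescript
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Workflow dispatch: run a plugin's action with inputs the LLM filled in.
async function invokeViaWorkflowDispatch() {
  await octokit.rest.actions.createWorkflowDispatch({
    owner: "ubiquity-os",       // placeholder
    repo: "some-plugin",        // placeholder
    workflow_id: "compute.yml", // placeholder
    ref: "main",
    inputs: { issueNumber: "37", actor: "0x4007" },
  });
}

// Repository dispatch: a generic event the plugin can subscribe to.
async function invokeViaRepoDispatch() {
  await octokit.rest.repos.createDispatchEvent({
    owner: "ubiquity-os",       // placeholder
    repo: "some-plugin",        // placeholder
    event_type: "agent-invocation",
    client_payload: { command: "wallet", user: "0x4007" },
  });
}
```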

Operational Flow:

  1. User queries are sent to OpenAI.
  2. OpenAI determines if the response should trigger a function call or a simple text reply.
  3. If a function is triggered, the arguments are sent to our tool handler.
  4. After execution, responses are either posted directly to GitHub or returned to the LLM for further processing.
  5. The loop ends with the addCommentToIssue tool, which posts results back to GitHub (or we could invoke addCommentToIssue manually once the LLM interaction has ended, outside the LLM loop); see the loop sketch below.
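A hedged sketch of that loop (steps 1-5), assuming a `toolHandler` and an `addCommentToIssue` helper like the ones described above:

```typescript
import OpenAI from "openai";

const client = new OpenAI();

// Assumed helpers (not existing plugin code):
declare function toolHandler(name: string, args: unknown): Promise<unknown>;
declare function addCommentToIssue(body: string): Promise<void>;

async function runAgentLoop(
  messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[],
  tools: OpenAI.Chat.Completions.ChatCompletionTool[]
) {
  while (true) {
    // Steps 1-2: send the conversation to OpenAI and see whether it wants a tool.
    const res = await client.chat.completions.create({ model: "gpt-4o", messages, tools });
    const msg = res.choices[0].message;
    messages.push(msg);

    const toolCalls = msg.tool_calls ?? [];
    if (toolCalls.length === 0) {
      // Plain text reply: the loop ends and the result is posted back to GitHub.
      await addCommentToIssue(msg.content ?? "");
      return;
    }

    // Steps 3-4: execute each requested tool and feed the result back to the LLM.
    for (const call of toolCalls) {
      const result = await toolHandler(call.function.name, JSON.parse(call.function.arguments));
      messages.push({ role: "tool", tool_call_id: call.id, content: JSON.stringify(result) });
    }
  }
}
```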

Challenges:

Potential Development Paths:

  1. V1 Safe Mode: Allow only pre-approved slash commands; convert all other commands into informative comments.
  2. V1 Direct Action: Enable direct actions on issues (e.g., adding/removing assignees/labels) using parameterized API calls constructed by the LLM, as seen in the old QA I linked above. These actions would fire non-slash-command plugins such as assistive-pricing and task-xp-guard (see the sketch after this list).
  3. V1 Advanced Dispatch: Utilize workflow/repository dispatch, which might be very tricky. Calling the kernel directly is trickier still because of the handshake verification etc., but it could probably be done.
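A sketch of what path 2's parameterized calls could look like (the Octokit calls are real; the wrapper functions are assumptions):

```typescript
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Direct action: the LLM only supplies the parameters, never raw API calls.
// Acting on the issue directly keeps the real user as the assignee, so
// downstream plugins (assistive-pricing, task-xp-guard, ...) see the right actor.
async function assignUser(owner: string, repo: string, issue_number: number, user: string) {
  await octokit.rest.issues.addAssignees({ owner, repo, issue_number, assignees: [user] });
}

async function addLabel(owner: string, repo: string, issue_number: number, label: string) {
  await octokit.rest.issues.addLabels({ owner, repo, issue_number, labels: [label] });
}
```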

0x4007 commented 3 weeks ago

Intuitively, providing all the context and doing direct invocations (not writing the slash command) seems like the best approach. However, this can get expensive because it would require the larger model and a lot of context.

If that's the case, we would probably need to rely on tagging the bot, which is not as interesting.

I was under the impression that we have standardized payload interfaces for all of the plugins, and that we just need to understand the help menu of each plugin.
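If that's right, the manifest each plugin exposes might look something like this (a hypothetical shape to illustrate the idea, not the actual interface):

```typescript
// Hypothetical manifest shape — only to illustrate "understand the help menu
// of each plugin"; the real interface may differ.
interface PluginCommand {
  description: string;                 // what the help menu / LLM sees
  parameters: Record<string, unknown>; // JSON-schema-style argument description
}

interface PluginManifest {
  name: string;                        // e.g. "command-wallet" (hypothetical)
  description: string;
  commands: Record<string, PluginCommand>;
}
```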