Enable agent to execute code

fivegrant commented 5 months ago

This PR provides a run_code tool to any agent using a checkpointable kernel. The specific changes made by this PR:

Agent inherits tools from specific subkernels. This means tools can now be specified for a language as opposed to a context.
The new disable functionality replaces toggeable_tool. _NOTE: This is a breaking change since a few contexts use toggleable_tool in askem-beaker. However, the contexts using this are erroneously reimplementing ask_user anyway._
Add a new run_code tool implemented as a subkernel tool on CheckpointableSubkernel

PRE-MERGE CHECKLIST

[x] Merge the disable functionality into main branch
[x] Cut and publish a new release of archytas
[x] Update pyproject.toml to use the new version

fivegrant commented 5 months ago

Example of the current performance:

fivegrant commented 5 months ago

Note that the example above shows that the prompting could use some improvement. Additionally, it might be nice to send the executed code to the Dev UI

fivegrant commented 5 months ago

@mattprintz @brandomr Here's an explanation of the state of tool toggling in this PR.

With the current code in this PR

As the code currently exists, there are two approaches to disabling the tools in askem-beaker.

The first approach is to include the disabling lines in each context. For example:

class MimiContext(BaseContext):
   TOOL_ENABLE_RUN_CODE = False
   TOOL_ENABLE_ASK_USER = False
   ...

The other approach is to make a new context:

class TerariumContext(BaseContext):
   TOOL_ENABLE_RUN_CODE = False
   TOOL_ENABLE_ASK_USER = False

and have all of the BaseContexts inherit from it:

class MimiContext(TerariumContext):
   ...

With a change to the code in this PR

If our policy is to avoid making any changes in askem-beaker, we can add in defaults specified by environment variables. Our .env would contain lines like

DEFAULT_ENABLE_RUN_CODE=true
DEFAULT_ENABLE_ASK_USER=false

If there is no environment variable specified, it would default to true. As long as askem-beaker changed its environment variables, it wouldn't need to make any code changes to the contexts. A context could still use a tool despite the default like this:

class MimiContext(BaseContext):
   TOOL_ENABLE_ASK_USER = True # overrides the default specified in the environment
   ...

Picking an approach

Any of these solutions will require some kind of code change. The approaches in the With the current code in this PR section expect a change in askem-beaker and With a change to the code in this PR section expects a code change in beaker-kernel.

Personally, I prefer creating the new TerariumContext. I think the first approach I outlined is messy an error prone. The code is very repetitive for behavior we want to generally apply to each context. As for the default environment change, this new toggling mechanism was created out of a desire to avoid environment variables and complicated, non-discoverable disabling behavior.

I think we should encourage developers to use a pattern where they create a class ProjectBaseContext(BaseContext) that all other contexts inherit from. It's very clear whether a tool is enabled. If the child doesn't explicitly enable/disable, look at the parent. If the parent doesn't have it set either, then it's going to be enabled. I think this is an ideal pattern as opposed to using environment defaults. Environment defaults only seem like a good idea if we're trying to accommodate askem-beaker in the short term.

Another reason to make a TerariumContext is we've discussed creating it before for other features (e.g. summarization). We might as well start moving towards growing it out incrementally and this PR seems like a good first step.

Ultimately though I'm fine with any direction we decide to go in

mattprintz commented 5 months ago

a desire to avoid environment variables

Can you expand on this a bit? Is this a desire to avoid all environment variables, or to avoid an explosion of tool/context specific environment variables, etc?

Why is it desirable to avoid environment variables? Particularly since when working with containers, env variables is usually a primary way of setting up configuration options and enabling/disabling behavior in different containers and/or environments (prod, staging, testing, etc)

fivegrant commented 5 months ago

@mattprintz If we want to vary the defaults based on deployment using the current code, we could always just

class TerariumContext(BaseContext):
   TOOL_ENABLE_RUN_CODE = os.environ.get("DEFAULT_ENABLE_RUN_CODE".lower() == "true", False)
   ...

I'm not super against environment variables. It did seem like we were trying to avoid them if possible in previous discussions but I could have misinterpreted.

If we used environment variables, we have something like this

class TerariumContext(BaseContext):
  # Uncomment the line below if you want to override `DEFAULT_ENABLE_RUN_CODE`
  # TOOL_ENABLE_RUN_CODE = True
  ...

My preference is still (1) where you make your own environment default mechanism if you want one although I could be nudged to do something like (2). I'm inclined to keep the core functionality bare bones as possible and non-prescriptive as possible.

mattprintz commented 5 months ago

It did seem like we were trying to avoid them if possible in previous discussions but I could have misinterpreted.

What I was getting at with the earlier comments is that environment variables are server wide, so a single ENABLE_USER_PROMPT environment variable doesn't have enough granularity for what we were looking for. I think I had thrown the idea of out per-context env variables as an option, so we could still use env variables and be granular enough, but that solution does have some flaws as well, so isn't perfect by any stretch.

In short, I never meant to move away from using env variables, just that we shouldn't use a single global env variable that would apply to all contexts.

mattprintz commented 5 months ago

I think we are very close here, I just think we need to clean up some of the semantics. Reading through, I was having trouble figuring out when the tools are available to run and when they aren't, I think I have figured it out now, but I think by renaming things we could make this clearer.

Mostly, I think the confusion is around TOOL_ENABLE_ as a prefix as to me this seems to imply that the tools referenced are disabled by default and by adding that you are enabling them, but in fact it's the opposite. TOOL_ENABLE is a verb, implying that by using it, you are doing something, but TOOL_ENABLE_ASK_USER = True is (usually) essentially a no-op. As the default is that it is already available. Expecting users to understand that disable a tool they should use TOOL_ENABLE_foo = False feels non-intuitive.

There are a couple ways that I see that we can fix this, but I think the most intuitive option is to just switch it from describing an action that should be preformed to describing a state that represents the desired internal state of the context. I.e. to rename from TOOL_ENABLE_ to TOOL_ENABLED_, or something semantically equivalent.

TOOL_ENABLED_ASK_USER = False does not read like an English sentence and is awkward, but the semantic meaning is clearer. TOOL_ASK_USER_ENABLED = False reads better, but has an awkward structure with both a prefix and suffix. Using advanced type hinting we could go with something like: TOOL_ASK_USER = 'enabled' where 'enabled' is defined as typing.Literal choice or even have the values be an enum such as perhaps TOOL_ASK_USER = ToolOverrdes.ENABLED or somethng.

I'm open to your input on this.

fivegrant commented 5 months ago

@mattprintz what do you think of something like this?

>>> from enum import Enum
>>> class ToolToggle(Enum):
...     ENABLE = True
...     DISABLE = False
...     def __bool__(self):
...             return self.value
...
>>> # then inside the beaker-kernel lib we check something like
>>> is_enabled_by_dev = ToolToggle.DISABLE
>>> if not is_enabled_by_dev:
...     print("tool disabled")
...
tool disabled
>>>

Also, do you think there's a better name than ToolOverrides or ToolToggle?

mattprintz commented 5 months ago

@mattprintz what do you think of something like this?> ... Also, do you think there's a better name than ToolOverrides or ToolToggle?

I think the tense is still wrong, and it should be "disabled" and "enabled" as those speak to the state that one want's it to be set which is the appropriate framework for talking about configuration options, which these are. The bool is redundant, but we shouldn't be checking by value anyway, We would want to refer to/compare against the enums always, not expect it to be a boolean option.

Your code is semantically no different than what you had before, really, just more complex.

brandomr commented 5 months ago

Looks good to me. @fivegrant before we merge it can you just add a note somewhere in the docs about how the enabling/disabling works?

fivegrant commented 5 months ago

Sample environment variable changes in darpa-askem are shown here: https://github.com/DARPA-ASKEM/beaker-kernel/tree/feat/upgrade-beaker-kernel-lib. ENABLE_USER_PROMPT needs to be replaced by TOOL_ENABLE_ASK_USER. On the other hand, run_code isn't enabled by what we currently have anyway (because the checkpointing env variable ENABLE_CHECKPOINTS defaults to false) but I went ahead and put the env variables to be explicit

jataware / beaker-kernel