I ran

```python
import beaker_kernel, archytas
from importlib.metadata import version

print(version("beaker_kernel"))
print(version("archytas"))
```

which printed the correct versions:

```
1.5.3
1.1.7
```
I grabbed the first two prompts from the Beaker evaluation. This one:

> Please describe the FUND IAM climate model at a high-school senior level, including what it is measuring/simulating, how it works, and some thoughts on why and why not someone would choose to use it.

and this one:

> Please generate the code to load and run the FUND model inside a properly setup Julia Jupyter notebook. If there are any unknown starting variables, please generate reasonable values. No `Pkg.add` will be needed.

The answers returned were both sufficient.
I also wanted to test a simple tool in the Dataset context, so I asked the agent to "generate code to print 'hello world'". This was successful.
In the 'default' Beaker context, I tried the same test as in the Dataset context after setting `ENABLE_RUN_CODE=false` in the docker compose file. It was successful.
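For reference, here is roughly what that override might look like in the compose file; the `beaker` service name and surrounding structure are assumptions, not taken from the actual file:

```yaml
# Illustrative docker-compose snippet; only the ENABLE_RUN_CODE
# variable comes from the test notes above.
services:
  beaker:
    environment:
      - ENABLE_RUN_CODE=false
```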
After updating to the newest Archytas, I retested (2) above and it worked. Specifically, I toggled the environment variable `TOOL_ENABLE_RUN_CODE` on and off, and the tool appeared and disappeared from the side panel accordingly.
Also, I did a more in-depth test with the PyPackage context. I took the following steps:

- In `PyPackageContext`, I set `TOOL_ENABLED_GET_INFO_ON_VARIABLE=False`.
- In `docker-compose.yaml`, I set `TOOL_ENABLED_GET_DOCUMENTATION=false`.

This successfully disabled the tools in the sidebar. I also did some other operations.
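The variable names above suggest a per-tool `TOOL_ENABLED_<NAME>` convention. As a hedged sketch of how such a toggle could work (illustrative only, not the actual Archytas implementation):

```python
# Hedged sketch of a per-tool env-var toggle; not the actual
# Archytas code, just an illustration of the convention above.
import os

def tool_enabled(tool_name: str) -> bool:
    """Look up TOOL_ENABLED_<NAME>, defaulting to enabled when unset."""
    value = os.environ.get(f"TOOL_ENABLED_{tool_name.upper()}", "true")
    return value.strip().lower() not in ("false", "0", "no")

# Example: with TOOL_ENABLED_GET_DOCUMENTATION=false in the environment,
# get_documentation would be filtered out of the visible tool list.
visible = [name for name in ("get_info_on_variable", "get_documentation")
           if tool_enabled(name)]
```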
I ran `import os`, then asked "Tell me the package structure of os" and got an unsuccessful answer: the agent kept leaving the 'tool' key out of the action JSON. I'm assuming this might be a regression caused by GPT-4o, but I'm not sure.
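To make that failure mode concrete, here is a hedged illustration as Python dicts; only the `tool` key is confirmed by the notes above, and the other field names are assumptions rather than the exact Archytas action schema:

```python
# What a dispatchable action might look like (field names other than
# "tool" are assumptions, not the exact Archytas schema).
good_action = {
    "thought": "Inspect the structure of the os package.",
    "tool": "get_info_on_variable",
    "tool_input": "os",
}

# What GPT-4o kept producing: without "tool", the agent has no way
# to dispatch the action to any tool.
bad_action = {
    "thought": "Inspect the structure of the os package.",
    "tool_input": "os",
}
```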
`get_variables_in_scope` worked successfully at least, which makes me think it's an issue with the model itself. Weird, because my previous test did not have this problem. Looking into this more.

Hmm, I retried the PyPackage test after switching back to turbo and things worked. It seems GPT-4o is more sensitive to tool descriptions than I initially thought. My tests with the Mimi and Mira contexts were successful though, so what I'm thinking is that we switch the default back to turbo for now and switch to GPT-4o on a per-context basis. Either way, the code here won't change much, because the primary feature of this PR is that the model can be set by the class variable.
Planning on testing all of this with askem-beaker next.
All right, this seems to work in some cases with ASKEM Beaker; however, a few agents that implement `NewBaseAgent` trip up.
This PR upgrades to the newest version of Archytas, exposes the OpenAI model as an agent class variable, and cuts a new release of Beaker. The newest version of Archytas also resolves a bug with disabling tools defined on methods.
I created the `MODEL` class var because I thought we'd want to set the model class-wide, although an equally valid option is to expose a `model` arg in `__init__` and default it to `gpt-4o`.
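For concreteness, a minimal sketch of the two options; `BaseAgent` and the subclass names are illustrative stand-ins, not the actual Beaker/Archytas API:

```python
# Option 1 (what this PR does): a class variable, settable class-wide
# and overridable per context by subclassing. `BaseAgent` is a stand-in.
class BaseAgent:
    MODEL = "gpt-4o"

class TurboAgent(BaseAgent):
    MODEL = "gpt-4-turbo"  # e.g. a context that stays on turbo

# Option 2 (the alternative mentioned above): an __init__ argument
# defaulting to gpt-4o, chosen per instance instead of per class.
class AltAgent:
    def __init__(self, model: str = "gpt-4o"):
        self.model = model
```

A class variable keeps the choice declarative and visible at the top of each context's agent, which fits the per-context model switching discussed above.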
**Pre-merge checklist**

- [ ] `archytas` dependency in Beaker's `pyproject.toml`