I ran

```python
import beaker_kernel, archytas
from importlib.metadata import version

print(version("beaker_kernel"))
print(version("archytas"))
```

which printed the correct versions:

```
1.5.3
1.1.7
```
I grabbed the first two prompts from the Beaker evaluation. This one:

> Please describe the FUND IAM climate model at a high-school senior level, including what it is measuring/simulating, how it works, and some thoughts on why and why not someone would choose to use it.

and this one:

> Please generate the code to load and run the FUND model inside a properly setup Julia Jupyter notebook. If there are any unknown starting variables, please generate reasonable values. No `Pkg.add` will be needed.

The answers returned were both sufficient.
I also wanted to test a simple tool in the Dataset context, so I asked the agent to "generate code to print 'hello world'". This was successful.
In the 'default' Beaker context, I tried the same test as in the Dataset context after setting `ENABLE_RUN_CODE=false` in the docker compose file. It was successful.
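For reference, here is roughly what that override might look like in the compose file; the `beaker` service name and surrounding structure are assumptions, not taken from the actual file:

```yaml
# Illustrative docker-compose snippet; only the ENABLE_RUN_CODE
# variable comes from the test notes above.
services:
  beaker:
    environment:
      - ENABLE_RUN_CODE=false
```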
After updating to the newest Archytas, I retested (2) above and it worked. Specifically, I toggled the environment variable `TOOL_ENABLE_RUN_CODE` on and off, and the tool appeared and disappeared from the side panel accordingly.
Also, I did a more in-depth test with the PyPackage context. I took the following steps:

- In `PyPackageContext`, I set `TOOL_ENABLED_GET_INFO_ON_VARIABLE=False`.
- In `docker-compose.yaml`, I set `TOOL_ENABLED_GET_DOCUMENTATION=false`.

This successfully disabled the tools in the sidebar. I also did some other operations.
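The variable names above suggest a per-tool `TOOL_ENABLED_<NAME>` convention. As a hedged sketch of how such a toggle could work (illustrative only, not the actual Archytas implementation):

```python
# Hedged sketch of a per-tool env-var toggle; not the actual
# Archytas code, just an illustration of the convention above.
import os

def tool_enabled(tool_name: str) -> bool:
    """Look up TOOL_ENABLED_<NAME>, defaulting to enabled when unset."""
    value = os.environ.get(f"TOOL_ENABLED_{tool_name.upper()}", "true")
    return value.strip().lower() not in ("false", "0", "no")

# Example: with TOOL_ENABLED_GET_DOCUMENTATION=false in the environment,
# get_documentation would be filtered out of the visible tool list.
visible = [name for name in ("get_info_on_variable", "get_documentation")
           if tool_enabled(name)]
```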
I ran `import os`, then asked "Tell me the package structure of os" and got an unsuccessful answer: the agent kept leaving the 'tool' key out of the action JSON. I'm assuming this might be a regression caused by GPT-4o, but I'm not sure.
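To make that failure mode concrete, here is a hedged illustration as Python dicts; only the `tool` key is confirmed by the notes above, and the other field names are assumptions rather than the exact Archytas action schema:

```python
# What a dispatchable action might look like (field names other than
# "tool" are assumptions, not the exact Archytas schema).
good_action = {
    "thought": "Inspect the structure of the os package.",
    "tool": "get_info_on_variable",
    "tool_input": "os",
}

# What GPT-4o kept producing: without "tool", the agent has no way
# to dispatch the action to any tool.
bad_action = {
    "thought": "Inspect the structure of the os package.",
    "tool_input": "os",
}
```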
`get_variables_in_scope` worked successfully at least, which makes me think it's an issue with the model itself. Weird, because my previous test did not have this problem. Looking into this more.

Hmm, I retried the PyPackage test after switching back to turbo and things worked. It seems GPT-4o is more sensitive to tool descriptions than I initially thought. My tests with the Mimi and Mira contexts were successful though, so what I'm thinking is that we switch the default back to turbo for now and switch to GPT-4o on a per-context basis. Either way, the code here won't change much, because the primary feature of this PR is that the model can be set by the class variable.
Planning on testing all of this with askem-beaker next.
All right, this seems to work in some cases with ASKEM Beaker; however, a few agents that implement `NewBaseAgent` trip up.
This PR upgrades to the newest version of Archytas, exposes the OpenAI model as an agent class variable, and cuts a new release of Beaker. The newest version of Archytas also resolves a bug with disabling tools defined on methods.
I created the `MODEL` class var because I thought we'd want to set the model class-wide, although an equally valid option is to expose a `model` arg in `__init__` and default it to `gpt-4o`.
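For concreteness, a minimal sketch of the two options; `BaseAgent` and the subclass names are illustrative stand-ins, not the actual Beaker/Archytas API:

```python
# Option 1 (what this PR does): a class variable, settable class-wide
# and overridable per context by subclassing. `BaseAgent` is a stand-in.
class BaseAgent:
    MODEL = "gpt-4o"

class TurboAgent(BaseAgent):
    MODEL = "gpt-4-turbo"  # e.g. a context that stays on turbo

# Option 2 (the alternative mentioned above): an __init__ argument
# defaulting to gpt-4o, chosen per instance instead of per class.
class AltAgent:
    def __init__(self, model: str = "gpt-4o"):
        self.model = model
```

A class variable keeps the choice declarative and visible at the top of each context's agent, which fits the per-context model switching discussed above.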
**Pre-merge checklist**

- [ ] `archytas` dependency in Beaker's `pyproject.toml`