jataware / beaker-kernel

Contextually-aware notebooks with built-in AI assistant
https://jataware.github.io/beaker-kernel/
MIT License

Bump version to v1.5.3 #51

Closed · fivegrant closed 5 months ago

fivegrant commented 5 months ago

This PR upgrades to the newest version of Archytas, exposes the OpenAI model as an agent class variable, and cuts a new release of Beaker. The newest version of Archytas also resolves a bug with disabling tools on methods.

I created the MODEL class var because I thought we'd want to set the model class-wide, although it would be equally valid to expose a model arg in the __init__ signature and default it to gpt-4o.
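For reference, a minimal sketch of the two options (class names here are hypothetical; only the MODEL class var comes from this PR):

class Agent:
    # Option taken here: set the model class-wide via a class variable.
    MODEL = "gpt-4o"

# Equally valid alternative discussed above: expose a model arg in __init__.
class AgentAlt:
    def __init__(self, model: str = "gpt-4o"):
        self.model = model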

Pre-merge checklist

fivegrant commented 5 months ago

How I tested this

1 (in askem-beaker, downstream)

Setup

I ran

import beaker_kernel, archytas
from importlib.metadata import version

# Importing confirms the packages load; version() reads the installed metadata.
print(version("beaker_kernel"))
print(version("archytas"))

which printed the correct versions (beaker_kernel first, then archytas):

1.5.3
1.1.7

Mimi Context

I grabbed the first two prompts from the Beaker evaluation:

This one -

Please describe the FUND IAM climate model at a high-school 
senior level, including what it is measuring/simulating, 
how it works, and some thoughts on why and why not someone 
would choose to use it.

and this one -

Please generate the code to load and run the FUND model inside 
a properly setup Julia Jupyter notebook. If there are any unknown 
starting variables, please generate reasonable values. No `Pkg.add` will be needed. 

The answers returned were both sufficient.

Dataset Context

I also wanted to test a simple tool in the Dataset context, so I asked the agent to "generate code to print 'hello world'".

This was successful.

2 (in beaker-kernel)

In the 'default' Beaker context, I tried the same test as in the Dataset context after setting ENABLE_RUN_CODE=false in the Docker Compose file.

It was successful.

fivegrant commented 5 months ago

After updating to the newest Archytas, I retested (2) above and it worked. Specifically, I toggled the environment variable TOOL_ENABLE_RUN_CODE on/off, and the tool appeared and disappeared from the side panel accordingly.
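For context, a rough sketch of the kind of env-var gating being exercised here; this is an illustration only, not Archytas's actual implementation, and the tool_enabled helper is hypothetical:

import os

def tool_enabled(name: str) -> bool:
    # Treat TOOL_ENABLE_<NAME> as on unless it is explicitly set to a falsy value.
    value = os.environ.get(f"TOOL_ENABLE_{name.upper()}", "true")
    return value.lower() not in ("false", "0", "no")

if tool_enabled("run_code"):
    print("run_code would be registered and shown in the side panel")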

Also, I did a more in-depth test with the PyPackage context. I took the following steps:

fivegrant commented 5 months ago

Hmm, I retried the PyPackage test after switching back to turbo, and things worked. It seems GPT-4o is more sensitive to tool descriptions than I initially thought. My tests with the Mimi and Mira contexts were successful, though, so what I'm thinking is that we switch the default back to turbo for now and switch to GPT-4o on a per-context basis. Either way, the code here won't change much, because the primary feature of this PR is that the model can be set via the class variable.
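A hypothetical sketch of that split (the context class name is invented for illustration):

class BaseAgent:
    # Proposed default for now: turbo handles the current tool descriptions reliably.
    MODEL = "gpt-4-turbo"

class MimiAgent(BaseAgent):
    # Contexts whose tool descriptions GPT-4o handles well can opt in per context.
    MODEL = "gpt-4o"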

Planning on testing all of this with askem-beaker next.

fivegrant commented 5 months ago

All right, this seems to work in some cases with ASKEM Beaker; however, there are a few agents that implement NewBaseAgents that trip up.