jbloomAus / DecisionTransformerInterpretability

Interpreting how transformers simulate agents performing RL tasks
https://jbloomaus-decisiontransformerinterpretability-app-4edcnc.streamlit.app/
MIT License

Write a post before EAG London #74

Closed by jbloomAus 1 year ago

jbloomAus commented 1 year ago

I have some drafts, but want to make sure I get something out before EAG London / preferably no later than tomorrow night (May 16th).

Key things

Current draft

Resources:

jbloomAus commented 1 year ago

Ideas for framing/emphasis (possibly phrased as post titles):

Trying to understand meaningful questions in

Short term:

Medium:

jbloomAus commented 1 year ago

Discussion with Jay led to deciding to go very simple and direct.

First pass at intro: Decision transformers are analogous to large language models, but it is easier to ask questions about their goals than it is to ask about the goals of large language models. I've built a system to train these sorts of models and to try to interpret them. Training these agents presents a number of challenges, which we have partially overcome in order to produce the model we analyse below, and which we expect to keep addressing so we can train more interesting models in the future. Our analysis makes use of many previously published techniques as well as "live analysis". We uncover a number of interesting behaviors which we attempt to understand. We are particularly excited about the possibility of further studying goal representations as well as agent simulation.
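For readers unfamiliar with the setup, a rough sketch of the framing decision transformers use may help: RL is recast as sequence modelling, where each timestep contributes a (return-to-go, state, action) token triple and the model predicts actions conditioned on the desired return. The helpers below are hypothetical illustrations of that data layout, not functions from this repository.

```python
# Sketch of the (return-to-go, state, action) sequence format used by
# decision transformers. These helpers are illustrative only and are
# not part of the DecisionTransformerInterpretability codebase.

def returns_to_go(rewards):
    """Suffix sums of the reward sequence: RTG_t = r_t + r_{t+1} + ... + r_T."""
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    return list(reversed(rtg))

def interleave_trajectory(states, actions, rewards):
    """Flatten a trajectory into the token order a decision transformer
    consumes: RTG_0, s_0, a_0, RTG_1, s_1, a_1, ..."""
    tokens = []
    for rtg, s, a in zip(returns_to_go(rewards), states, actions):
        tokens.extend([("rtg", rtg), ("state", s), ("action", a)])
    return tokens
```

At inference time one conditions on a high initial return-to-go, which is part of why questions about the model's "goals" are easier to pose here than for a language model.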