Design Document for CISC

In short: This is a corpus of interactive programs designed to evaluate a system's ability to construct complex causal models. Instances in this corpus have complex functional and causal relationships.
Comparison to Existing Benchmarks
Causal discovery benchmarks: domains are not interactive and lack complex functional relationships.
Program synthesis benchmarks: not causal, not interactive.
Overview
A set of n_train training problems and m_test testing problems.
Each problem is represented by some time-varying internal state s_t.
There is a function f that maps the internal state to a structured observable state o_t.
There is a rendering function that maps o_t to an unstructured image.
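A minimal sketch of this interface in Python, assuming nothing beyond the structure above; the names Problem, step, observe, and render are placeholders, not a committed API:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Problem:
    state: Any  # internal state s_t, hidden from the agent

    def step(self, action: Any) -> None:
        """Advance the internal state: s_{t+1} = g(s_t, action)."""
        ...

    def observe(self) -> Any:
        """f: map the internal state s_t to the structured observable state o_t."""
        ...

    def render(self) -> Any:
        """Map o_t to an unstructured image, e.g. a pixel array."""
        ...
```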
Queries
The agent interacts with the world by performing imperatives. Imperatives are expressions in a language; an example of an imperative might be "press the red button".
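To make this concrete, imperatives could be terms in a small grammar. A hypothetical sketch; none of these constructors are decided:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Referent:
    color: str
    kind: str

@dataclass(frozen=True)
class Press:
    target: Referent

@dataclass(frozen=True)
class PlaceOn:
    obj: Referent
    surface: Referent

# "press the red button"
press_red_button = Press(Referent(color="red", kind="button"))
# "put the red square on the box"
place = PlaceOn(Referent("red", "square"), Referent("brown", "box"))
```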
Open questions:
How should the agent interact with the world?
It could be a finite action space, similar to an RL agent's, modelling, say, keyboard interaction. Alternatively, we could have a more structured action space, with actions like "move the object to some position"; this would be more interesting (see the sketch below). I think there are two different goals. The first is to provide an action space in which interesting experiments can be done. If the action space is too granular, then one has to solve a learning problem just to find the relationship between the actions and the desired experiment. For instance, if I want to put objects on a machine, then moving the object up, down, and around is a separate problem. Of course, that's part of it, but maybe we can sidestep it.
The other goal is that it would be interesting to see whether we can learn useful routines for experiments.
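A sketch of the two action-space options under discussion; both the granular actions and the structured MoveTo are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Granular(Enum):
    # low-level, RL-style action space, e.g. driving a cursor
    MOVE_LEFT = auto()
    MOVE_RIGHT = auto()
    MOVE_UP = auto()
    MOVE_DOWN = auto()
    CLICK = auto()

@dataclass(frozen=True)
class MoveTo:
    # structured action: place a named object at a position directly
    obj: str
    x: float
    y: float

# With Granular, "put the red square on the machine" is itself a control
# problem to be learned; with MoveTo("red_square", 3.0, 1.0) it is one action.
```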
Should the agent observe the state of the world, some pixel space, or something else?
If the agent observes just the rendering, then interactions can only be operations at that level. Learning to invert the rendering function is not the point of this project; that said, figuring out internal states and so on might be.
Experiments vs Actions
What are the important distinguishing factors between an experiment and the actions an experimenter can take? For instance, if we want to learn the workings of the blicket machine, an experiment is something like "put the red square on the box". Whether this corresponds to an action or not depends on the action space. Suppose the action space is moving a mouse cursor and clicking; then there is a significant difference between the actions you can take and the experiment.
In this setting, an experiment seems more like a goal state. It's tempting to suggest it's a set of possible worlds, but then we might lose the distinction between an intervention/action and an observation. For instance, suppose we want to distinguish whether smoking -> cancer or cancer -> smoking. An experiment might be something like "force someone to smoke".
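A toy sketch of why this distinction matters, using the smoking example; the model and its probabilities are made up for illustration:

```python
import random

def sample(do_smoke=None):
    """One draw from a toy world in which smoking -> cancer."""
    smoke = (random.random() < 0.3) if do_smoke is None else do_smoke
    cancer = random.random() < (0.2 if smoke else 0.05)
    return smoke, cancer

# Observation: filter passive samples for smoke == True.
observed = [c for s, c in (sample() for _ in range(10_000)) if s]
# Intervention: force smoke = True, the analogue of "force someone to smoke".
intervened = [c for _, c in (sample(do_smoke=True) for _ in range(10_000))]
# Under smoking -> cancer the two cancer rates agree; under cancer -> smoking
# they would differ, which is exactly what the experiment is designed to expose.
```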
Language
Should there be a DSL for these models?
You could imagine, for instance, a DSL that specifies the number of objects in the scene and some dynamics. If you think of a game engine, there are some things that are just given: physics, objects and object groups, rendering.
It could be something as simple as:
The world is decomposed into objects.
Objects have a color and opacity that are functions of their internal state.
Their internal state can be a function of anything else: time, the states of other objects, and so on.
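A toy sketch of what such a DSL could look like, assuming only the three properties above; every construct here is hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Obj:
    state: float = 0.0
    # state update: may read the time and the states of all other objects
    update: Callable[[float, Dict[str, float]], float] = lambda t, others: 0.0

    def color(self) -> str:
        # appearance is a function of internal state
        return "red" if self.state > 0.5 else "blue"

@dataclass
class World:
    objects: Dict[str, Obj] = field(default_factory=dict)

    def step(self, t: float) -> None:
        snapshot = {name: o.state for name, o in self.objects.items()}
        for o in self.objects.values():
            o.state = o.update(t, snapshot)

# A light causally driven by a button: light.state copies button.state.
world = World({
    "button": Obj(update=lambda t, s: float(int(t) % 2 == 0)),
    "light": Obj(update=lambda t, s: s["button"]),
})
```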
Evaluation
There are a few approaches to consider:
Structural similarity: in Bayesian networks this is some notion of graph similarity. It is tenuous there and even more tenuous in the case of programs, since many syntactically different programs induce the same causal model.
Predictive Accuracy: how well the induced model predicts data.
Counterfactual Accuracy: how well the induced model predicts counterfactual/interventional distributions.
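As a sketch of how the two distribution-based metrics could be computed, assuming hypothetical models that expose log_prob and sample (nothing here is a fixed API):

```python
def predictive_score(induced, observations):
    """Average log-likelihood of held-out observational data under the induced model."""
    return sum(induced.log_prob(o) for o in observations) / len(observations)

def counterfactual_score(induced, ground_truth, interventions, n=1000):
    """Average log-likelihood of interventional samples drawn from the ground truth."""
    total = 0.0
    for do in interventions:
        draws = [ground_truth.sample(intervention=do) for _ in range(n)]
        total += sum(induced.log_prob(d, intervention=do) for d in draws) / n
    return total / len(interventions)
```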