This is the implementation of Neurally gUided Differentiable loGic policiEs (NUDGE), a framework for logic RL agents based on differentiable forward reasoning with first-order logic (FOL).
Install the dependencies with
pip install -r requirements.txt
and then run
python train.py
to start a new training run.
The training process is controlled by the hyperparameters specified in in/config/default.yaml. You can specify a different configuration by providing the corresponding YAML file path as an argument, e.g.,
python train.py -c in/config/my_config.yaml
The -c argument is optional and defaults to in/config/default.yaml.
You can also override the game environment by providing the -g argument, e.g.,
python train.py -g freeway
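The interplay of the -c and -g options described above can be sketched as follows. This is a minimal illustration, not the code in train.py: the long option names (--config, --game) and the config key env_name are assumptions made for the example, and a plain dict stands in for the parsed YAML file.

```python
import argparse

DEFAULT_CONFIG = "in/config/default.yaml"  # default path stated in this README


def parse_args(argv=None):
    """Parse the two command-line options described above."""
    parser = argparse.ArgumentParser(description="Sketch of train.py argument handling")
    # The long names --config and --game are hypothetical; only -c and -g are documented.
    parser.add_argument("-c", "--config", default=DEFAULT_CONFIG,
                        help="path to a YAML hyperparameter file")
    parser.add_argument("-g", "--game", default=None,
                        help="environment name that overrides the one in the config")
    return parser.parse_args(argv)


def apply_overrides(config, args):
    """Return a copy of the loaded config with command-line overrides applied."""
    merged = dict(config)
    if args.game is not None:
        merged["env_name"] = args.game  # "env_name" is a hypothetical config key
    return merged


if __name__ == "__main__":
    args = parse_args(["-g", "freeway"])
    config = {"env_name": "getout"}  # stand-in for the parsed YAML file
    print(args.config)
    print(apply_overrides(config, args))
```

The design choice illustrated here is that the YAML file supplies all values and the command line only replaces the ones that were given explicitly.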
The hyperparameters are configured inside in/config/default.yaml, which is loaded by default. You can specify a different configuration by providing the corresponding YAML file path with the -c argument, e.g.,
python train.py -c in/config/my_config.yaml
A description of all hyperparameters can be found in train.py.
Inside in/envs/[env_name]/logic/[ruleset_name]/, you find the logic rules that are used as a starting point for training. You can change them or create new rule sets. The rule set to use is specified with the hyperparameter rules.
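To make the directory layout concrete, here is a small sketch that builds the rule set directory from an environment name and a rule set name, and lists the rule sets available for an environment. The helper names ruleset_dir and list_rulesets are hypothetical and are not part of NUDGE; only the path pattern in/envs/[env_name]/logic/[ruleset_name]/ comes from this README.

```python
from pathlib import Path


def ruleset_dir(env_name, ruleset_name, root="in/envs"):
    """Build the directory that holds the initial logic rules for an environment."""
    return Path(root) / env_name / "logic" / ruleset_name


def list_rulesets(env_name, root="in/envs"):
    """List the rule sets found for an environment (empty if the directory is missing)."""
    logic_dir = Path(root) / env_name / "logic"
    if not logic_dir.is_dir():
        return []
    return sorted(p.name for p in logic_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    # "my_rules" is a made-up rule set name used only for illustration.
    print(ruleset_dir("getout", "my_rules").as_posix())
    print(list_rulesets("getout"))
```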
If you want to use NUDGE within other projects, you can install NUDGE locally as follows:
Inside nsfr/ run
python setup.py develop
Inside nudge/ run
python setup.py develop
If you want to use the Threefish or the Loot environment, you also need to install Qt 5 via
apt-get install qt5-default
python3 play_gui.py -g seaquest
getout contains a key, a door and one enemy. getoutplus has one more enemy.
threefish contains one bigger fish and one smaller fish. threefishcolor contains one red fish and one green fish. The agent needs to avoid the red fish and eat the green fish.
loot contains 2 pairs of key and door. lootcolor contains 2 pairs of key and door with different colors than in loot. lootplus contains 3 pairs of key and door.
You add a new environment inside in/envs/[new_env_name]/. There, you need to define a NudgeEnv class that wraps the original environment.
Each relation (e.g., closeby) has a corresponding valuation function which maps the (logic) game state to a probability that the relation is true. Each valuation function is defined as a simple Python function. The function's name must match the name of the corresponding relation.
See the freeway env to see how it is done.
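As an illustration of the name-matching convention, the following sketch defines a valuation function for a closeby relation and looks it up by relation name. It is not taken from NUDGE: the argument format (2D positions), the threshold and sharpness parameters, and the evaluate helper are all assumptions for this example. The actual signatures are best taken from the freeway env.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def closeby(player_pos, object_pos, threshold=2.0, sharpness=4.0):
    """Probability that two positions are close to each other.

    Hypothetical valuation: a soft threshold on the Euclidean distance,
    close to 1 for small distances, 0.5 at `threshold`, close to 0 beyond it.
    """
    distance = math.dist(player_pos, object_pos)
    return _sigmoid(sharpness * (threshold - distance))


# Valuation functions are found by name, so the function name must equal
# the relation name used in the logic rules.
VALUATION_FUNCTIONS = {fn.__name__: fn for fn in (closeby,)}


def evaluate(relation_name, *args):
    """Look up the valuation function for a relation and apply it."""
    if relation_name not in VALUATION_FUNCTIONS:
        raise KeyError(f"no valuation function named {relation_name!r}")
    return VALUATION_FUNCTIONS[relation_name](*args)


if __name__ == "__main__":
    print(evaluate("closeby", (0.0, 0.0), (0.5, 0.0)))
    print(evaluate("closeby", (0.0, 0.0), (10.0, 0.0)))
```

A smooth function is used instead of a hard 0/1 test so that the output can be read as a probability.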
TODO Using Beam Search to find a set of rules
With scoring:
python3 beam_search.py -m getout -r getout_root -t 3 -n 8 --scoring True -d getout.json
Without scoring:
python3 beam_search.py -m threefish -r threefishm_root -t 3 -n 8
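For readers unfamiliar with the technique, here is a generic beam search over rule candidates. It is a toy sketch and not the algorithm in beam_search.py: the rule representation (a sorted tuple of body atoms), the atom names other than closeby, the scoring function, and the parameters beam_width and steps are all assumptions, and they do not necessarily correspond to the -n and -t flags above.

```python
def beam_search(initial, refine, score, beam_width, steps):
    """Keep the `beam_width` best candidates after each refinement step."""
    beam = [initial]
    for _ in range(steps):
        # Expand every rule in the beam; a set removes duplicate candidates.
        candidates = {c for rule in beam for c in refine(rule)}
        if not candidates:
            break
        # Highest score first; ties are broken by the rule itself for determinism.
        ranked = sorted(candidates, key=lambda r: (-score(r), r))
        beam = ranked[:beam_width]
    return beam


# Toy setup: a rule body is a sorted tuple of atoms (all names hypothetical).
ATOMS = ("closeby(P,K)", "has_key(P)", "on_left(K,P)", "closeby(P,E)")
USEFUL = {"closeby(P,K)", "on_left(K,P)"}  # pretend these atoms help the agent


def refine(rule):
    """Extend a rule body by one atom that it does not contain yet."""
    return [tuple(sorted(rule + (atom,))) for atom in ATOMS if atom not in rule]


def score(rule):
    """Toy score: +1 for each useful atom, -1 for each other atom."""
    return sum(1 if atom in USEFUL else -1 for atom in rule)


if __name__ == "__main__":
    for rule in beam_search((), refine, score, beam_width=2, steps=2):
        print(score(rule), rule)
```

In a real setting the score would come from evaluating each candidate rule on recorded game data rather than from a fixed set of useful atoms.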