hr0nix / omega

A number of agents (PPO, MuZero) with a Perceiver-based NN architecture that can be trained to achieve goals in nethack/minihack environments.
GNU General Public License v3.0

omega

This repo contains an implementation of an agent that can learn to maximise reward in environments with a NetHack-style interface, such as nle or MiniHack.
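Both nle and MiniHack expose the standard Gym interface (`reset`/`step`) that the agent interacts with. As a sketch of that loop, here is a toy stand-in environment (the real packages require the Docker image provided by this repo; with minihack installed you would instead use something like `gym.make("MiniHack-Room-5x5-v0")` — that environment name is an assumption, check the MiniHack docs for the registered IDs):

```python
class ToyEnv:
    """Toy stand-in for a Gym environment: reach state 5 for a reward of 1."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Gym-style step: returns (observation, reward, done, info).
        self.state += action
        done = self.state >= 5
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


print(run_episode(ToyEnv(), lambda obs: 1))  # a policy that always moves forward
```

The agents in this repo are trained against exactly this kind of interaction loop, just with NetHack observations and actions instead of integers.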

(Demo GIFs: crossing a river; fighting monsters in a narrow corridor)

Repo highlights

How to train an agent

  1. Clone the repository:
    git clone https://github.com/hr0nix/omega.git
  2. Run the docker container:
    bash ./omega/docker/run_container.sh
  3. Create a new experiment based on one of the provided configs:
    python3.8 ./tools/experiment_manager.py make --config ./configs/muzero/random_room_5x5.yaml --output-dir ./experiments/muzero_random_room_5x5
  4. Run the newly created experiment. You can optionally track the experiment with wandb (you will be asked whether you want to; recommended):
    python3.8 ./tools/experiment_manager.py run --dir ./experiments/muzero_random_room_5x5 --gpu 0
  5. After some episodes are completed, you can visualize them:
    python3.8 ./tools/experiment_manager.py play --file ./experiments/muzero_random_room_5x5/episodes/<EPISODE_FILENAME_HERE>
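Step 5 requires a concrete episode filename in place of `<EPISODE_FILENAME_HERE>`. A small helper like the one below (hypothetical, not part of `experiment_manager.py`) can pick the most recently written episode; it assumes only that finished episodes are saved as files under the experiment's `episodes/` directory, as the command above implies:

```python
from pathlib import Path


def latest_episode(episodes_dir):
    """Return the most recently modified file in the episodes directory.

    Hypothetical convenience helper; the repo's own tooling does not
    provide this.
    """
    files = [p for p in Path(episodes_dir).iterdir() if p.is_file()]
    if not files:
        raise FileNotFoundError(f"no episodes found in {episodes_dir}")
    return max(files, key=lambda p: p.stat().st_mtime)


# Example: pass the result to the `play` command from step 5.
# path = latest_episode("./experiments/muzero_random_room_5x5/episodes")
```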