hr0nix / omega

A number of agents (PPO, MuZero) with a Perceiver-based NN architecture that can be trained to achieve goals in nethack/minihack environments.
GNU General Public License v3.0

omega

This repo contains an implementation of an agent that can learn to maximise reward in environments with a NetHack-style interface, such as nle or MiniHack.
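Both nle and MiniHack expose the standard Gym interface (`reset`/`step`) that the agent interacts with. As a sketch of that loop, here is a toy stand-in environment (the real packages require the Docker image provided by this repo; with minihack installed you would instead use something like `gym.make("MiniHack-Room-5x5-v0")` — that environment name is an assumption, check the MiniHack docs for the registered IDs):

```python
class ToyEnv:
    """Toy stand-in for a Gym environment: reach state 5 for a reward of 1."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Gym-style step: returns (observation, reward, done, info).
        self.state += action
        done = self.state >= 5
        reward = 1.0 if done else 0.0
        return self.state, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


print(run_episode(ToyEnv(), lambda obs: 1))  # a policy that always moves forward
```

The agents in this repo are trained against exactly this kind of interaction loop, just with NetHack observations and actions instead of integers.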

(Demo GIFs: crossing a river; fighting monsters in a narrow corridor)

Repo highlights

How to train an agent

  1. Clone the repository:
    git clone https://github.com/hr0nix/omega.git
  2. Run the docker container:
    bash ./omega/docker/run_container.sh
  3. Create a new experiment based on one of the provided configs:
    python3.8 ./tools/experiment_manager.py make --config ./configs/muzero/random_room_5x5.yaml --output-dir ./experiments/muzero_random_room_5x5
  4. Run the newly created experiment. You can optionally track the experiment with wandb (you will be asked whether you want to; recommended):
    python3.8 ./tools/experiment_manager.py run --dir ./experiments/muzero_random_room_5x5 --gpu 0
  5. After some episodes are completed, you can visualize them:
    python3.8 ./tools/experiment_manager.py play --file ./experiments/muzero_random_room_5x5/episodes/<EPISODE_FILENAME_HERE>
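Step 5 requires a concrete episode filename in place of `<EPISODE_FILENAME_HERE>`. A small helper like the one below (hypothetical, not part of `experiment_manager.py`) can pick the most recently written episode; it assumes only that finished episodes are saved as files under the experiment's `episodes/` directory, as the command above implies:

```python
from pathlib import Path


def latest_episode(episodes_dir):
    """Return the most recently modified file in the episodes directory.

    Hypothetical convenience helper; the repo's own tooling does not
    provide this.
    """
    files = [p for p in Path(episodes_dir).iterdir() if p.is_file()]
    if not files:
        raise FileNotFoundError(f"no episodes found in {episodes_dir}")
    return max(files, key=lambda p: p.stat().st_mtime)


# Example: pass the result to the `play` command from step 5.
# path = latest_episode("./experiments/muzero_random_room_5x5/episodes")
```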