hr0nix / omega

A number of agents (PPO, MuZero) with a Perceiver-based NN architecture that can be trained to achieve goals in nethack/minihack environments.
GNU General Public License v3.0
38 stars, 4 forks

Support for recurrent memory in MuZero #5

Closed by hr0nix 2 years ago

hr0nix commented 2 years ago

This PR adds support for stateful agents. Here is how it works:

Here's how memory is supported in MuZero:

Other notable changes introduced while working on memory: