Pass data to agents using a dictionary, or dataclass.
Why
Currently all agents expect (state, action, reward, done, next_state) tuple for stepping. This is likely all that's needed in most cases, however, in some there isn't a need for next_state or done, and in others we might want to provide additional values, e.g. entropy.
What
Pass data to agents using a dictionary, or dataclass.
Why
Currently all agents expect
(state, action, reward, done, next_state)
tuple for stepping. This is likely all that's needed in most cases, however, in some there isn't a need fornext_state
ordone
, and in others we might want to provide additional values, e.g. entropy.