Open carlosgmartin opened 1 year ago
Really sorry for taking a year to get back to you on this. I think this is a great suggestion. I don't have time to do this right now (I may have some time early next year), as it would be quite a bit of work, but I'll leave this open for anyone who wants to implement it, and I'll be happy to review :smile:
Is your feature request related to a problem? Please describe
Currently, one obtains a successor state by calling `Environment.step(state, action)`. The `state` itself contains a `key`, which is derived from the `key` argument of `Environment.reset` via splitting and propagation throughout the episode. This lets Jumanji simulate stochastic environments.

However, this approach has some disadvantages:
If the agent receives the `State` as input, it can plan (think AlphaZero) with access to future environment randomness, breaking the assumption that the latter is unpredictable and letting the agent "cheat".

Describe the solution you'd like
Allow `Environment.step` to receive a `key` argument directly, as in `Environment.step(state, action, key)`. This is the approach taken by gymnax. It is also the approach pgx intends to take: https://github.com/sotetsuk/pgx/issues/1043.

In the medium/long term, I would support deprecating the `State.key` attribute entirely, which is currently the only constraint enforced by the `StateProtocol`. Its removal would allow `State` objects to be completely generic (they could be strings, ints, tuples, dicts, etc.).

Describe alternatives you've considered
A possible alternative is to create a copy of the `state`, replace its `key` attribute, and pass it into `Environment.step` (for the first issue) or the agent (for the second issue). However, this approach seems hacky and error-prone.

Fundamentally, it seems like `step` should be treated as an intrinsically stochastic function, implying that it should receive its own `key` at call time. (The `key` can be `None` if it's not needed.)
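To make the proposed signature concrete, here is a minimal sketch of a toy environment whose `step` takes the `key` at call time. It is not Jumanji's actual API: `WalkState` and this standalone `step` function are hypothetical names invented for illustration, and the state deliberately carries no `key`.

```python
from typing import NamedTuple, Optional

import jax
import jax.numpy as jnp


class WalkState(NamedTuple):
    """Hypothetical state of a 1-D random walk. It holds no PRNG key."""

    position: jax.Array


def step(
    state: WalkState, action: jax.Array, key: Optional[jax.Array] = None
) -> WalkState:
    """Move by `action`, plus a random +/-1 perturbation if a key is given."""
    if key is None:
        # No key supplied: the transition is deterministic.
        noise = jnp.zeros_like(state.position)
    else:
        # All randomness comes from the key passed in by the caller.
        noise = jax.random.choice(key, jnp.array([-1, 1]))
    return WalkState(position=state.position + action + noise)


root_key = jax.random.PRNGKey(0)
state = WalkState(position=jnp.array(0))
action = jnp.array(1)

# Same state, two different keys: two independent draws of the successor state.
key_a, key_b = jax.random.split(root_key)
next_a = step(state, action, key_a)
next_b = step(state, action, key_b)

# Same state and same key: the result is reproducible.
replayed = step(state, action, key_a)

# No key: the deterministic transition.
deterministic = step(state, action)
```

With this shape, a planner can be handed `state` without also being handed the randomness that the real environment will use later, and sampling several outcomes from one state only requires splitting a key rather than copying and editing the state.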