Open nacloos opened 2 years ago
One important issue with the new gym version is that we are sometimes stepping the environment before resetting, which this new version does not like. The reason we are stepping before resetting is that if we don't do it, the observation returned by the reset function is the one from the original task, without any modification done by wrappers. For instance, when using the passreward wrapper, the reset function should append an extra value to the original observation but it does not do so if we don't step.
I made some tests and it's not a problem to call step
in the function reset
of TrialEnv
. Here is a small example:
env = ngym.make('MyEnv')
env: <OrderEnforcing<MyEnv>> # gym.make automatically adds the wrapper OrderEnforcing to the env
# calling env.reset() produces the following chain of calls:
OrderEnforcing.reset()
MyEnv -> TrialEnv.reset() # no problem to call self._top.step() here since OrderEnforcing.reset() has already been called
OrderEnforcing.step()
MyEnv -> TrialEnv.step()
MyEnv._step()
However, there is indeed the error AssertionError: Cannot call env.step() before calling reset()
that is raised with, for example, the following code:
tasks = ngym.get_collection('yang19')
envs = [gym.make(task) for task in tasks]
schedule = RandomSchedule(len(envs))
env = ScheduleEnvs(envs, schedule=schedule, env_input=True)
env.reset()
But the problem is coming from ScheduleEnvs
, not TrialEnv
. Here is a small example using ScheduleEnvs
with two tasks:
env = ScheduleEnvs(envs=[ngym.make('MyEnv1'), ngym.make('MyEnv2')])
env:
<ScheduleEnv
<OrderEnforcing<MyEnv1>>
<OrderEnforcing<MyEnv2>>
>
# calling env.reset() produces the following chain of calls:
ScheduleEnv -> TrialWrapper -> Wrapper.reset() # ScheduleEnv.env refers to the first env of the arg envs
OrderEnforcing<MyEnv1>.reset()
MyEnv1 -> TrialEnv.reset()
ScheduleEnv.step() # changes the current env to MyEnv2
OrderEnforcing<MyEnv2>.step() # error since MyEnv1 is reset but not MyEnv2
One solution would be to change ScheduleEnv
so that self.i_env = self.schedule()
is called at the end of new_trial
instead of the beginning. This would ensure that the same environment is used for reset
and for the first call to step
. The function reset
also has to be overwritten in ScheduleEnv
to make sure that all the environments in the list envs are reset (and not just the first one as it is currently the case).
Sounds good, have you tried it? does it work?
Then the latest version of gym is compatible with the basic usage of NeuroGym?
I modified ScheduleEnvs and I added some tests in c2efa7c329ea0fc8d478c2633b34d446f626e622. It seems to work now.
I think so, it's just that we have to properly reset the environments.
Here are some of the changes made in the latest releases of Gym, which might cause some problems with neurogym v0.0.1:
env.seed
is being deprecated and replaced by an additional argument inenv.reset
(v0.22.0)I created the branch fix-gym-compatibility to fix the compatibility issues due to these changes. We should also keep an eye on this roadmap for Gym 1.0.