neurogym / neurogym

A curated collection of neuroscience tasks with a common interface.
MIT License
252 stars 50 forks source link

Compatibility with the latest versions of Gym #37

Open nacloos opened 2 years ago

nacloos commented 2 years ago

Here are some of the changes made in the latest releases of Gym, which might cause some problems with neurogym v0.0.1:

I created the branch fix-gym-compatibility to fix the compatibility issues due to these changes. We should also keep an eye on this roadmap for Gym 1.0.

manuelmolano commented 2 years ago

One important issue with the new gym version is that we are sometimes stepping the environment before resetting, which this new version does not like. The reason we are stepping before resetting is that if we don't do it, the observation returned by the reset function is the one from the original task, without any modification done by wrappers. For instance, when using the passreward wrapper, the reset function should append an extra value to the original observation but it does not do so if we don't step.

nacloos commented 2 years ago

I made some tests and it's not a problem to call step in the function reset of TrialEnv. Here is a small example:

env = ngym.make('MyEnv')
env: <OrderEnforcing<MyEnv>>  # gym.make automatically adds the wrapper OrderEnforcing to the env

# calling env.reset() produces the following chain of calls:
OrderEnforcing.reset()
MyEnv -> TrialEnv.reset()  # no problem to call self._top.step() here since OrderEnforcing.reset() has already been called
OrderEnforcing.step()
MyEnv -> TrialEnv.step()
MyEnv._step()

However, there is indeed the error AssertionError: Cannot call env.step() before calling reset() that is raised with, for example, the following code:

tasks = ngym.get_collection('yang19')
envs = [gym.make(task) for task in tasks]
schedule = RandomSchedule(len(envs))
env = ScheduleEnvs(envs, schedule=schedule, env_input=True)
env.reset()

But the problem is coming from ScheduleEnvs, not TrialEnv. Here is a small example using ScheduleEnvs with two tasks:

env = ScheduleEnvs(envs=[ngym.make('MyEnv1'), ngym.make('MyEnv2')])

env:
<ScheduleEnv
    <OrderEnforcing<MyEnv1>>
    <OrderEnforcing<MyEnv2>>
>

# calling env.reset() produces the following chain of calls:
ScheduleEnv -> TrialWrapper -> Wrapper.reset()  # ScheduleEnv.env refers to the first env of the arg envs
OrderEnforcing<MyEnv1>.reset()
MyEnv1 -> TrialEnv.reset()
ScheduleEnv.step()  # changes the current env to MyEnv2
OrderEnforcing<MyEnv2>.step()  # error since MyEnv1 is reset but not MyEnv2

One solution would be to change ScheduleEnv so that self.i_env = self.schedule() is called at the end of new_trial instead of the beginning. This would ensure that the same environment is used for reset and for the first call to step. The function reset also has to be overwritten in ScheduleEnv to make sure that all the environments in the list envs are reset (and not just the first one as it is currently the case).

manuelmolano commented 2 years ago

Sounds good, have you tried it? does it work?

Then the latest version of gym is compatible with the basic usage of NeuroGym?

nacloos commented 2 years ago

I modified ScheduleEnvs and I added some tests in c2efa7c329ea0fc8d478c2633b34d446f626e622. It seems to work now.

I think so, it's just that we have to properly reset the environments.