rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.

Uneven observations dimensions for MAML on PointEnv #2217

Closed. kristian-georgiev closed this issue 3 years ago.

kristian-georgiev commented 3 years ago

Hi, thanks for the amazing library!

I am trying to use MAMLVPG with PointEnv. I made minimal modifications to the MAML VPG half-cheetah-dir example, but I run into a somewhat bizarre, seed-dependent issue: rarely, one of the samples has multiple sets of observations and therefore a different shape.

I suspect the issue arises from the interaction between PointEnv and SetTaskSampler, but I am not certain what exactly causes this behavior.

Below is a minimal example that reproduces this issue.

import torch

from garage import wrap_experiment
from garage.envs import PointEnv
from garage.experiment import MetaEvaluator
from garage.experiment.deterministic import set_seed
from garage.experiment.task_sampler import SetTaskSampler
from garage.sampler import LocalSampler
from garage.torch.algos import MAMLVPG
from garage.torch.policies import GaussianMLPPolicy as GaussianPolicy
from garage.torch.value_functions import GaussianMLPValueFunction
from garage.trainer import Trainer

# Wrapper passed to SetTaskSampler: rebuild the environment with a fixed
# maximum episode length for every sampled task.
def set_length(env, _task):
    env.close()
    return PointEnv(max_episode_length=100)

@wrap_experiment
def vpg_maml(ctxt, seed):
    set_seed(seed)
    trainer = Trainer(ctxt)
    env = PointEnv(max_episode_length=100)

    policy = GaussianPolicy(env.spec,
                            hidden_sizes=[10, 10],
                            hidden_nonlinearity=torch.tanh,
                            output_nonlinearity=None)

    value_fn = GaussianMLPValueFunction(env_spec=env.spec,
                                        hidden_sizes=[10, 10],
                                        hidden_nonlinearity=torch.tanh,
                                        output_nonlinearity=None)

    task_sampler = SetTaskSampler(PointEnv, wrapper=set_length)
    test_task_sampler = SetTaskSampler(PointEnv, wrapper=set_length)

    meta_evaluator = MetaEvaluator(test_task_sampler=test_task_sampler,
                                   n_test_tasks=1,
                                   n_test_episodes=2)

    sampler = LocalSampler(agents=policy, envs=env, max_episode_length=100)

    algo = MAMLVPG(env=env,
                   policy=policy,
                   sampler=sampler,
                   task_sampler=task_sampler,
                   value_function=value_fn,
                   meta_batch_size=20,
                   discount=0.99,
                   gae_lambda=1.,
                   inner_lr=0.1,
                   outer_lr=0.001,
                   num_grad_updates=1,
                   meta_evaluator=meta_evaluator)

    trainer.setup(algo, env)
    trainer.train(n_epochs=3, batch_size=32)

if __name__ == '__main__':
    vpg_maml(seed=3)  # pylint: disable=no-value-for-parameter

which on the third epoch produces

Traceback (most recent call last):
  File "minimal.py", line 60, in <module>
    vpg_maml(seed=3)  # pylint: disable=no-value-for-parameter
  File "/home/gridsan/krisgrg/superurop/RL/SG-MRL/garage/experiment/experiment.py", line 369, in __call__
    result = self.function(ctxt, **kwargs)
  File "minimal.py", line 57, in vpg_maml
    trainer.train(n_epochs=3, batch_size=32)
  File "/home/gridsan/krisgrg/superurop/RL/SG-MRL/garage/trainer.py", line 402, in train
    average_return = self._algo.train(self)
  File "/home/gridsan/krisgrg/superurop/RL/SG-MRL/garage/torch/algos/maml.py", line 95, in train
    last_return = self._train_once(trainer, all_samples, all_params)
  File "/home/gridsan/krisgrg/superurop/RL/SG-MRL/garage/torch/algos/maml.py", line 140, in _train_once
    [task_samples[0] for task_samples in all_samples])
  File "/home/gridsan/krisgrg/superurop/RL/SG-MRL/garage/torch/algos/maml.py", line 342, in _compute_policy_entropy
    obs = torch.stack([samples.observations for samples in task_samples])
RuntimeError: stack expects each tensor to be equal size, but got [1, 100, 3] at entry 0 and [2, 100, 3] at entry 3

I am on the main branch (currently up to https://github.com/rlworkgroup/garage/commit/82b5c33ae0796489a00391f80cb94e41657f5962).

Changing the seed changes the epoch and the entry index at which the error occurs.

ryanjulian commented 3 years ago

@kristian-georgiev thanks for the issue and the thorough report!

@krzentner and @avnishn have been working with MAML lately -- perhaps they can take a look?

Here's a very cursory read:

It looks like the error happens when stacking observations from two different tasks. Within the MAML loss function, these should have shape [batch, time, obs dimensions...]. In this case, one task has a batch size of 1 and the other a batch size of 2, I think because you set the batch size very small in trainer.train(n_epochs=3, batch_size=32) (batch_size is measured in time steps). Such a small batch could bias the update towards the overrepresented task, but it isn't logically wrong, so I think you've encountered a bug that earlier users, who use larger batch sizes, probably never triggered.
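For illustration, here is a small standalone snippet (not garage code; the shapes are just the ones reported in the traceback above) showing why torch.stack fails when two tasks contribute different numbers of episodes:

import torch

# Dummy observation tensors: one task contributed a single padded episode,
# the other contributed two ([episodes, time, obs_dim]).
obs_task_a = torch.zeros(1, 100, 3)
obs_task_b = torch.zeros(2, 100, 3)

try:
    torch.stack([obs_task_a, obs_task_b])  # stack requires identical shapes
except RuntimeError as err:
    print(err)  # "stack expects each tensor to be equal size, ..."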

I am not certain (@naeioi , @krzentner , or @avnishn please check), but I think that torch.stack is not the right thing to do here, and we actually want torch.cat, which will join the batch dimension into one big batch. I think that torch.stack worked previously because it's common practice for all tasks to see the same number of trajectories, in which case torch.stack doesn't complain about mismatched sizes in the batch dimension.
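A minimal sketch of what that change would do, using the same dummy shapes as above (this is not a verified patch to maml.py):

import torch

obs_task_a = torch.zeros(1, 100, 3)
obs_task_b = torch.zeros(2, 100, 3)

# torch.cat joins along the existing batch dimension, so tasks with
# different numbers of episodes collapse into one flat batch.
obs = torch.cat([obs_task_a, obs_task_b], dim=0)
print(obs.shape)  # torch.Size([3, 100, 3])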

kristian-georgiev commented 3 years ago

I can confirm that switching from torch.stack to torch.cat solves the problem. Thanks for the quick response!

krzentner commented 3 years ago

Great! Thanks for providing a script that demonstrated the issue, it made debugging the problem much easier.