rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.

MAML performance on 2D navigation #2226

Open kristian-georgiev opened 3 years ago

kristian-georgiev commented 3 years ago

Thank you for the clean and well-documented library! I am trying to use MAML for 2D navigation but have been getting suboptimal policies. In particular, rollouts from the trained policy (after adaptation) predominantly look something like this:

[image: example rollout from the adapted policy]

I am using the current master version (https://github.com/rlworkgroup/garage/commit/43e7b78f4c3a8e977c8140e5e56c8c793f15263c).

This gist contains a minimal example that should reproduce this behavior. I have tried to keep hyperparameters at their defaults wherever defaults are given. Additionally, this is the debug log from running the minimal example above.

Apologies for the vague issue; I am not sure whether this behavior is due to the hyperparameters or to an issue somewhere in the meta-RL pipeline. Thank you in advance!
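For context on what "adapted" means here: at meta-test time, MAML clones the meta-learned parameters and takes a small number of inner gradient steps on rollouts from the new task before the evaluation rollouts above are collected. Below is a framework-agnostic PyTorch sketch of that inner step, not garage's implementation; `task_loss` is a hypothetical callable standing in for whatever computes a policy-gradient surrogate on new-task rollouts.

```python
import copy

import torch


def adapt(policy, task_loss, inner_lr=0.1, num_grad_updates=1):
    """Clone the meta-learned policy and take a few inner gradient steps
    on data from the new task.

    `policy` is any torch.nn.Module; `task_loss` is a hypothetical
    callable that collects rollouts with the given policy and returns a
    differentiable policy-gradient surrogate loss.
    """
    adapted = copy.deepcopy(policy)
    optimizer = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(num_grad_updates):
        loss = task_loss(adapted)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return adapted  # evaluation rollouts are collected from this policy
```

If the pre- and post-adaptation rollouts look essentially the same, the inner update is contributing little; the reply below points at one common culprit, the baseline used for advantage estimation.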

kristian-georgiev commented 3 years ago

As an additional comment, I have made some small changes to envs/point_env.py:

```diff
@@ envs/point_env.py, lines 26-27 @@
-                goal=np.array((1., 1.), dtype=np.float32),
+                goal=np.array((0.3, 0.3), dtype=np.float32),
-                arena_size=5.,
+                arena_size=0.5,
@@ envs/point_env.py, line 198 @@
-        goals = np.random.uniform(-2, 2, size=(num_tasks, 2))
+        goals = np.random.uniform(-self._arena_size, self._arena_size, size=(num_tasks, 2))
```

These changes shrink the default arena to the unit square [-0.5, 0.5]^2 and sample task goals uniformly over the arena instead of over the fixed square [-2, 2]^2, so goals always lie inside the arena; the sketch below illustrates the difference. I am happy to submit a pull request for these changes if you believe they make sense.
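To illustrate the second change (a self-contained numpy sketch, not garage code): with the old default, most goals land outside a small arena, while the proposed default ties the goal support to arena_size.

```python
import numpy as np

arena_size = 0.5  # proposed default: the arena is the square [-0.5, 0.5]^2
num_tasks = 1000

# Old default: goal support is the fixed square [-2, 2]^2. Assuming the
# point is confined to the arena, goals outside it are unreachable; for
# arena_size=0.5 that is about 94% of sampled goals (1 - (1/4)^2 = 0.9375).
old_goals = np.random.uniform(-2, 2, size=(num_tasks, 2))
print((np.abs(old_goals) > arena_size).any(axis=1).mean())  # ~0.94

# Proposed default: goal support follows the arena, so every goal is reachable.
new_goals = np.random.uniform(-arena_size, arena_size, size=(num_tasks, 2))
assert (np.abs(new_goals) <= arena_size).all()
```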

avnishn commented 3 years ago

@kristian-georgiev I think the issue is that the MAML implementation in garage uses an MLP baseline, which the algorithm tries to relearn on every training epoch.

Traditionally, MAML fits a linear feature baseline for this variance reduction, and it can fit new data quite a bit faster than a neural-network baseline.
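For reference, such a baseline (along the lines of garage's `garage.np.baselines.LinearFeatureBaseline`) regresses empirical returns onto fixed per-timestep features with a single least-squares solve. A standalone numpy sketch of the idea, not the torch code on the branch mentioned below:

```python
import numpy as np


class LinearFeatureBaseline:
    """Sketch of the classic rllab-style linear feature baseline:
    a ridge regression from fixed per-timestep features to returns."""

    def __init__(self, reg_coeff=1e-5):
        self._coeffs = None
        self._reg_coeff = reg_coeff

    @staticmethod
    def _features(observations):
        # Fixed features: clipped obs, obs^2, and polynomial time features.
        obs = np.clip(observations, -10, 10)
        t = np.arange(len(obs)).reshape(-1, 1) / 100.0
        ones = np.ones((len(obs), 1))
        return np.concatenate([obs, obs**2, t, t**2, t**3, ones], axis=1)

    def fit(self, observations_per_path, returns_per_path):
        # A single regularized least-squares solve -- no gradient steps,
        # which is why it refits to new-task data much faster than an MLP.
        feats = np.concatenate(
            [self._features(o) for o in observations_per_path])
        returns = np.concatenate(returns_per_path)
        reg = self._reg_coeff * np.eye(feats.shape[1])
        self._coeffs = np.linalg.solve(feats.T @ feats + reg,
                                       feats.T @ returns)

    def predict(self, observations):
        return self._features(observations) @ self._coeffs
```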

If you check out the branch avnish-new-metaworld-results, you'll find a linear baseline that I've hacked together for torch, along with the modifications I've made to MAML to make it work with that linear baseline.

Thanks, Avnish.