kristian-georgiev opened this issue 3 years ago
As an additional comment, I have made some small changes to `envs/point_env.py`:

```diff
 # line 26
-goal=np.array((1., 1.), dtype=np.float32),
+goal=np.array((0.3, 0.3), dtype=np.float32),
 # line 27
-arena_size=5.,
+arena_size=0.5,
 # line 198
-goals = np.random.uniform(-2, 2, size=(num_tasks, 2))
+goals = np.random.uniform(-self._arena_size, self._arena_size, size=(num_tasks, 2))
```
These changes shrink the default arena to a unit square and make the default goal distribution for new tasks uniform over the arena rather than over a fixed [-2, 2] range. I am happy to submit a pull request for these changes if you believe they make sense.
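For reference, the goal-sampling method after this change would look roughly like the sketch below; everything outside the changed line is paraphrased from memory of garage's `PointEnv`, so details may differ slightly from the actual source.

```python
import numpy as np


def sample_tasks(self, num_tasks):
    """Sample goal positions uniformly from within the arena.

    With the change above, goals are drawn from
    [-self._arena_size, self._arena_size]^2 instead of a fixed [-2, 2]
    square, so sampled goals always lie inside the arena.
    """
    goals = np.random.uniform(-self._arena_size, self._arena_size,
                              size=(num_tasks, 2))
    # Each task is a dict carrying its sampled goal (format assumed from
    # garage's PointEnv.sample_tasks).
    return [{'goal': goal} for goal in goals]
```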
@kristian-georgiev I think the issue is that the MAML implementation in garage uses an MLP baseline, which the algorithm tries to re-fit at every training epoch.
Traditionally, MAML fits a linear feature baseline for this variance reduction, and a linear baseline can be fit to new data quite a bit faster than a neural-network one.
If you check out the branch avnish-new-metaworld-results, you'll find a linear baseline that I've hacked together for torch, along with the modifications I've made to MAML to make it work with that linear baseline.
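Roughly, the idea of such a linear baseline is sketched below. This is only a rough, self-contained illustration of the classic linear feature baseline (a ridge-regularized least-squares fit on handcrafted per-timestep features), not the exact code on that branch, and the class and method names here are illustrative.

```python
import numpy as np


class LinearFeatureBaselineSketch:
    """Value baseline fit by ridge-regularized least squares.

    Per-timestep features are [obs, obs**2, t, t**2, t**3, 1], where t is a
    scaled timestep index. Fitting is a single linear solve, so refitting on
    fresh rollouts is much cheaper than retraining an MLP baseline.
    """

    def __init__(self, reg_coeff=1e-5):
        self._reg_coeff = reg_coeff
        self._coeffs = None

    def _features(self, observations):
        obs = np.clip(observations, -10, 10)
        t = np.arange(len(obs)).reshape(-1, 1) / 100.0
        ones = np.ones((len(obs), 1))
        return np.concatenate([obs, obs**2, t, t**2, t**3, ones], axis=1)

    def fit(self, paths):
        # paths: list of dicts with 'observations' (T x obs_dim) and 'returns' (T,).
        feats = np.concatenate([self._features(p['observations']) for p in paths])
        returns = np.concatenate([p['returns'] for p in paths])
        a = feats.T @ feats + self._reg_coeff * np.eye(feats.shape[1])
        b = feats.T @ returns
        self._coeffs = np.linalg.solve(a, b)

    def predict(self, path):
        if self._coeffs is None:
            return np.zeros(len(path['returns']))
        return self._features(path['observations']) @ self._coeffs
```

The usage pattern would be to call `fit` on the freshly collected (adapted) rollouts at each meta-iteration and subtract `predict` from the empirical returns when computing advantages.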
Thanks, Avnish.
Thank you for the clean and well-documented library! I am trying to use MAML for 2D navigation but have been getting suboptimal policies. In particular, rollouts from the adapted, trained policy predominantly look something like this.
I am using the current master version (https://github.com/rlworkgroup/garage/commit/43e7b78f4c3a8e977c8140e5e56c8c793f15263c).
This gist contains a minimal example that should reproduce this behavior. I have tried to keep hyperparameters at their defaults wherever defaults are given. Additionally, this is the debug log from running the minimal example above.
Apologies for the vague issue - I am not sure whether this behavior is due to hyperparameters or an issue somewhere in the meta-RL pipeline. Thank you in advance!