tristandeleu / pytorch-maml-rl

Reinforcement Learning with Model-Agnostic Meta-Learning in Pytorch
MIT License
827 stars 158 forks

If I want to use the meta-parameters to adapt to a new task, what should I do? #55

Open GeorgeDUT opened 3 years ago

GeorgeDUT commented 3 years ago

I wrote a new environment (navigation on a deterministic map): (1) I run "python train.py --config xxxx" and get config.json and policy.th. (2) I run "python test.py --config xxxx" and get results.npz. But the rewards in results.npz are still very low. What should I do to use policy.th to fast-adapt to a new task?

tristandeleu commented 3 years ago

You should pass --policy policy.th to test.py to use your trained policy. It's surprising that you didn't get an error when running test.py without --policy, since it is a required argument.
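
For reference, the adaptation that happens at test time corresponds to the standard MAML inner loop: load the meta-trained parameters, then take one or a few policy-gradient steps on data from the new task. Below is a minimal generic sketch of that loop, not the repo's exact API; the `adapt` function, the MLP architecture, the step size, and the placeholder loss are all illustrative assumptions (in the repo, the step size and number of steps are set in the training config).

```python
import torch
import torch.nn as nn

def adapt(policy, task_loss_fn, step_size=0.1, num_steps=1):
    # Plain gradient descent on the new task, starting from the
    # meta-trained parameters (the MAML inner loop).
    for _ in range(num_steps):
        loss = task_loss_fn(policy)
        policy.zero_grad()
        loss.backward()
        with torch.no_grad():
            for param in policy.parameters():
                param -= step_size * param.grad
    return policy

# Hypothetical policy; the real architecture must match what train.py saved.
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
# policy.load_state_dict(torch.load('policy.th'))  # load the meta-parameters

# Placeholder loss for illustration only; in practice this is the
# policy-gradient loss on trajectories sampled from the new task.
dummy_obs = torch.randn(8, 4)
adapt(policy, lambda p: p(dummy_obs).pow(2).mean())
```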

GeorgeDUT commented 3 years ago

I get it. I ran test.py with --policy policy.th, but the valid_return rewards are equal to or even lower than train_return. Maybe our environment is not suitable. Thanks.
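
For what it's worth, a quick way to sanity-check the two quantities is to compare the arrays stored in results.npz directly. A small sketch; the key names below are taken from this thread and may differ in your version, so print results.files first to confirm:

```python
import numpy as np

results = np.load('results.npz')
print(results.files)  # confirm the actual key names first

# Assumed keys, following the names mentioned in this thread:
train = results['train_return']  # returns before adaptation
valid = results['valid_return']  # returns after the fast-adaptation step(s)
print('train:', train.mean(), 'valid:', valid.mean())
```

If the post-adaptation mean is not noticeably higher, the inner-loop update is not helping on this task distribution, which is consistent with the behavior described above.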