swami1995 / exp_maml

Code for "Learning Exploration Strategies for Model Agnostic Meta-Reinforcement Learning", ICML AMTL workshop
MIT License

Can you provide detailed descriptions of these branches? #1

NagisaZj opened this issue 4 years ago

NagisaZj commented 4 years ago

Hello there. I am trying to reproduce your experiments with the self-supervision losses. However, I am at a complete loss as to which branch I should use. Could you kindly explain which experiment each branch is responsible for?

Edit: I ran main.py in branch another_sparse_branch_ppo with the command you provided: python main.py --env-name HalfCheetahRandVelEnv-v1 --fast-batch-size 20 --meta-batch-size 40 --output-folder hcv-1 --num-workers 16 --embed-size 32 --exp-lr 7e-4 --baseline-type nn --nonlinearity tanh --num-layers-pre 1 --hidden-size 64 --seed 0

It fails with the following error:

Traceback (most recent call last):
  File "main.py", line 312, in <module>
    main(args)
  File "main.py", line 242, in main
    ls_backtrack_ratio=args.ls_backtrack_ratio)
TypeError: step() got an unexpected keyword argument 'max_kl'

Can you provide some help? It seems that the code is wrong.
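
For context, the error looks like a keyword-argument mismatch between the call in main.py and the step() signature in this branch. A minimal sketch of the same failure mode (the class and all argument names other than max_kl and ls_backtrack_ratio are hypothetical, not the repo's actual code):

    # Minimal sketch: a step() signature that predates the max_kl argument
    class MetaLearner:
        def step(self, episodes, ls_max_steps=15, ls_backtrack_ratio=0.8):
            """Older signature without a max_kl parameter."""
            pass

    learner = MetaLearner()
    # main.py passes max_kl, which this signature does not declare:
    learner.step([], max_kl=1e-2, ls_backtrack_ratio=0.8)
    # -> TypeError: step() got an unexpected keyword argument 'max_kl'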

swami1995 commented 4 years ago

Hey NagisaZj,

Thanks a lot for pointing out the issue. I just realized that I had forgotten to push some of the changes from my local repo after the last merge. I just pushed those changes and tested the pointenvs, and they seem to be working. I think it should work for the mujoco envs as well now. Unfortunately, I don't have access to a mujoco key right now, so I won't be able to test the mujoco envs today. But do let me know if you face any issues. I can try to arrange for a key and test those as well in the next few days.

Thanks, Swami

NagisaZj commented 4 years ago

Thank you for your response; this version seems to work well. I have another question: which parameter controls the objective (supervised loss / self-supervised loss) that the exploration policy uses?

swami1995 commented 4 years ago

That's great! You can choose the type of self-supervised loss with --M-type. It takes one of three values: rewards, returns, or next-state.
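
For example, appending --M-type next-state to the run command above should select the next-state objective. A minimal sketch of how such a flag is typically parsed (assuming argparse; everything except the flag name and its three values is illustrative, not the repo's actual code):

    import argparse

    parser = argparse.ArgumentParser()
    # --M-type selects the self-supervised objective for the exploration policy
    parser.add_argument('--M-type', default='rewards',
                        choices=['rewards', 'returns', 'next-state'],
                        help='self-supervised loss: rewards, returns, or next-state')
    args = parser.parse_args(['--M-type', 'next-state'])
    print(args.M_type)  # argparse maps --M-type to args.M_type -> 'next-state'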