NagisaZj opened 4 years ago
Hey NagisaZj,
Thanks a lot for pointing out the issue. I just realized that I had forgotten to push some of the changes in my local repo after the last merge. I have now pushed them and tested the pointenvs, and everything seems to be working. I think it should work for the mujoco envs as well now. Unfortunately, I don't have access to a mujoco key right now, so I won't be able to test the mujoco envs today. But do let me know if you face any issues; I can try to arrange for a key and test those as well in the next few days.
Thanks, Swami
Thank you for your response; this version seems to work well. I've got another question: which parameter controls the objective (supervised loss / self-supervised loss) that the exploration policy uses?
That's great! You can choose the type of self-supervised loss using `--M-type`. It takes one of three values: `rewards`, `returns`, or `next-state`.
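For anyone reading along, a flag restricted to a fixed set of values like this is typically declared with argparse `choices`. A minimal sketch (the parser below is illustrative and not taken from the repo; only the flag name and its three values come from the comment above):

```python
import argparse

# Illustrative parser declaring a choice-restricted --M-type flag.
# argparse maps the hyphenated flag to the attribute name M_type.
parser = argparse.ArgumentParser()
parser.add_argument(
    '--M-type',
    default='rewards',
    choices=['rewards', 'returns', 'next-state'],
    help='self-supervised objective used by the exploration policy',
)

args = parser.parse_args(['--M-type', 'next-state'])
print(args.M_type)  # → next-state
```

Passing any value outside the three choices makes argparse exit with a usage error, which is a cheap way to catch typos in long training commands.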
Hello there. I am trying to reproduce your experiments with the self-supervision losses, but I cannot tell which branch to use. Could you kindly explain which experiment each branch corresponds to?
Edit: I ran main.py in branch another_sparse_branch_ppo with the command you provided:

```
python main.py --env-name HalfCheetahRandVelEnv-v1 --fast-batch-size 20 --meta-batch-size 40 --output-folder hcv-1 --num-workers 16 --embed-size 32 --exp-lr 7e-4 --baseline-type nn --nonlinearity tanh --num-layers-pre 1 --hidden-size 64 --seed 0
```
And I hit the following error:

```
Traceback (most recent call last):
  File "main.py", line 312, in <module>
    main(args)
  File "main.py", line 242, in main
    ls_backtrack_ratio=args.ls_backtrack_ratio)
TypeError: step() got an unexpected keyword argument 'max_kl'
```
Can you provide some help? It seems the code is passing an argument that step() no longer accepts.
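For what it's worth, a `TypeError` like this usually means the call site and the `step()` signature drifted apart between branches. The sketch below is illustrative (the `step()` stub and its parameters are invented, not the repo's actual method): it reproduces the error and shows a generic defensive workaround that filters the keyword arguments down to what the callee's signature actually accepts.

```python
import inspect

def step(lr=1e-3, ls_backtrack_ratio=0.5):
    """Hypothetical stand-in for a step() whose signature dropped max_kl."""
    return lr, ls_backtrack_ratio

# The caller was written against an older signature that still had max_kl.
kwargs = {'lr': 7e-4, 'max_kl': 0.01, 'ls_backtrack_ratio': 0.8}

try:
    step(**kwargs)
except TypeError as e:
    print(e)  # step() got an unexpected keyword argument 'max_kl'

# Workaround: keep only the keywords present in the target's signature.
accepted = inspect.signature(step).parameters
filtered = {k: v for k, v in kwargs.items() if k in accepted}
result = step(**filtered)
print(result)  # → (0.0007, 0.8)
```

The real fix is of course to update the call in main.py to match the current `step()` signature on that branch; the filtering trick just helps confirm which argument is stale.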