can you say more about your training procedure?

bcolloran commented 4 years ago

Hi @vinits5, thank you for publishing your work on this! I've been trying to solve BipedalWalker-v2 using a number of techniques, and I'm having trouble reproducing your very nice results on the walker with ARS.

Looking through your code, my best guess is that you used the default parameters:

    parser.add_argument('--v', type=float, default=0.03, help='noise in delta')
    parser.add_argument('--N', type=int, default=16, help='No of perturbations')
    parser.add_argument('--b', type=int, default=16, help='No of top performing directions')
    parser.add_argument('--lr', type=float, default=0.02, help='Learning Rate')
    parser.add_argument('--normalizer', type=bool, default=True, help='use normalizer')
    parser.add_argument('--env', type=str, default='BipedalWalker-v2', help='name of environment')
    parser.add_argument('--log', type=str, default='exp_biped_5', help='Log folder to store videos')

for your successful training run, but I want to make sure that is correct, and that you didn't supply a different set of parameters from the command line for your successful run.
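One way to rule out a command-line override is to print the values a run actually uses. The snippet below is only a sketch: it rebuilds the argument list quoted above in a helper named `build_parser` (a hypothetical name, not taken from the repository) and parses an empty argument list, which yields the defaults. It also shows a general `argparse` caveat: with `type=bool`, any non-empty string is truthy, so passing `--normalizer False` still yields `True`.

```python
import argparse


def build_parser():
    # Arguments copied from the snippet quoted above; build_parser itself is
    # a hypothetical helper, not a function from the repository.
    parser = argparse.ArgumentParser()
    parser.add_argument('--v', type=float, default=0.03, help='noise in delta')
    parser.add_argument('--N', type=int, default=16, help='No of perturbations')
    parser.add_argument('--b', type=int, default=16, help='No of top performing directions')
    parser.add_argument('--lr', type=float, default=0.02, help='Learning Rate')
    parser.add_argument('--normalizer', type=bool, default=True, help='use normalizer')
    parser.add_argument('--env', type=str, default='BipedalWalker-v2', help='name of environment')
    parser.add_argument('--log', type=str, default='exp_biped_5', help='Log folder to store videos')
    return parser


# An empty argument list means "no command-line overrides", i.e. pure defaults.
defaults = vars(build_parser().parse_args([]))
for name, value in sorted(defaults.items()):
    print(f'{name} = {value!r}')

# Caveat: bool('False') is True, so this does NOT switch the normalizer off.
flipped = build_parser().parse_args(['--normalizer', 'False'])
print('normalizer after "--normalizer False":', flipped.normalizer)
```

Logging `vars(args)` at the start of every run makes it easy to compare two runs' settings later.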

Also, how many random seeds did you have to try before you got a successful training run? And how many episodes did you have to run? (I don't want to stop too early if it looks like I'm stuck in a local maximum when I just need to train longer.)

I think I probably just have a bug in my version of the code, so I thought I'd check with you to rule out these factors.

Thanks again for sharing your code and results! Very helpful to other folks like me who are trying to learn! :-)

bcolloran commented 4 years ago

Oh, and one more question: approximately how many training epochs did it take to begin to see good progress? Thanks!

vinits5 commented 4 years ago

Thank you for your interest. We used the default parameters as stated in the code.

I am a bit confused by your question about the seed. We used 16 random directions to search for the best policy in ARS, and no seed was set at the beginning of training.

We got really good results from 900 epochs, but allow your training to go up to 2000 epochs. I believe the biped starts to show signs of walking after approximately 500 epochs.
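For anyone comparing their own implementation, here is a minimal sketch of one basic ARS update using the default values quoted above (`v=0.03`, `N=16`, `b=16`, `lr=0.02`). It is not the repository's code. It assumes a linear policy matrix `theta`, leaves out the observation normalizer, and replaces the environment with a toy quadratic reward so it runs without Gym. The names `ars_step`, `train`, and `final_rewards` are hypothetical. Because no seed was set in the original runs, the sketch takes an explicit seed so that several seeds can be compared.

```python
import numpy as np


def ars_step(theta, rollout, rng, v=0.03, N=16, b=16, lr=0.02):
    """One basic ARS update (sketch; no observation normalization)."""
    # Sample N random directions with the same shape as the policy matrix.
    deltas = rng.standard_normal((N,) + theta.shape)
    # Evaluate the policy perturbed in the positive and negative direction.
    r_pos = np.array([rollout(theta + v * d) for d in deltas])
    r_neg = np.array([rollout(theta - v * d) for d in deltas])
    # Keep the b directions with the highest max(r+, r-).
    top = np.argsort(np.maximum(r_pos, r_neg))[::-1][:b]
    # Scale the step by the standard deviation of the 2b rewards that are used.
    sigma_r = np.concatenate([r_pos[top], r_neg[top]]).std()
    # Sum of (r+ - r-) * delta over the selected directions.
    step = np.tensordot(r_pos[top] - r_neg[top], deltas[top], axes=1)
    return theta + lr / (b * (sigma_r + 1e-8)) * step


def train(seed, epochs=100):
    # Toy stand-in for an episode: reward is highest when theta equals target.
    # The 4 x 24 shape is an illustrative assumption for a linear policy.
    rng = np.random.default_rng(seed)
    target = np.ones((4, 24))
    theta = np.zeros((4, 24))

    def rollout(th):
        return -float(np.sum((th - target) ** 2))

    for _ in range(epochs):
        theta = ars_step(theta, rollout, rng)
    return rollout(theta)


# Compare a few seeds; the toy reward starts at -96.0 for every seed.
final_rewards = {seed: train(seed) for seed in (0, 1, 2)}
for seed, reward in final_rewards.items():
    print(f'seed {seed}: final toy reward {reward:.2f}')
```

With `N` equal to `b`, as in the defaults, every sampled direction contributes to the update. On the real environment, `rollout` would run a full episode and return its total reward.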

vinits5 / augmented-random-search