Understanding the SAC parameters

nanbaima commented 4 years ago

Hi Vitchyr,

If I may, I would like to ask you a question related to: https://github.com/vitchyr/rlkit/blob/1d469a509b797ca04a39b8734c1816ca7d108fc8/examples/sac.py#L92-L96

Specifically:

What do those SAC parameters mean?
How do they affect training in my case?
And where can I find further information about them?

I did read the SAC paper, but I didn't find specifics such as num_eval_steps_per_epoch or min_num_steps_before_training, so I got a little lost.

I would really appreciate any help you could give me on this subject!

nanbaima commented 4 years ago

By trial and error (so nothing very scientific :)), I observed a few things in my implementation:

max_path_length: the maximum number of timesteps in a single rollout.
min_num_steps_before_training: how many random steps are taken to fill the buffer before training starts; in my implementation it didn't change much.
num_expl_steps_per_train_loop: how many exploration steps are taken with the current policy per epoch.
num_trains_per_train_loop: how many batch updates are performed per epoch.
num_eval_steps_per_epoch: how many steps are used to generate the graph (SAC itself doesn't need them; they are only used for the graph).
net_size: the number of neurons in the networks.
batch_size: the bigger it is, the smaller the variance and the longer the training time.
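To make the list above more concrete, here is a minimal sketch of how these settings relate to each other. The numeric values are illustrative assumptions, not necessarily the defaults in examples/sac.py, and `summarize` is a hypothetical helper that is not part of rlkit. It only does arithmetic on the settings, assuming one train loop per epoch unless stated otherwise.

```python
# Illustrative values only (assumption); check examples/sac.py for the real defaults.
algorithm_kwargs = dict(
    num_epochs=100,
    num_eval_steps_per_epoch=5000,
    num_trains_per_train_loop=1000,
    num_expl_steps_per_train_loop=1000,
    min_num_steps_before_training=1000,
    max_path_length=1000,
    batch_size=256,
)


def summarize(kwargs, num_train_loops_per_epoch=1):
    """Derive per-epoch quantities from the settings (hypothetical helper)."""
    loops = num_train_loops_per_epoch
    expl_steps = kwargs["num_expl_steps_per_train_loop"] * loops
    grad_updates = kwargs["num_trains_per_train_loop"] * loops
    return dict(
        # New environment steps added to the replay buffer each epoch.
        expl_steps_per_epoch=expl_steps,
        # Mini-batch updates performed each epoch.
        grad_updates_per_epoch=grad_updates,
        # Transitions drawn from the buffer each epoch (with reuse).
        transitions_sampled_per_epoch=grad_updates * kwargs["batch_size"],
        # Ratio of gradient updates to newly collected environment steps.
        updates_per_env_step=grad_updates / expl_steps,
        # How many full-length rollouts fit into the evaluation budget.
        full_length_eval_paths_per_epoch=(
            kwargs["num_eval_steps_per_epoch"] // kwargs["max_path_length"]
        ),
    )


if __name__ == "__main__":
    for name, value in summarize(algorithm_kwargs).items():
        print(f"{name}: {value}")
```

The `updates_per_env_step` ratio is a convenient single number to watch when changing num_trains_per_train_loop and num_expl_steps_per_train_loop together.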

I observed that it needs to be something like num train per loop >>> n_exploration (exploration should be much higher than train per loop)

The hyperparameters are here: batch_rl_algorithm.py

I hope this helps someone and makes things a little clearer...

vitchyr commented 4 years ago

Yup! And to clarify, the num_trains_per_train_loop is per train loop, but typically the number of train loops per epoch is 1, so this is effectively the same.
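As a rough illustration of the per-train-loop versus per-epoch distinction, the sketch below counts steps in a simplified loop. It is not the code from batch_rl_algorithm.py: `count_steps` is a hypothetical function, the loop structure is an assumption based on the descriptions in this thread, and the `num_train_loops_per_epoch` argument name is assumed for illustration.

```python
def count_steps(
    num_epochs,
    num_eval_steps_per_epoch,
    num_trains_per_train_loop,
    num_expl_steps_per_train_loop,
    min_num_steps_before_training,
    num_train_loops_per_epoch=1,  # assumed name; typically 1 per the comment above
):
    """Count environment steps and gradient updates in a simplified loop."""
    counts = dict(env_steps=0, eval_steps=0, gradient_updates=0)

    # Warm-up: steps collected into the buffer before any gradient update.
    counts["env_steps"] += min_num_steps_before_training

    for _ in range(num_epochs):
        # Evaluation rollouts: used for reporting only, not for learning.
        counts["eval_steps"] += num_eval_steps_per_epoch

        for _ in range(num_train_loops_per_epoch):
            # Collect new exploration data with the current policy.
            counts["env_steps"] += num_expl_steps_per_train_loop
            # Then run this many mini-batch updates on the replay buffer.
            counts["gradient_updates"] += num_trains_per_train_loop

    return counts


if __name__ == "__main__":
    # One train loop per epoch: per-loop and per-epoch counts coincide.
    print(count_steps(2, 5000, 1000, 1000, 1000))
    # Four train loops per epoch: per-epoch totals are four times the per-loop values.
    print(count_steps(2, 5000, 250, 250, 1000, num_train_loops_per_epoch=4))
```

With one train loop per epoch, num_trains_per_train_loop equals the number of updates per epoch; with more loops, the per-epoch total is the per-loop value multiplied by the number of loops.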

rail-berkeley / rlkit

Understanding the SAC parameters #89