Hey,
Thanks for making the repo public! I was trying to run some of the experiments from the paper and noticed that training seems pretty slow. After some digging, I saw that the default value of the "samples_per_iter" hyperparameter is set to 512 (which, as I understand it, means the algorithm collects 512 samples before performing 1 SAC/REDQ update). In my limited experience with SAC, I've mostly seen that hyperparameter set to 1, so I just wanted to confirm whether there's a reason it's set to 512 in this case.
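In case it helps, here is a rough sketch of how I'm reading the collect-then-update loop. The function and argument names are made up for illustration and are not taken from the repo. I'm also assuming exactly one update per iteration, which may not be what the code actually does.

```python
def count_updates(total_env_steps, samples_per_iter, updates_per_iter=1):
    """Count agent updates in a collect-then-update loop.

    Hypothetical sketch: names are illustrative, and updates_per_iter=1
    is my assumption about how the loop behaves.
    """
    collected = 0
    updates = 0
    while collected < total_env_steps:
        # Collect up to samples_per_iter transitions (stand-in for env interaction).
        collected += min(samples_per_iter, total_env_steps - collected)
        # Then perform the SAC/REDQ update(s) for this iteration.
        updates += updates_per_iter
    return updates


print(count_updates(10_000, 512))  # 20
print(count_updates(10_000, 1))    # 10000
```

If that reading is right, 10,000 environment steps give about 20 updates with 512 versus 10,000 updates with 1, which is the gap I'm asking about.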
Thanks!