Hi, and thanks a lot for rlkit, it's really useful. I'm using the SAC and TD3 modules at the moment.
While reading through your code, I wondered if you could clarify the precise purpose of the eval loops. I just want to be sure that I understood the code correctly.
As I understand it, training in batch_rl_algorithm is split into an evaluation and an exploration subroutine. Training steps are performed on the replay buffer, which is filled only from exploration steps and not from evaluation steps. So does the eval step serve any function in training, or is it "just" meant for logging and recording the progress of learning?
I hope I didn't miss something obvious in your code.
I'm glad to hear that rlkit is useful! You're correct that it's just for logging and recording progress. Feel free to reopen this issue if you have more questions.
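For anyone reading along, here is a minimal sketch of the split described above. It is not rlkit's actual code: `ToyEnv`, `rollout`, and `run` are hypothetical names made up for illustration, and the "training step" is just a counter standing in for a real SAC/TD3 update. It only shows that evaluation rollouts go to a log, while exploration rollouts are the only thing written to the replay buffer that training samples from.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical toy environment (not part of rlkit); episodes last 5 steps."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        reward = -abs(action)  # zero action is optimal in this toy task
        done = self.t >= 5
        return float(self.t), reward, done


def rollout(env, policy, num_steps):
    """Collect num_steps transitions with the given policy."""
    transitions = []
    obs = env.reset()
    for _ in range(num_steps):
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
    return transitions


def run(num_epochs=3, eval_steps=10, expl_steps=20, train_steps=4,
        batch_size=8, seed=0):
    rng = random.Random(seed)
    env = ToyEnv()
    replay_buffer = deque(maxlen=1000)
    eval_log = []
    num_updates = 0

    def expl_policy(obs):
        return rng.uniform(-1.0, 1.0)  # noisy policy used for exploration

    def eval_policy(obs):
        return 0.0  # deterministic policy used for evaluation

    for _ in range(num_epochs):
        # Evaluation: results are only logged, never stored in the buffer.
        eval_paths = rollout(env, eval_policy, eval_steps)
        eval_log.append(sum(t[2] for t in eval_paths) / eval_steps)

        # Exploration: the only source of data for the replay buffer.
        replay_buffer.extend(rollout(env, expl_policy, expl_steps))

        # Training: sample batches from the replay buffer.
        for _ in range(train_steps):
            batch = rng.sample(list(replay_buffer),
                               min(batch_size, len(replay_buffer)))
            num_updates += 1  # placeholder for a real gradient step on batch

    return len(replay_buffer), eval_log, num_updates
```

With the default arguments, the buffer ends up holding only the 3 x 20 exploration transitions; the 30 evaluation transitions never reach it, so dropping the evaluation phase would change the logs but not the data the updates see.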
Best, and thanks again, Ben