Closed — Guo-Stone closed this issue 3 weeks ago
Hi, thanks for your interest in our work. Indeed, in statistical theory, bootstrapping refers to resampling the data. In RL, however, bootstrapping is used as a way to encourage exploration. See https://rail.eecs.berkeley.edu/deeprlcourse/deeprlcourse/static/slides/lec-13.pdf
Regarding the implementation of bootstrapping in RL, there are a few common considerations.
Dear authors, I would like to ask a question about the code implementation of the bootstrap reward. In my understanding, the bootstrap method involves training several models on different resampled datasets, and the reward $r$ and its uncertainty $g$ should be the mean and standard deviation of the outputs of all the bootstrapped models. In your code below, however, why do you choose only the best one as the reward $r$, and why is the uncertainty $g$ not taken into consideration?
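For reference, here is a minimal sketch of the ensemble-based estimate the question describes — not the authors' actual code. The `models` list, the stand-in callables, and the function name `bootstrap_reward` are all hypothetical; in practice each element would be a reward model trained on a different bootstrap resample of the data.

```python
import numpy as np

def bootstrap_reward(models, x):
    """Return the ensemble-mean reward r and its uncertainty g (std) at input x."""
    preds = np.array([m(x) for m in models])  # one prediction per bootstrapped model
    r = preds.mean(axis=0)  # average reward across the ensemble
    g = preds.std(axis=0)   # model disagreement used as an uncertainty estimate
    return r, g

# Toy ensemble: three stand-in "models" that just scale the input differently,
# mimicking the disagreement that bootstrapped training would produce.
models = [lambda x, w=w: w * x for w in (0.9, 1.0, 1.1)]
r, g = bootstrap_reward(models, np.array([2.0]))
```

Here `r` would be the average of the three predictions and `g` their spread, which is the mean-plus-deviation scheme the question contrasts with picking a single best model.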