openai / random-network-distillation

Code for the paper "Exploration by Random Network Distillation"
https://openai.com/blog/reinforcement-learning-with-prediction-based-rewards/

Samples of each epoch for optimization were not shuffled #17

Open boscotsang opened 5 years ago

boscotsang commented 5 years ago

It seems that for each epoch, the `fd` was constructed by

```python
end = start + envsperbatch
mbenvinds = slice(start, end, None)
fd = {ph: buf[mbenvinds] for (ph, buf) in ph_buf}
```

which doesn't shuffle the samples within a minibatch. Did I miss the shuffle operation somewhere? If not, why don't the samples need to be shuffled?
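For comparison, one common way to add the missing shuffle would be to permute the environment indices once per epoch before slicing out each minibatch. The sketch below is a hypothetical helper (`minibatch_feeds` is not a function in the repo) assuming buffers shaped `(nenvs, ...)`, as in the snippet above:

```python
import numpy as np

def minibatch_feeds(bufs, nenvs, envsperbatch, rng=None):
    """Yield minibatch feed dicts over environments, shuffled per epoch.

    Hypothetical sketch, not the repo's actual code. `bufs` maps
    placeholder names to arrays whose leading axis indexes environments.
    """
    rng = rng or np.random.default_rng(0)
    # Permute environment order once per epoch, then take contiguous
    # slices of the permutation instead of slices of the raw range.
    envinds = rng.permutation(nenvs)
    for start in range(0, nenvs, envsperbatch):
        mbenvinds = envinds[start:start + envsperbatch]
        yield {name: buf[mbenvinds] for name, buf in bufs.items()}
```

Note this only shuffles which environments land in which minibatch; the timesteps within each environment's rollout stay in order.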