(If there is a better forum to ask clarification questions, please let me know!)
In aggregate_rollouts in code/ars.py, I see that the code does:
    rollout_ids_one = [worker.do_rollouts.remote(policy_id,
                                                 num_rollouts = num_rollouts,
                                                 shift = self.shift,
                                                 evaluate=evaluate) for worker in self.workers]

    rollout_ids_two = [worker.do_rollouts.remote(policy_id,
                                                 num_rollouts = 1,
                                                 shift = self.shift,
                                                 evaluate=evaluate) for worker in self.workers[:(num_deltas % self.num_workers)]]
What is the purpose of doing the rollouts twice, with one doing num_rollouts rollouts per worker and the other doing 1 rollout per worker for num_deltas % self.num_workers workers?
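To make the question concrete, here is a small sketch of how I currently read the two calls. It assumes that `num_rollouts` is computed as `num_deltas // self.num_workers`, which is my guess and is not shown in the snippet above. The helper `split_rollouts` is hypothetical and only illustrates the arithmetic; it is not taken from the repository.

```python
def split_rollouts(num_deltas, num_workers):
    """Hypothetical helper: rollouts per worker under the assumed split."""
    # Assumption: the first call gives every worker the same base share.
    base = num_deltas // num_workers
    # The second call gives one extra rollout to the first `remainder` workers.
    remainder = num_deltas % num_workers
    return [base + (1 if i < remainder else 0) for i in range(num_workers)]


counts = split_rollouts(num_deltas=10, num_workers=4)
print(counts, sum(counts))  # [3, 3, 2, 2] 10
```

If that reading is right, the second call would only be covering the remainder when `num_deltas` is not divisible by the number of workers, but I would appreciate confirmation.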