Open ginoperrotta opened 2 years ago
Can we close this as being fixed?
I don't think it has been changed since the issue was raised. It can be closed as unimportant if desired, but it is not fixed.
I think this is because, when the first task starts, we initialize a new sampler with a new random seed: https://github.com/lifelong-learning-systems/rlblocks/blob/master/minigrid_crossing_experiment/dqn.py#L217
key = (task_name, variant_name)
if key not in self.buffers:
    self.buffers[key] = TransitionDatasetWithMaxCapacity(self.max_size)
    self.samplers[key] = UniformRandomBatchSampler(
        self.buffers[key], seed=self.rng.bit_generator.random_raw()
    )
We do the same thing with the random seed for the normal agent, but I think the snippet above pulls a second number for the first task's sampler.
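To illustrate the mismatch concretely, here is a minimal standalone sketch (plain numpy, outside rlblocks), assuming both agents start from the same seed and that the normal agent seeds its single sampler with the first random_raw() draw while DqnTaskMemory spends that draw on the meta sampler:

import numpy as np

SEED = 12345

# "Normal" agent: one sampler, seeded from the first random_raw() draw.
rng_a = np.random.default_rng(SEED)
normal_sampler_seed = rng_a.bit_generator.random_raw()

# DqnTaskMemory-style agent: the meta sampler consumes the first draw,
# so the first task's sampler gets the second draw.
rng_b = np.random.default_rng(SEED)
meta_sampler_seed = rng_b.bit_generator.random_raw()
first_task_sampler_seed = rng_b.bit_generator.random_raw()

print(normal_sampler_seed == meta_sampler_seed)        # True
print(normal_sampler_seed == first_task_sampler_seed)  # False, so replay batches diverge

So even with identical top-level seeds, the first task's sampler is seeded differently between the two agents.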
A potential fix, if we really care about this, is to use the already initialized self.replay_buffer and self.replay_sampler for the first task:
if key not in self.buffers:
    self.buffers[key] = self.replay_buffer if self.replay_buffer is not None else ...
    self.samplers[key] = self.replay_sampler if self.replay_sampler is not None else ...
    self.replay_buffer = None
    self.replay_sampler = None
Edit: Not sure if the above snippet will actually work; self.replay_sampler is still populated with the meta sampler, so the logic may have to be slightly different than shown above.
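One possible shape for that slightly different logic, as a rough sketch only: keep the objects that were seeded first in dedicated attributes so the meta sampler is never touched, and hand them to the first key. The names self.first_task_buffer and self.first_task_sampler are hypothetical, not existing rlblocks attributes, and this has not been tested.

# Hypothetical sketch: self.first_task_buffer / self.first_task_sampler are
# assumed to be created in __init__ from the first random_raw() draw,
# mirroring the normal agent's seeding order.
key = (task_name, variant_name)
if key not in self.buffers:
    if self.first_task_sampler is not None:
        # First task: reuse the buffer/sampler seeded at construction time.
        self.buffers[key] = self.first_task_buffer
        self.samplers[key] = self.first_task_sampler
        self.first_task_buffer = None
        self.first_task_sampler = None
    else:
        # Later tasks: keep the current behavior.
        self.buffers[key] = TransitionDatasetWithMaxCapacity(self.max_size)
        self.samplers[key] = UniformRandomBatchSampler(
            self.buffers[key], seed=self.rng.bit_generator.random_raw()
        )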
The current implementation of DqnTaskMemory results in different RNG than the other agents, even given the same seed values. This may not be important, but ensuring the agents have identical performance on the first block helps to highlight what the lifelong learning modules do (and don't do).