higgsfield / RL-Adventure

PyTorch implementation of DQN / DDQN / prioritized replay / noisy networks / distributional values / Rainbow / hierarchical RL

Error in Priority Update for Prioritized Replay #13

Open qfettes opened 6 years ago

qfettes commented 6 years ago

It looks like you're updating the priorities in the replay buffer according to the weighted and squared TD error.

loss  = (q_value - expected_q_value.detach()).pow(2) * weights
prios = loss + 1e-5
replay_buffer.update_priorities(indices, prios.data.cpu().numpy())

However, the algorithm in the original paper updates each priority using only the absolute value of the TD error, without the importance-sampling weights applied. I believe this is a mistake in your implementation.
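To make the difference concrete, here is a minimal sketch of the update I would expect. It is not the repository's code: the function name `per_loss_and_priorities` and the plain-list inputs are hypothetical, chosen so the example runs without PyTorch, and averaging the weighted loss over the batch is an assumption. The weights scale the loss only, while the priorities come from the unweighted |TD error| plus a small constant.

```python
from typing import List, Sequence, Tuple


def per_loss_and_priorities(
    q_values: Sequence[float],
    targets: Sequence[float],
    weights: Sequence[float],
    eps: float = 1e-5,
) -> Tuple[float, List[float]]:
    """Return (weighted mean squared loss, new priorities) for one batch.

    Hypothetical helper for illustration only; not part of the repository.
    """
    # Raw TD errors: delta_i = Q(s_i, a_i) - target_i
    td_errors = [q - t for q, t in zip(q_values, targets)]

    # The weights enter the loss only (batch mean is an assumption here).
    loss = sum(w * d * d for w, d in zip(weights, td_errors)) / len(td_errors)

    # Priorities use the unweighted magnitude |delta_i|; eps keeps
    # zero-error transitions from ending up with zero priority.
    priorities = [abs(d) + eps for d in td_errors]
    return loss, priorities


# With the tensors from the snippet above, the priority line would
# presumably become something like:
#   prios = (q_value - expected_q_value.detach()).abs() + 1e-5
loss, prios = per_loss_and_priorities(
    q_values=[1.0, 2.0, 0.5],
    targets=[0.0, 2.5, 0.5],
    weights=[0.5, 1.0, 0.25],
)
```

In this toy batch the first transition has weight 0.5 but still gets priority 1.0 + eps, because the weight does not enter the priority.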

mneira10 commented 4 years ago

Can confirm:

[image]

Transition priorities are updated with the magnitude of the TD error (lines 11-12). Paper for reference.