```python
# Rainbow and prioritized replay are parametrized by an exponent alpha,
# but in both cases it is set to 0.5 - for simplicity's sake we leave it
# as is here, using the more direct tf.sqrt(). Taking the square root
# "makes sense", as we are dealing with a squared loss. Add a small
# nonzero value to the loss to avoid 0 priority items. While technically
# this may be okay, setting all items to 0 priority will cause troubles,
# and also result in 1.0 / 0.0 = NaN correction terms.
update_priorities_op = self._replay.tf_set_priority(
    self._replay.indices, tf.sqrt(loss + 1e-10))
```
In https://github.com/google/dopamine/blob/master/dopamine/agents/rainbow/rainbow_agent.py, could you explain why the priorities are updated to `tf.sqrt(loss + 1e-10)`? Is it because the loss here is the squared TD error? The comment says "as we are dealing with a squared loss", but I think the loss here is computed with `tf.nn.softmax_cross_entropy_with_logits`. Is that a squared loss?
Hi, thanks for the note!

Regarding your first question: you should be able to just change `_build_replay_buffer` so that it creates an instance of `prioritized_replay_buffer`, and modify the `_store_transition` function to also store priorities (as is done in Rainbow); a sketch follows below. Perhaps a good reference is the code for our Revisiting Rainbow paper, which adds PER to the DQN code.
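For concreteness, here is a minimal sketch of those two overrides on top of `DQNAgent`. The class name `PrioritizedDQNAgent` is made up for illustration; the constructor arguments and the `sum_tree.max_recorded_priority` attribute are copied from the Rainbow agent, so double-check them against the Dopamine version you are using.

```python
from dopamine.agents.dqn import dqn_agent
from dopamine.replay_memory import prioritized_replay_buffer


class PrioritizedDQNAgent(dqn_agent.DQNAgent):
  """DQN variant that stores transitions with priorities (sketch)."""

  def _build_replay_buffer(self, use_staging):
    # Swap the uniform buffer for the prioritized one.
    return prioritized_replay_buffer.WrappedPrioritizedReplayBuffer(
        observation_shape=self.observation_shape,
        stack_size=self.stack_size,
        use_staging=use_staging,
        update_horizon=self.update_horizon,
        gamma=self.gamma,
        observation_dtype=self.observation_dtype.as_numpy_dtype)

  def _store_transition(self, last_observation, action, reward, is_terminal,
                        priority=None):
    # Give new transitions the maximum priority recorded so far, so they
    # are sampled at least once before their priority is refined.
    if priority is None:
      priority = self._replay.memory.sum_tree.max_recorded_priority
    if not self.eval_mode:
      self._replay.add(last_observation, action, reward, is_terminal, priority)
```

These two overrides only store priorities; to actually use them you would also need a `_build_train_op` that weights the loss by the importance-sampling correction and writes the new priorities back, as the Rainbow agent does (see the snippet at the end of this thread).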
Regarding your second question: I believe the cross-entropy loss can be considered an alternative to the squared loss, which is why this "makes sense" (hence the quotation marks in the comment).

Hope this helps!
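For later readers, the arithmetic behind the quoted comment can be spelled out as a sketch. In the PER paper, transition i is sampled with probability P(i) = p_i^alpha / sum_k p_k^alpha, with priority p_i = |delta_i| + epsilon. Dopamine's sum tree instead samples proportionally to whatever priority was stored, so the exponent is applied at write time, with the per-transition loss L_i standing in for the priority signal:

```latex
p_i \;=\; (L_i + \varepsilon)^{\alpha}
    \;=\; \sqrt{L_i + 10^{-10}},
\qquad \alpha = 0.5,\ \varepsilon = 10^{-10}.
```

If L_i were a squared TD error delta_i^2, the stored value would reduce to roughly |delta_i|. The C51 cross-entropy loss is not literally a squared error, which is presumably why the comment hedges with "makes sense" rather than claiming an exact correspondence.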
Thank you for the answer!
Hi y'all! Thank you for providing Dopamine; it is such an awesome resource. I am looking for the PER algorithm itself, but I cannot find it in this repo; what I see is the Rainbow agent, which also includes improvements other than PER.

Have you implemented a standalone version of PER in DQN? I did implement PER from the pieces in Rainbow, but I just want to make sure it is correct and reproduces the performance reported in the PER paper.
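For anyone cross-checking a port: the prioritized pieces of the Rainbow agent's `_build_train_op` boil down to roughly the following. This is paraphrased from the TF1-style code, so treat it as a sketch rather than a verbatim copy; `loss` is the per-element loss before reduction.

```python
# Inside _build_train_op, when self._replay_scheme == 'prioritized':

# Importance-sampling correction, 1 / sqrt(P(i)), normalized so the
# largest weight is 1. Dopamine uses this fixed 0.5 exponent rather
# than the annealed beta schedule of the original PER paper.
probs = self._replay.transition['sampling_probabilities']
loss_weights = 1.0 / tf.sqrt(probs + 1e-10)
loss_weights /= tf.reduce_max(loss_weights)

# Write the new priorities back: loss ** alpha with alpha = 0.5.
update_priorities_op = self._replay.tf_set_priority(
    self._replay.indices, tf.sqrt(loss + 1e-10))

# Weight the per-element loss before reducing and minimizing it.
loss = loss_weights * loss
with tf.control_dependencies([update_priorities_op]):
  train_op = self.optimizer.minimize(tf.reduce_mean(loss))
```

If your implementation has all three pieces (max-priority seeding at storage time, the importance-sampling weighting of the loss, and the priority write-back after each update), it should match what Rainbow does, minus the other Rainbow components.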