xlnwel opened this issue 4 years ago (status: Open)
hi, regarding your first question, is there something that makes you believe it's not correct?
regarding your second question, one way you can verify this is by running the unit test.
Hi, sorry for the misunderstanding earlier, my bad. I replaced the code starting from here (https://github.com/google/dopamine/blob/6463bfa8660daf17823825ab884b118d3a57ea4e/dopamine/agents/rainbow/rainbow_agent.py#L402) with the following code.
with tf.control_dependencies(validate_deps):
  # Ex: `v_min, v_max = 4, 8`.
  v_min, v_max = target_support[0], target_support[-1]
  y = tf.clip_by_value(supports, v_min, v_max)[:, None, :]  # [B, 1, N]
  target_support = target_support[None, :, None]  # [1, N, 1]
  y = tf.clip_by_value(1. - tf.abs(y - target_support) / delta_z, 0, 1)  # [B, N, N]
  y = tf.reduce_sum(y * weights[:, None, :], axis=2)  # [B, N]
  projection = tf.stop_gradient(y)
  return projection
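For anyone skimming the thread, the snippet above can be sanity-checked shape by shape in plain NumPy. This is my own illustrative sketch, not code from the repo; B, N, and the support range are made-up values:

```python
import numpy as np

rng = np.random.default_rng(0)
B, N = 2, 5                                       # batch size, number of atoms (made up)
target_support = np.linspace(4.0, 8.0, N)         # [N], i.e. v_min=4, v_max=8
delta_z = target_support[1] - target_support[0]   # atom spacing
supports = rng.uniform(3.0, 9.0, (B, N))          # Bellman-updated atom locations
weights = rng.dirichlet(np.ones(N), B)            # target probabilities, rows sum to 1

# Same computation as the TF snippet above, with tf.clip_by_value -> np.clip.
v_min, v_max = target_support[0], target_support[-1]
y = np.clip(supports, v_min, v_max)[:, None, :]            # [B, 1, N]
ts = target_support[None, :, None]                         # [1, N, 1]
y = np.clip(1.0 - np.abs(y - ts) / delta_z, 0.0, 1.0)      # [B, N, N]
projection = np.sum(y * weights[:, None, :], axis=2)       # [B, N]

# Each projected distribution should still sum to 1.
print(np.allclose(projection.sum(axis=1), 1.0))  # True
```

Because every clipped sample lands inside [v_min, v_max], its weight is split between the two neighboring atoms with coefficients that add to 1, so the projection stays a valid distribution.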
The test seems to suggest it's okay. Does that mean I'm correct about the second one?
Lastly, thanks for your help!
yup, seems like the second one is a clean refactoring! :)
May I open a pull request with this solution to simplify the code?
if you don't mind, let's hold off on this for a bit. we're still sorting out some issues with external pull requests. but thank you!
Sure, no problem. But if you find this implementation is not right, please let me know.
By the way, thanks for your open-source code. That's awesome.
The following code uses the target distribution in softmax_cross_entropy_with_logits. Is this correct? https://github.com/google/dopamine/blob/6463bfa8660daf17823825ab884b118d3a57ea4e/dopamine/agents/rainbow/rainbow_agent.py#L259
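In case it helps anyone with the same question: as far as I understand, softmax_cross_entropy_with_logits accepts soft labels, i.e. the labels can be any probability distribution rather than a one-hot vector, which is why passing the target distribution makes sense. A quick NumPy sketch of what it computes (the values are made up):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - x.max())
    return e / e.sum()

# A soft target distribution (sums to 1) and some logits; both made up.
target = np.array([0.1, 0.7, 0.2])
logits = np.array([0.5, 2.0, -1.0])

# Cross entropy between the target distribution and softmax(logits),
# i.e. -sum_i target_i * log(softmax(logits)_i).
loss = -np.sum(target * np.log(softmax(logits)))
print(round(loss, 4))  # ~0.9913
```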
BTW, I found the code for the projection in C51 somewhat complicated, so I use the following code to compute the projection in Eq. 7 of the paper.
All names except y have the same meaning as those defined in the function project_distribution. Do you think they are doing the same thing?
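To make the comparison with Eq. 7 concrete, here is my own sketch (made-up values, not code from the repo) that writes the projection as an explicit loop over atoms and checks it against the vectorized form posted above:

```python
import numpy as np

def project_loop(supports, weights, target_support):
    """Eq. 7 atom by atom: each sample z_j, clipped to [v_min, v_max],
    spreads its weight w_j over target atoms z_i in proportion to
    clip(1 - |z_j - z_i| / delta_z, 0, 1)."""
    v_min, v_max = target_support[0], target_support[-1]
    delta_z = target_support[1] - target_support[0]
    B, N = supports.shape
    out = np.zeros((B, N))
    for b in range(B):
        for j in range(N):
            zj = np.clip(supports[b, j], v_min, v_max)
            for i in range(N):
                out[b, i] += weights[b, j] * np.clip(
                    1.0 - abs(zj - target_support[i]) / delta_z, 0.0, 1.0)
    return out

def project_vectorized(supports, weights, target_support):
    # The broadcasted form from the comment above, in NumPy.
    v_min, v_max = target_support[0], target_support[-1]
    delta_z = target_support[1] - target_support[0]
    y = np.clip(supports, v_min, v_max)[:, None, :]        # [B, 1, N]
    ts = target_support[None, :, None]                     # [1, N, 1]
    y = np.clip(1.0 - np.abs(y - ts) / delta_z, 0.0, 1.0)  # [B, N, N]
    return np.sum(y * weights[:, None, :], axis=2)         # [B, N]

rng = np.random.default_rng(0)
target_support = np.linspace(4.0, 8.0, 5)
supports = rng.uniform(3.0, 9.0, (2, 5))
weights = rng.dirichlet(np.ones(5), 2)
print(np.allclose(project_loop(supports, weights, target_support),
                  project_vectorized(supports, weights, target_support)))  # True
```

The loop version is much slower but mirrors Eq. 7 term for term, which makes it a convenient reference when checking the broadcasted one.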