xlnwel closed this issue 4 years ago
Hi @xlnwel
1. When F_Z^{-1}(w) is non-decreasing, the sign is +1 and the gradients are the same as in Proposition 1.
However, when F_Z^{-1}(w) is decreasing, the assumption of Proposition 1 no longer holds. In this case, I correct the gradients by setting the sign to -1.
2. It's hand-designed by me.
"The weights of the fraction proposal network are initialized so that initial probabilities are uniform as in QR-DQN"
The authors state the above in the paper, so I made the weights smaller so that the initial probabilities are closer to the uniform distribution and the initialization has less influence.
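The effect of a small gain can be illustrated with a minimal sketch in plain Python. It assumes a hypothetical single linear layer (zero bias) followed by a softmax, with weights drawn from the Xavier/Glorot uniform range `gain * sqrt(6 / (fan_in + fan_out))`. The layer sizes, the made-up embedding, and all function names are assumptions for illustration, not the repository's code.

```python
import math
import random


def xavier_uniform_bound(fan_in, fan_out, gain=1.0):
    # Half-width of the uniform range used by Xavier/Glorot uniform initialization.
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def initial_fraction_probs(embedding, num_fractions, gain, seed=0):
    # Hypothetical proposal layer: one linear map (zero bias) then softmax.
    # The same seed is reused so that only the gain differs between runs.
    rng = random.Random(seed)
    bound = xavier_uniform_bound(len(embedding), num_fractions, gain)
    logits = []
    for _ in range(num_fractions):
        weights = [rng.uniform(-bound, bound) for _ in embedding]
        logits.append(sum(w * x for w, x in zip(weights, embedding)))
    return softmax(logits)


def max_deviation_from_uniform(probs):
    uniform = 1.0 / len(probs)
    return max(abs(p - uniform) for p in probs)


# Made-up state embedding with 64 features (an assumption for the demo).
embedding_rng = random.Random(1)
embedding = [embedding_rng.uniform(-1.0, 1.0) for _ in range(64)]

probs_default = initial_fraction_probs(embedding, 32, gain=1.0)
probs_small = initial_fraction_probs(embedding, 32, gain=0.01)
dev_default = max_deviation_from_uniform(probs_default)
dev_small = max_deviation_from_uniform(probs_small)
print(dev_default, dev_small)
```

With the smaller gain the logits shrink by the same factor, so the softmax output sits much closer to 1/32 per fraction. This only describes the probabilities at initialization and says nothing about whether they stay spread out during training.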
Thanks :)
Hi @ku2482
Thank you for the reply :-)
1. I get why you use sign, but I'm wondering why not use values_1 > 0 as the sign, as it directly compares sa_quantiles against sa_quantile_hats[:, :-1] to ensure non-decreasing.
2. xavier_uniform can easily result in distribution collapse even if the distribution looks uniform at the beginning of training.
Hi @xlnwel
"I get why you use sign, but I'm wondering why not use values_1 > 0 as the sign, as it directly compares sa_quantiles against sa_quantile_hats[:, :-1]"
I want to evaluate whether F_Z^{-1}(w) is non-decreasing (or not) when tau_i < w < tau_{i+1}. So I defined the sign as "F_Z^{-1}(tau_{i+1}) - F_Z^{-1}(tau_i)". (See the proof for Proposition 1.)
However, I can't calculate F_Z^{-1}(0.0) and F_Z^{-1}(1.0), so I instead use F_Z^{-1}(tau_hats[0]) and F_Z^{-1}(tau_hats[-1]) respectively.
Does it make sense?
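A minimal sketch of this sign definition for a single state-action pair, written in plain Python rather than the repository's batched PyTorch code. The function name, the list-based inputs, and the choice to treat ties as non-decreasing are assumptions made for illustration.

```python
def interval_signs(quantiles, quantile_hats):
    """Sign of F_Z^{-1}(tau_{i+1}) - F_Z^{-1}(tau_i) for each interval.

    quantiles:     F_Z^{-1}(tau_1), ..., F_Z^{-1}(tau_{N-1})
    quantile_hats: F_Z^{-1}(tau_hat_0), ..., F_Z^{-1}(tau_hat_{N-1})
    """
    # F_Z^{-1}(0.0) and F_Z^{-1}(1.0) are unavailable, so the first and last
    # quantile_hats stand in for them, as described in the comment above.
    padded = [quantile_hats[0]] + list(quantiles) + [quantile_hats[-1]]
    # +1 where the quantile function is non-decreasing (ties included), else -1.
    return [1 if padded[i + 1] - padded[i] >= 0 else -1
            for i in range(len(padded) - 1)]


# Made-up values: the second quantile dips below the first.
quantiles = [1.0, 0.5, 2.0]
quantile_hats = [0.2, 0.8, 1.5, 2.5]
signs = interval_signs(quantiles, quantile_hats)
print(signs)  # [1, -1, 1, 1]
```

With N - 1 interior quantiles the sketch returns N signs, one per interval between consecutive fractions, with the two boundary intervals using the substitutes.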
Hi @ku2482,
Thank you for your explanation and patience :-)
Perhaps because I expressed myself poorly, it seems that you still have not got my idea. Yes, your way definitely makes sense: if F_Z^{-1} is non-decreasing (i.e., F_Z^{-1}(tau_{i+1}) - F_Z^{-1}(tau_i) >= 0), the sign is True. What I tried to express was this: now that we compute value as sa_quantiles - sa_quantile_hats[:, :-1] (i.e., F_Z^{-1}(tau_i) - F_Z^{-1}(\hat tau_{i-1})), why not directly use sa_quantiles > sa_quantile_hats[:, :-1] (i.e., F_Z^{-1}(tau_i) > F_Z^{-1}(\hat tau_{i-1})) as the sign, which seems more directly related to value?
I've tried both methods. Experiments show that your method performs better in some cases but I fail to see the reasons.
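The two candidate signs can be compared on a single state-action pair with a small plain-Python sketch. The function names and the made-up values are assumptions; the comparisons mirror the expressions discussed in this thread, not the repository's exact code.

```python
def signs_between_taus(quantiles, quantile_hats):
    # Compare F_Z^{-1}(tau_i) with F_Z^{-1}(tau_{i-1}); the first entry is
    # compared with quantile_hats[0] because F_Z^{-1}(0.0) is unavailable.
    previous = [quantile_hats[0]] + list(quantiles[:-1])
    return [q > p for q, p in zip(quantiles, previous)]


def signs_against_hats(quantiles, quantile_hats):
    # Proposed alternative: sa_quantiles > sa_quantile_hats[:, :-1],
    # i.e. F_Z^{-1}(tau_i) > F_Z^{-1}(tau_hat_{i-1}).
    return [q > h for q, h in zip(quantiles, quantile_hats[:-1])]


# Made-up values where the quantile function dips between tau_1 and tau_2.
quantiles = [1.0, 0.9, 2.0]
quantile_hats = [0.2, 0.8, 1.5, 2.5]
a = signs_between_taus(quantiles, quantile_hats)
b = signs_against_hats(quantiles, quantile_hats)
print(a)  # [True, False, True]
print(b)  # [True, True, True]
```

In this example the two definitions disagree at the second entry: the quantile at tau_2 is below the one at tau_1 but still above the one at \hat tau_1. When the estimated quantile function is monotone, both definitions give the same signs, so they can only differ where the estimates cross.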
Hi @xlnwel
"why not directly use sa_quantiles > sa_quantile_hats[:, :-1] (i.e., F_Z^{-1}(tau_i) > F_Z^{-1}(\hat tau_{i-1})) as the sign"
That's not what I want to calculate. It's not the same quantity as the one I use.
Hi @ku2482
But it also attempts to enforce the non-decreasing condition on F_Z^{-1} -- `tau_i > \hat tau_{i-1} -> F_Z^{-1}(tau_i) > F_Z^{-1}(\hat tau_{i-1})` -- am I right? Why do you think this is infeasible?
Hi, please see the proof for Proposition 1 carefully to get the theory behind it. However, I just defined the sign as explained and you can use any other definitions if you want.
Please reopen the issue if you still have any questions.
Hi @ku2482.
Thank you. Are you interested in participating in the Procgen challenge organized by OpenAI? I'm currently in the 5th position of round 1. If you're interested and have time, maybe we can do this together.
Hi @xlnwel
Thank you for the invitation :) It would be great to work together; however, I've already joined a team to help my friend learn RL.
I hope we'll be able to work together next time. Please feel free to contact me through LinkedIn.
Thanks.
Hi @ku2482
No problem. May I ask what your position is right now? Maybe we can share some thoughts after the challenge is completed.
Hello @ku2482
May I ask you several implementation details and why you made these decisions?
1. The signs compare sa_quantiles[i] with sa_quantiles[i-1] (except the first one). Why don't you use values_1 > 0 as the signs?
2. The initialization uses gain=0.01. What makes you choose this initialization?