Questions of several implementation details

xlnwel commented 4 years ago

Hello @ku2482

May I ask about several implementation details and why you made these decisions?

1. In this line, you compute the signs by comparing sa_quantiles[i] with sa_quantiles[i-1] (except for the first one). Why don't you use values_1 > 0 as the signs?
2. In this line, you initialize the weights of the FractionProposalNetwork using Xavier initialization with gain=0.01. What made you choose this initialization?

toshikwa commented 4 years ago

Hi @xlnwel

1. When F_Z^{-1}(w) is non-decreasing, the sign is +1 and the gradients are the same as in Proposition 1.

However, when F_Z^{-1}(w) is decreasing, the assumption of Proposition 1 no longer holds. In this case, I correct the gradients by setting the sign to -1.

2. I designed it by hand.

"The weights of the fraction proposal network are initialized so that initial probabilities are uniform as in QR-DQN"

The authors state this in the paper, so I made the initial weights smaller, which keeps the initial probabilities closer to the uniform distribution and makes the initialization less influential.
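As a rough illustration of why a small gain keeps the initial fractions close to uniform, the standard-library sketch below draws a linear layer's weights with the Xavier/Glorot uniform bound that `torch.nn.init.xavier_uniform_` uses (gain * sqrt(6 / (fan_in + fan_out))) and applies a softmax to get the probabilities q_1..q_N, as in the FQF paper. The zero bias, the 3136-dimensional (7 * 7 * 64) non-negative random embedding, N = 32, and all helper names are assumptions made for illustration, not values taken from this thread or the repository.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, gain, rng):
    """Sample a fan_out x fan_in weight matrix from U(-a, a),
    with a = gain * sqrt(6 / (fan_in + fan_out)) (Xavier/Glorot uniform bound)."""
    bound = gain * math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def initial_fractions(embedding, n_fractions, gain, seed=0):
    """Probabilities q_1..q_N from a freshly initialized linear layer (bias assumed zero)."""
    rng = random.Random(seed)
    weight = xavier_uniform(len(embedding), n_fractions, gain, rng)
    logits = [sum(w * x for w, x in zip(row, embedding)) for row in weight]
    return softmax(logits)


def max_relative_deviation(probs):
    """Largest |q_i * N - 1|; 0 means the fractions are exactly uniform."""
    n = len(probs)
    return max(abs(p * n - 1.0) for p in probs)


# Hypothetical non-negative embedding, standing in for post-ReLU conv features.
emb_rng = random.Random(1)
embedding = [emb_rng.random() for _ in range(7 * 7 * 64)]

for gain in (1.0, 0.01):
    probs = initial_fractions(embedding, n_fractions=32, gain=gain)
    print(f"gain={gain}: max |q_i * N - 1| = {max_relative_deviation(probs):.4f}")
```

With gain=1.0 (the default gain of `xavier_uniform_`), the logits spread enough that some fractions start far from 1/N; with gain=0.01 the logits are nearly equal, so the softmax output is close to uniform regardless of the embedding.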

Thanks :)

xlnwel commented 4 years ago

Hi @ku2482

Thank you for the reply :-)

1. I understand why you use a sign, but I'm wondering why you don't use values_1 > 0 as the sign, since it directly compares sa_quantiles against sa_quantile_hats[:, :-1] to check the non-decreasing condition.
2. That was a smart choice: I found that plain xavier_uniform can easily lead to distribution collapse, even if the distribution looks uniform at the beginning of training.

toshikwa commented 4 years ago

Hi @xlnwel

"I get why you use sign, but I'm wondering why not use values_1>0 as the sign as it directly compare sa_quantiles against sa_quantile_hats[:, :-1]"

I want to evaluate whether F_Z^{-1}(w) is non-decreasing when tau_i < w < tau_{i+1}. So I defined the sign as the sign of F_Z^{-1}(tau_{i+1}) - F_Z^{-1}(tau_i). (See the proof of Proposition 1.)

However, I can't calculate F_Z^{-1}(0.0) and F_Z^{-1}(1.0), so I instead use F_Z^{-1}(tau_hats[0]) and F_Z^{-1}(tau_hats[-1]) respectively.
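The sign definition above, including the end-point substitution, can be sketched in plain Python. This is a minimal illustration written from the explanation in this thread rather than the repository's PyTorch code: `fraction_gradient` is a hypothetical name, the inputs are plain lists of quantile values for a single state-action pair, and counting ties as -1 is an assumption.

```python
def fraction_gradient(q, q_hat):
    """Sign-corrected dW1/dtau_i for i = 1..N-1 (hypothetical helper).

    q:     F_Z^{-1}(tau_i) for i = 1..N-1      (N-1 values)
    q_hat: F_Z^{-1}(hat tau_i) for i = 0..N-1  (N values)
    """
    q, q_hat = list(q), list(q_hat)
    assert len(q_hat) == len(q) + 1
    # F_Z^{-1}(0) and F_Z^{-1}(1) are unavailable, so F_Z^{-1}(hat tau_0) and
    # F_Z^{-1}(hat tau_{N-1}) stand in for the two end points.
    left = [q_hat[0]] + q[:-1]    # plays the role of F_Z^{-1}(tau_{i-1})
    right = q[1:] + [q_hat[-1]]   # plays the role of F_Z^{-1}(tau_{i+1})
    grads = []
    for k in range(len(q)):       # k = i - 1
        value_1 = q[k] - q_hat[k]       # F^{-1}(tau_i) - F^{-1}(hat tau_{i-1})
        value_2 = q[k] - q_hat[k + 1]   # F^{-1}(tau_i) - F^{-1}(hat tau_i)
        # +1 if F^{-1} rises across the interval, else -1 (ties -> -1: assumption).
        sign_1 = 1.0 if q[k] > left[k] else -1.0
        sign_2 = 1.0 if right[k] > q[k] else -1.0
        grads.append(sign_1 * value_1 + sign_2 * value_2)
    return grads


# Increasing quantiles: matches 2F^{-1}(tau_i) - F^{-1}(hat tau_i) - F^{-1}(hat tau_{i-1}).
print(fraction_gradient([1, 4, 9], [0, 3, 8, 10]))  # [-1.0, -3.0, 0.0]
```

For a strictly decreasing sequence such as `fraction_gradient([-1, -4, -9], [0, -3, -8, -10])`, every sign flips to -1 and the result is the negative of the Proposition 1 expression evaluated on those values.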

Does it make sense?

xlnwel commented 4 years ago

Hi @ku2482,

Thank you for your explanation and patience :-)

Perhaps I expressed myself poorly; it seems my idea still hasn't come across. Yes, your approach definitely makes sense: if F_Z^{-1} is non-decreasing (judged by F_Z^{-1}(tau_{i+1}) - F_Z^{-1}(tau_i)), the sign is True.

What I meant is this: since we compute the value as sa_quantiles - sa_quantile_hats[:, :-1] (i.e., F_Z^{-1}(tau_{i}) - F_Z^{-1}(\hat tau_{i-1})), why not directly use sa_quantiles > sa_quantile_hats[:, :-1] (i.e., F_Z^{-1}(tau_{i}) > F_Z^{-1}(\hat tau_{i-1})) as the sign? That seems more directly related to the value.
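To see where the two proposals can differ, here is a small plain-Python sketch that computes both candidate signs for values_1 = F_Z^{-1}(tau_i) - F_Z^{-1}(\hat tau_{i-1}). The helper `compare_first_signs` and the toy inputs are hypothetical; the sketch only illustrates the comparison and says nothing about which choice trains better. The two signs agree whenever the quantile values are strictly monotone, and can disagree when F_Z^{-1} is non-monotone inside an interval.

```python
def compare_first_signs(q, q_hat):
    """Return (boundary_signs, direct_signs) for values_1 = q[k] - q_hat[k].

    q:     F_Z^{-1}(tau_i) for i = 1..N-1        (hypothetical toy values)
    q_hat: F_Z^{-1}(hat tau_i) for i = 0..N-1
    """
    q, q_hat = list(q), list(q_hat)
    # F_Z^{-1}(tau_{i-1}), with F_Z^{-1}(hat tau_0) standing in for tau_0.
    left = [q_hat[0]] + q[:-1]
    # Boundary sign: compare F_Z^{-1}(tau_i) with F_Z^{-1}(tau_{i-1}).
    boundary = [1 if q[k] > left[k] else -1 for k in range(len(q))]
    # Direct sign: values_1 > 0, i.e. F_Z^{-1}(tau_i) > F_Z^{-1}(hat tau_{i-1}).
    direct = [1 if q[k] - q_hat[k] > 0 else -1 for k in range(len(q))]
    return boundary, direct


# Increasing values: both definitions give +1 everywhere.
print(compare_first_signs([1.0, 4.0, 9.0], [0.0, 3.0, 8.0, 10.0]))  # ([1, 1, 1], [1, 1, 1])
# F^{-1}(tau_1)=1, F^{-1}(hat tau_1)=3, F^{-1}(tau_2)=2: rises, then falls in [tau_1, tau_2].
print(compare_first_signs([1.0, 2.0], [0.0, 3.0, 4.0]))  # ([1, 1], [1, -1])
```

Only the second call shows a disagreement, at the interval where F_Z^{-1} overshoots F_Z^{-1}(tau_2) before coming back down.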

I've tried both methods. In my experiments, your method performs better in some cases, but I can't see why.

toshikwa commented 4 years ago

Hi @xlnwel

"why not directly use sa_quantiles > sa_quantile_hats[:, :-1] (i.e., F_Z^{-1}(tau_{i}) > F_Z^{-1}(\hat tau_{i-1})) as the sign"

That's not what I want to calculate; it's not the same quantity as the one I use.

xlnwel commented 4 years ago

Hi @ku2482

But it also tries to enforce the non-decreasing condition on F_Z^{-1}: `tau_{i} > \hat tau_{i-1} -> F_Z^{-1}(tau_{i}) > F_Z^{-1}(\hat tau_{i-1})`. Am I right? Why do you think this is infeasible?

toshikwa commented 4 years ago

Hi, please read the proof of Proposition 1 carefully for the theory behind it. That said, I simply defined the sign as explained above, and you can use any other definition if you want.
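For reference, Proposition 1 of the FQF paper (which assumes a non-decreasing F_Z^{-1}) and the sign-corrected gradient described in this thread can be written as follows. The second expression is reconstructed from the discussion rather than quoted from the repository, and counting ties as -1 is an assumption.

```latex
% Proposition 1 (FQF): for a non-decreasing F_Z^{-1}, with \hat\tau_i = (\tau_i + \tau_{i+1})/2,
W_1(Z,\tau) = \sum_{i=0}^{N-1} \int_{\tau_i}^{\tau_{i+1}}
  \bigl| F_Z^{-1}(\omega) - F_Z^{-1}(\hat\tau_i) \bigr| \, d\omega,
\qquad
\frac{\partial W_1}{\partial \tau_i}
  = 2F_Z^{-1}(\tau_i) - F_Z^{-1}(\hat\tau_i) - F_Z^{-1}(\hat\tau_{i-1}),
  \quad 0 < i < N.

% Sign-corrected variant discussed in this thread:
\frac{\partial W_1}{\partial \tau_i}
  \approx s_{i-1}\bigl[F_Z^{-1}(\tau_i) - F_Z^{-1}(\hat\tau_{i-1})\bigr]
        + s_i\bigl[F_Z^{-1}(\tau_i) - F_Z^{-1}(\hat\tau_i)\bigr],
\qquad
s_j = \begin{cases}
  +1 & \text{if } F_Z^{-1}(\tau_{j+1}) > F_Z^{-1}(\tau_j),\\
  -1 & \text{otherwise},
\end{cases}

% where F_Z^{-1}(\tau_0) and F_Z^{-1}(\tau_N) are replaced by
% F_Z^{-1}(\hat\tau_0) and F_Z^{-1}(\hat\tau_{N-1}).
```

Setting every s_j = +1 recovers Proposition 1.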

Please reopen the issue if you still have any questions.

xlnwel commented 4 years ago

Hi @ku2482.

Thank you. Are you interested in participating in the Procgen challenge organized by OpenAI? I'm currently in 5th place in round 1. If you're interested and have time, maybe we can do it together.

toshikwa commented 4 years ago

Hi @xlnwel

Thank you for the invitation :) It would be great to work together; however, I've already joined a team to help my friend learn RL.

I hope we'll be able to work together next time. Please feel free to contact me through LinkedIn.

Thanks.

xlnwel commented 4 years ago

Hi @ku2482

No problem. May I ask what your current ranking is? Maybe we can share some thoughts after the challenge is over.

toshikwa / fqf-iqn-qrdqn.pytorch

Questions of several implementation details #9