Relation between number of quantiles and noise vector

zhougroup / IDAC

Implicit Distributional Actor Critic

MIT License


Relation between number of quantiles and noise vector #12

Open kbkartik opened 1 year ago

kbkartik commented 1 year ago

Dear Authors,

I found your paper interesting and had a question. For the distributional critic, why is the number of quantiles (51 as reported) not equal to the noise vector dimension (5 as reported)?

Zhendong-Wang commented 1 year ago

Hi kbkartik,

Thanks for your interest. The number of quantiles and the noise vector dimension are two different components in our design.

The number of quantiles controls how much detail we obtain from distributional critic learning: with more quantiles, the density estimation can be more precise, at a higher computational cost.
The noise vector dimension controls the randomness, or expressiveness, of the distributional critic. We fix this value based on the state dimension, since we concatenate [state, noise] as our critic input.
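To make the distinction concrete, here is a minimal NumPy sketch of how the two numbers enter the critic. It is not the repository's code: the two-layer network, its random weights, the sizes `STATE_DIM`, `ACTION_DIM`, and `HIDDEN`, and the standard Gaussian noise are all assumptions for illustration. Only `K = 51` and `NOISE_DIM = 5` come from this thread.

```python
import numpy as np

K = 51          # number of return samples / quantiles (value reported in this thread)
NOISE_DIM = 5   # dimension of each noise vector (value reported in this thread)
STATE_DIM, ACTION_DIM, HIDDEN = 17, 6, 32  # hypothetical sizes, for illustration only

rng = np.random.default_rng(0)

# Hypothetical generator G(s, a, eps): a tiny MLP with random placeholder weights.
W1 = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM + NOISE_DIM, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, 1))

def critic_samples(state, action, k=K):
    """Return k return samples for a single (state, action) pair."""
    # k independent noise vectors, each NOISE_DIM-dimensional (Gaussian is an assumption).
    eps = rng.standard_normal((k, NOISE_DIM))
    sa = np.concatenate([state, action])
    # Repeat (s, a) k times and append one noise vector per row.
    x = np.concatenate([np.tile(sa, (k, 1)), eps], axis=1)
    h = np.maximum(x @ W1, 0.0)       # ReLU hidden layer
    return (h @ W2).squeeze(-1)       # shape (k,)

state = rng.standard_normal(STATE_DIM)
action = rng.standard_normal(ACTION_DIM)
samples = critic_samples(state, action)
```

In this sketch the network input width depends on `NOISE_DIM` but not on `K`, while the number of outputs per (state, action) pair depends on `K` but not on `NOISE_DIM`, which is why the two values can be chosen independently.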

kbkartik commented 1 year ago

Hi Zhendong-Wang,

Thanks for your response. As per eqn 5: $x_{1:K} = \{G_{\omega}(s, a, \epsilon^k)\}_{1:K}$, where $\epsilon^1, \dots, \epsilon^K$ are sampled i.i.d. In the paper, you define $K$ as the number of quantiles. Then $\epsilon$ is a $K$-dimensional vector, right?
In eqn 9, you have a $K^2$ denominator in the quantile regression loss. However, the quantile regression paper divides by $K$. Why do you additionally divide by $K$ in your loss?
Have you compared against quantile regression for TD3? I haven't seen any paper that uses standard quantile regression for continuous control. Any thoughts?

Zhendong-Wang commented 1 year ago

$\epsilon^k$ is a 5-dimensional vector, where 5 is our noise dim. We have $K = 51$ vectors $\epsilon^k$ for each $(s, a)$ pair. In other words, $k \in \{1, \dots, K\}$.
There is no specific reason for the $K^2$. We just want to take the mean of all $K^2$ elements. I think $K$ should also work, since it only slightly influences the learning rate.
We didn't compare IDAC against quantile regression for TD3, although SDPG could be a similar setting.
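On the $K^2$ versus $K$ point, the following NumPy sketch shows that the two normalizations differ only by a constant factor of $K$. It uses a generic quantile Huber loss in the style of QR-DQN, which may not match eqn 9 term for term. The function name `pairwise_quantile_huber`, the midpoint quantile levels, and `kappa = 1.0` are assumptions for illustration.

```python
import numpy as np

def pairwise_quantile_huber(pred, target, kappa=1.0):
    """Return the K x K matrix of quantile Huber losses (generic QR-DQN-style form)."""
    k = pred.shape[0]
    pred = np.sort(pred)                        # sorted samples act as quantile estimates
    tau = (np.arange(k) + 0.5) / k              # assumed quantile levels (midpoints)
    u = target[None, :] - pred[:, None]         # u[i, j] = target_j - pred_i
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))
    weight = np.abs(tau[:, None] - (u < 0).astype(float))  # |tau_i - 1{u < 0}|
    return weight * huber / kappa

K = 51
rng = np.random.default_rng(0)
pred = rng.standard_normal(K)     # stand-in for critic samples
target = rng.standard_normal(K)   # stand-in for target samples

elem = pairwise_quantile_huber(pred, target)
loss_k2 = elem.sum() / K ** 2     # mean over all K^2 pairs
loss_k = elem.sum() / K           # sum over quantiles, mean over targets
ratio = loss_k / loss_k2          # constant factor between the two losses
```

Because the two losses differ only by the constant factor `K`, their gradients point in the same direction. With plain gradient descent the factor can be absorbed into the learning rate, and optimizers that normalize gradient magnitude, such as Adam, are largely insensitive to it.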