zhougroup / IDAC

Implicit Distributional Actor Critic
MIT License

Relation between number of quantiles and noise vector #12

Open kbkartik opened 1 year ago

kbkartik commented 1 year ago

Dear Authors,

I found your paper interesting and had a question. For the distributional critic, why is the number of quantiles (51 as reported) not equal to the noise vector dimension (5 as reported)?

Zhendong-Wang commented 1 year ago

Hi kbkartik,

Thanks for your interest. The number of quantiles and the noise vector dimension are two different components of our design.
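To make the distinction concrete, here is a minimal sketch of how the two numbers can differ. It is not the repository's code: `toy_generator`, its linear form, the `weights`, and the example state and action are hypothetical stand-ins for $G_{\omega}$, and sorting the samples so they can be read as empirical quantiles is an assumption made for illustration. Only `K = 51` and `NOISE_DIM = 5` come from the values reported in this thread.

```python
import random

NOISE_DIM = 5  # dimension of each noise vector (value reported in the thread)
K = 51         # number of noise draws, i.e. return samples per (s, a) pair


def toy_generator(state, action, eps, weights):
    """Hypothetical stand-in for G_omega: maps (s, a, eps) to one scalar sample."""
    features = list(state) + list(action) + list(eps)
    return sum(w * f for w, f in zip(weights, features))


def sample_returns(state, action, weights, rng):
    """Draw K independent noise vectors and push each one through the generator."""
    samples = []
    for _ in range(K):
        eps = [rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)]  # one 5-dim vector
        samples.append(toy_generator(state, action, eps, weights))
    # Assumption for illustration: sort so the k-th entry acts as an empirical quantile.
    return sorted(samples)


rng = random.Random(0)
state, action = [0.1, -0.3, 0.7], [0.5]
weights = [rng.uniform(-1.0, 1.0) for _ in range(len(state) + len(action) + NOISE_DIM)]
returns = sample_returns(state, action, weights, rng)
print(len(returns))  # number of return samples, independent of NOISE_DIM
```

Each call to the generator consumes one noise vector and yields one scalar, so the number of samples is set by how many vectors are drawn, not by their dimension.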

kbkartik commented 1 year ago

Hi Zhendong-Wang,

  1. Thanks for your response. As per eqn 5: $x_{1:K} = \{G_{\omega}(s, a, \epsilon^k)\}_{1:K}$, where $\epsilon^1, \dots, \epsilon^K$ are sampled i.i.d. In the paper, you define $K$ as the number of quantiles. Then $\epsilon$ is a $K$-dimensional vector, right?
  2. In eqn 9, you have a $K^2$ denominator in the quantile regression loss. However, in the quantile regression paper, they divide by $K$. Why do you additionally divide by $K$ in your loss?
  3. Have you compared against quantile regression for TD3? I haven't seen any paper that uses standard quantile regression for continuous control. Any thoughts?

Zhendong-Wang commented 1 year ago

  1. Each $\epsilon^k$ is a 5-dimensional vector, where 5 is our noise dimension. We draw $K=51$ such $\epsilon$ vectors for each $(s, a)$ pair. In other words, $k \in \{1, \dots, K\}$.
  2. There is no specific reason for the $K^2$. We just want to take the mean over all $K^2$ elements. I think $K$ should also work, since the difference only slightly affects the effective learning rate.
  3. We didn't compare IDAC against quantile regression for TD3, though SDPG could be a similar setting.
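
The point about $K$ versus $K^2$ can be checked with a small sketch. It uses the standard quantile Huber loss over all $K \times K$ pairs of predicted and target samples, which is not necessarily identical to eqn 9 in the paper. The midpoint quantile levels, the threshold `kappa`, and the toy `pred` and `target` values are assumptions made for illustration.

```python
def quantile_huber(u, tau, kappa=1.0):
    """Standard quantile Huber loss for a single residual u at quantile level tau."""
    abs_u = abs(u)
    huber = 0.5 * u * u if abs_u <= kappa else kappa * (abs_u - 0.5 * kappa)
    indicator = 1.0 if u < 0 else 0.0
    return abs(tau - indicator) * huber / kappa


def pairwise_loss_sum(pred, target):
    """Sum the loss over all K x K (prediction, target) pairs."""
    k = len(pred)
    taus = [(i + 0.5) / k for i in range(k)]  # assumed midpoint quantile levels
    return sum(
        quantile_huber(y - x, tau)
        for x, tau in zip(pred, taus)
        for y in target
    )


K = 51
pred = [i / K for i in range(K)]          # toy predicted samples
target = [0.5 + i / K for i in range(K)]  # toy target samples

total = pairwise_loss_sum(pred, target)
loss_k = total / K        # sum over predictions, mean over targets
loss_k2 = total / K ** 2  # mean over all K^2 pairs
print(loss_k / loss_k2)   # constant ratio between the two normalizations
```

The two losses differ only by the constant factor $K$, so their gradients point in the same direction. With plain SGD, dividing by $K^2$ instead of $K$ is equivalent to dividing the learning rate by $K$.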