Closed: dmartinezbaselga closed this issue 8 months ago.
I think we have already implemented the operation you mentioned in lines 763-768 (link), i.e.:
```python
q_quantiles_hats = (q_quantiles[:, 1:] + q_quantiles[:, :-1]).detach() / 2.  # (batch_size, num_quantiles)
# NOTE(rjy): reparameterize q_quantiles_hats
q_quantile_net = self.quantile_net(q_quantiles_hats)  # [batch_size, num_quantiles, hidden_size(64)]
# x.view[batch_size, 1, hidden_size(64)]
q_x = (x.view(batch_size, 1, -1) * q_quantile_net)  # [batch_size, num_quantiles, hidden_size(64)]
```
Could you please confirm? If you have other questions, feel free to continue replying in this issue.
Hi,
Thanks for the response. What you are referring to is the computation of $\hat{\tau}_i=\frac{\tau_{i+1}+\tau_i}{2}$ (line 763) and the embedding computation of $\hat{\tau}$ (lines 764-768), which is described in Section 3.4 of the paper; these refer to the computation of the quantile values. The part that I am missing is the factor $\tau_{i+1}-\tau_i$, which measures the width of each quantile fraction: instead of a plain mean over the quantile values, the Q-value should be a weighted mean.
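To be concrete, as far as I understand the FQF paper, the action value is a fraction-width-weighted sum of the quantile values rather than a uniform mean:

$$
Q(s, a) \approx \sum_{i=0}^{N-1} (\tau_{i+1} - \tau_i)\, F^{-1}_{Z(s,a)}(\hat{\tau}_i), \qquad \hat{\tau}_i = \frac{\tau_i + \tau_{i+1}}{2},
$$

while `q.mean(1)` gives every quantile value the same weight $1/N$.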
Thanks for your feedback. I have checked this part of the original paper and our implementation.
`logit` is only used to select the action in collect/eval mode; it is not used in training (learn mode).
Hello!
Thank you for this project; it's really complete and modular, which makes it easy to replicate and modify the code. I have a comment regarding the implementation of FQF. As far as I understand the method, line 772 of the file DI-engine/ding/model/common/head.py should compute a mean weighted by the quantile-fraction widths (see the sketch after this message), instead of:
```python
logit = q.mean(1)
```
Thank you in advance for your time, and sorry if I am wrong and this is just a misunderstanding!
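A minimal sketch of the weighted mean I have in mind is below; the variable names and shapes are my assumptions based on the snippet quoted above, not the exact ones in head.py:

```python
import torch

# Hedged sketch only: names and shapes are assumed, not taken from head.py.
#   q           : (batch_size, num_quantiles, action_dim)  quantile values F^{-1}(tau_hat_i)
#   q_quantiles : (batch_size, num_quantiles + 1)          proposed fractions tau_0 .. tau_N

# Width of each quantile fraction, tau_{i+1} - tau_i: (batch_size, num_quantiles)
fraction_widths = (q_quantiles[:, 1:] - q_quantiles[:, :-1]).detach()

# Fraction-width-weighted mean over the quantile dimension instead of q.mean(1).
# When the fractions span [0, 1], the widths sum to 1, so no extra normalization is needed.
logit = (fraction_widths.unsqueeze(-1) * q).sum(dim=1)  # (batch_size, action_dim)
```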