fmxFranky closed this issue 4 years ago
Hi @fmxFranky
Indeed, according to theory we should calculate the gradients at all actions. However, we found that too costly for only an insignificant improvement, so in our implementation the proposed fractions depend only on the state, i.e. they are calculated only at the action chosen by \pi(\cdot|s) and shared among all actions.
The above is from my personal correspondence with the author about the Fraction Proposal Network. I think it answers your question, right?
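To make the "shared among all actions" point concrete, here is a minimal sketch in plain Python. It is not the code from this repo: the function names, the logits, and the quantile values are all made up for illustration, and I am assuming the proposal step is a softmax over N scores followed by a cumulative sum. The fractions are computed once from the state and then reused to weight the quantile values of every action.

```python
import math

def propose_fractions(state_logits):
    """Turn N scores (computed from the state only) into fractions
    0 = tau_0 < ... < tau_N = 1 and their midpoints tau_hat_i."""
    m = max(state_logits)
    exps = [math.exp(x - m) for x in state_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    taus = [0.0]
    for p in probs:
        taus.append(taus[-1] + p)  # cumulative sum of the softmax output
    taus[-1] = 1.0  # guard against floating-point drift at the last fraction
    tau_hats = [(taus[i] + taus[i + 1]) / 2 for i in range(len(probs))]
    return taus, tau_hats

def q_values(taus, quantiles_per_action):
    """Q(s, a) = sum_i (tau_{i+1} - tau_i) * F^{-1}(tau_hat_i),
    using the SAME taus for every action."""
    return [
        sum((taus[i + 1] - taus[i]) * q[i] for i in range(len(q)))
        for q in quantiles_per_action
    ]

# Hypothetical numbers: 4 fractions, 2 actions.
state_logits = [0.0, 0.0, 0.0, 0.0]   # stand-in for the proposal net output
taus, tau_hats = propose_fractions(state_logits)
quantiles = [
    [1.0, 2.0, 3.0, 4.0],             # quantile values of action 0 at tau_hats
    [0.0, 0.0, 4.0, 8.0],             # quantile values of action 1 at tau_hats
]
qs = q_values(taus, quantiles)
greedy = max(range(len(qs)), key=qs.__getitem__)
```

In the (s, a)-dependent variant from Algorithm 1, `propose_fractions` would instead be called once per action, each with its own scores, which is the extra cost the author refers to.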
Hi, @ku2482 Thanks for your reply~
I have been learning FQF recently. Thanks for the repo, which lets me learn the algorithm more efficiently~ I found that the input to the Fraction Proposal Network in FQF is (s, a), as stated in the paper (Algorithm 1). But in your implementation all actions share the same quantile fractions (taus) for a given state. I'm looking forward to your reply about this discrepancy. Thank you very much!