Hi Looomo,
Thanks for the questions. These are small implementation details that we found to improve the performance of GC-IQL. Regarding the first point, we found that decoupling the advantages from the losses (similar to how Double DQN decouples argmax and max) slightly improves performance in general. Regarding the second point, we can generally use either (v1+v2)/2 or min(v1, v2) to compute a scalar value from the two ensemble value functions. We tested several variants and found the current one to perform best in general. That said, as far as I remember, these minor details didn't affect performance much (though sometimes one variant is more stable or slightly better than the others).
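To make the decoupling concrete, here is a minimal per-sample sketch in plain Python. It is not the repository's code: the function names, the `discount` parameter, the default expectile of 0.7, and the choice of `min` for the next-state value and the mean for the current-state value are all assumptions made for illustration. The idea it shows is that the advantage computed from the target networks only selects the expectile weight, while the squared error is taken between each online value head and its own bootstrapped target.

```python
def expectile_loss(adv, diff, expectile=0.7):
    """Asymmetric squared loss.

    The sign of `adv` selects the weight; `diff` supplies the magnitude.
    Passing adv == diff recovers the usual (coupled) expectile loss.
    """
    weight = expectile if adv >= 0 else 1.0 - expectile
    return weight * diff ** 2


def decoupled_value_loss(reward, discount,
                         next_v1_t, next_v2_t,  # target heads at (s', g)
                         v1_t, v2_t,            # target heads at (s, g)
                         v1, v2,                # online heads at (s, g)
                         expectile=0.7):
    """Hypothetical sketch of a decoupled two-head value loss."""
    # Advantage from the target networks only. Which reduction to use
    # (min or mean) is an assumption here; the reply says either works.
    next_v_t = min(next_v1_t, next_v2_t)
    v_t = (v1_t + v2_t) / 2
    adv = reward + discount * next_v_t - v_t

    # Each online head regresses toward a target built from its own target head.
    q1 = reward + discount * next_v1_t
    q2 = reward + discount * next_v2_t
    loss = (expectile_loss(adv, q1 - v1, expectile)
            + expectile_loss(adv, q2 - v2, expectile))
    return adv, loss
```

With this split, a sample whose online error `q1 - v1` is positive can still receive the smaller weight `1 - expectile` if the target-network advantage is negative, which is the analogue of Double DQN using one network to choose and another to evaluate.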
Dear Authors, Nice work on HIQL! I have been trying to run your code recently, but I have had some difficulty understanding parts of it. Could you please explain the rationale behind these design choices? That would be a great help to me!
https://github.com/seohongpark/HIQL/blob/b3e8366ccaec99113778bc360b19894e7a63317c/src/agents/hiql.py#L103
https://github.com/seohongpark/HIQL/blob/b3e8366ccaec99113778bc360b19894e7a63317c/src/agents/hiql.py#L91-L97
Since $adv = r + V_{\text{target}}(s', g) - V_{\text{target}}(s, g)$, this seems to differ from Eq. 4.
Could you elaborate on these designs? Thanks~!