Closed tunglm2203 closed 9 months ago
Hi tunglm2203, thanks for the comments!
As you mentioned, we query both the high-level and low-level policies at each step (somewhat similarly to closed-loop control). We also previously tested querying the high-level policy every way_steps
steps, but we didn't find a significant performance difference, and opted to query both at each step because (1) it is simpler and (2) subgoals can be more reactive.
Hi,
I want to ask about the evaluation code.
https://github.com/seohongpark/HIQL/blob/b3e8366ccaec99113778bc360b19894e7a63317c/jaxrl_m/evaluation.py#L112-L119
It seems the action from the high-level policy is sampled every time step, i.e., same frequency as the low-level policy, regardless of the values of
way_steps
. Why don't you sample high-level policy everyway_steps
step? Could you elaborate on this point?Thank you!