seohongpark / HIQL

HIQL: Offline Goal-Conditioned RL with Latent States as Actions (NeurIPS 2023)
MIT License

Question about evaluation #3

Closed · tunglm2203 closed this issue 9 months ago

tunglm2203 commented 9 months ago

Hi,

I want to ask about the evaluation code.

https://github.com/seohongpark/HIQL/blob/b3e8366ccaec99113778bc360b19894e7a63317c/jaxrl_m/evaluation.py#L112-L119

It seems the action from the high-level policy is sampled at every time step, i.e., at the same frequency as the low-level policy, regardless of the value of way_steps. Why don't you sample the high-level policy only every way_steps steps? Could you elaborate on this point?

Thank you!

seohongpark commented 9 months ago

Hi tunglm2203, thanks for the comments!

As you mentioned, we query both the high-level and low-level policies at every step (somewhat similarly to closed-loop control). We also previously tested querying the high-level policy only every way_steps steps, but we didn't find a significant performance difference, so we opted to query both at every step because (1) it is simpler and (2) the subgoals can be more reactive.
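
For reference, here is a minimal sketch contrasting the two querying schemes. The function and variable names (`high_policy_fn`, `low_policy_fn`, `way_steps`, a gym-style `env`) are illustrative assumptions, not the exact jaxrl_m evaluation API:

```python
def evaluate_episode(env, goal, high_policy_fn, low_policy_fn, way_steps,
                     max_steps=1000, resample_subgoal_every_step=True):
    """Roll out one goal-conditioned episode.

    If resample_subgoal_every_step is True, the high-level policy is queried
    at every environment step (the behavior discussed above). Otherwise it is
    queried only every `way_steps` steps, and the subgoal is held fixed in
    between.
    """
    obs = env.reset()
    subgoal = None
    for t in range(max_steps):
        if resample_subgoal_every_step or subgoal is None or t % way_steps == 0:
            # High-level policy proposes a (latent) subgoal toward the final goal.
            subgoal = high_policy_fn(obs, goal)
        # Low-level policy acts toward the current subgoal.
        action = low_policy_fn(obs, subgoal)
        obs, reward, done, info = env.step(action)
        if done:
            break
    return info
```

With `resample_subgoal_every_step=True`, the subgoal is recomputed from the current observation at every step, so it can react to where the agent actually is; with `False`, the subgoal is only refreshed every `way_steps` steps.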