leoozy closed this issue 2 months ago
Should the input traj be the obs of the exec after the action above?
Thanks, this is a good point!
Our implementation uses the value of the next actions as the value of the current state. We do this because immediately evaluating the "obs of the exec after the action above" (as you suggested) would be very expensive, since it requires more backtracking calls. This is explained in more detail in Sec. 3.3 and footnote 2 on page 5 of the paper. If the cost of backtracking is not a concern, then I agree that computing the obs AFTER executing the action is the better approach.
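To make the trade-off concrete, here is a minimal sketch. It is not the repository's code: `ToyEnv`, `toy_value`, `score_before_execution`, and `score_after_execution` are hypothetical names, and the environment is a toy counter invented for illustration. It only shows that scoring candidates before execution needs no environment calls, while scoring the observation after each action needs one step plus one backtrack per candidate.

```python
from typing import List, Optional


class ToyEnv:
    """Hypothetical environment that counts its costly calls."""

    def __init__(self, state: int = 5, cap: int = 8) -> None:
        self.state = state
        self.cap = cap  # the state can never exceed this value
        self.step_calls = 0
        self.restore_calls = 0

    def step(self, action: int) -> int:
        """Execute an action and return the new observation."""
        self.step_calls += 1
        self.state = min(self.state + action, self.cap)
        return self.state

    def restore(self, state: int) -> None:
        """Backtrack to an earlier state (assumed expensive in practice)."""
        self.restore_calls += 1
        self.state = state


def toy_value(obs: int, action: Optional[int] = None) -> float:
    """Stand-in value function: prefers outcomes close to 10.

    With an action, it can only guess the outcome from the current obs.
    """
    predicted = obs if action is None else obs + action
    return -abs(10 - predicted)


def score_before_execution(env: ToyEnv, actions: List[int]) -> List[float]:
    """Cheap variant: score each candidate from the current obs only."""
    return [toy_value(env.state, a) for a in actions]


def score_after_execution(env: ToyEnv, actions: List[int]) -> List[float]:
    """Costly variant: execute each candidate, score the new obs, backtrack."""
    start = env.state
    scores = []
    for a in actions:
        next_obs = env.step(a)
        scores.append(toy_value(next_obs))
        env.restore(start)
    return scores


env = ToyEnv()
cheap = score_before_execution(env, [2, 5])  # no environment calls
exact = score_after_execution(env, [2, 5])   # 2 steps + 2 backtracks
```

In this toy setup the two variants disagree on the second candidate because the cheap estimate cannot see that the environment caps the state, which is the kind of accuracy that is traded away to avoid the extra backtracking.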
Hope that helps!
Thx!
Thank you for your excellent work. I am confused about the next_actions here. The trajs input for the next_actions is exactly the same as for the actions evaluated above. That is, the next actions are exactly the same as the actions above.