troiwill closed this issue 1 year ago
Hi @troiwill, what do you expect to happen at `goal` and `failure` states? In general, you should be able to define such states through how the transition and reward functions behave at them.
Hey @zkytony, thanks for the quick follow-up. I will focus on `failure`, but the concept applies to `goal` as well. Essentially, I am looking for a way to force `rollout` to stop executing once a failure state has been entered; this would save me further unnecessary computation, because I (or the system) would know that no more transitions are possible.
I added a `failure` state to my transition model, where the state simply remains the same. But I do not think it stops `rollout` from continuing to propagate the state.
You are correct that `rollout` in POMCP/POUCT doesn't terminate until it reaches a given depth. For simplicity, pomdp-py treats final/goal states the same as any other state at the library level. For planning, you get the same effect if you make the reward 0 once a goal or failure state has been reached and have those states transition to themselves. You can use `max_depth` to limit the rollout depth.
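To make the absorbing-state idea concrete, here is a minimal sketch in plain Python. It does not import pomdp-py; the class names, the state labels, and the toy "walk on a line" dynamics are all hypothetical. The `sample(...)` method names are only meant to resemble a transition/reward model interface, so treat this as an illustration of the idea rather than the library's API.

```python
# Hypothetical toy problem: integer positions on a line, plus two absorbing states.
GOAL = "goal"
FAILURE = "failure"
TERMINAL_STATES = {GOAL, FAILURE}


class LineTransitionModel:
    """Deterministic toy transition model (hypothetical, not part of pomdp-py)."""

    def sample(self, state, action):
        # Terminal states are absorbing: every action leads back to the same state.
        if state in TERMINAL_STATES:
            return state
        next_pos = state + (1 if action == "right" else -1)
        if next_pos >= 3:
            return GOAL
        if next_pos <= -3:
            return FAILURE
        return next_pos


class LineRewardModel:
    """Toy reward model (hypothetical): zero reward once a terminal state is reached."""

    def sample(self, state, action, next_state):
        if state in TERMINAL_STATES:
            return 0.0  # already terminal: nothing more to gain or lose
        if next_state == GOAL:
            return 10.0
        if next_state == FAILURE:
            return -10.0
        return -1.0  # step cost


def discounted_return(state, actions, gamma=0.9):
    """Accumulate the discounted reward along a fixed action sequence."""
    transition, reward = LineTransitionModel(), LineRewardModel()
    total, discount = 0.0, 1.0
    for action in actions:
        next_state = transition.sample(state, action)
        total += discount * reward.sample(state, action, next_state)
        discount *= gamma
        state = next_state
    return total
```

With these models, a trajectory that keeps simulating after reaching `goal` or `failure` collects only zero rewards from that point on, so its return matches that of a trajectory cut off at the terminal state. For example, `discounted_return(2, ["right"])` and `discounted_return(2, ["right"] * 20)` give the same value.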
Are you implementing your own rollout function? (If not, I think the default rollout is not computationally expensive.) If you are worried that rollout eats up planning time, you can always set `num_sims` (a fixed number of simulations) instead of a time budget.
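For anyone who does write their own rollout, the early stop asked about in this thread could look roughly like the sketch below. This is plain Python and is not pomdp-py's rollout; `step`, `rollout`, the state labels, and the `stop_at_terminal` flag are hypothetical names, and how such a function would be hooked into the planner is not covered here.

```python
import random

TERMINAL_STATES = {"goal", "failure"}  # hypothetical state labels


def step(state, action):
    """Toy generative model (hypothetical): returns (next_state, reward)."""
    if state in TERMINAL_STATES:
        return state, 0.0  # absorbing state with zero reward
    next_pos = state + action
    if next_pos >= 3:
        return "goal", 10.0
    if next_pos <= -3:
        return "failure", -10.0
    return next_pos, -1.0


def rollout(state, max_depth, gamma, rng, stop_at_terminal=True):
    """Random rollout; returns (discounted_return, steps_simulated)."""
    total, discount, steps = 0.0, 1.0, 0
    for _ in range(max_depth):
        if stop_at_terminal and state in TERMINAL_STATES:
            break  # all later rewards are 0, so simulating further adds nothing
        action = rng.choice((-1, 1))
        state, reward = step(state, action)
        total += discount * reward
        discount *= gamma
        steps += 1
    return total, steps
```

Calling `rollout(0, 50, 0.95, random.Random(0))` and the same call with `stop_at_terminal=False` yields the same return, since the absorbing states contribute only zero rewards; the early-stopping variant just simulates no more steps than the full-depth one. That is the sense in which zero-reward absorbing states and explicit termination are equivalent for planning, with the early stop only saving computation.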
Hi, I have a general question. How can I explicitly specify `goal` and `failure` states using this package?