General question about goal and failure states?

h2r / pomdp-py

A framework to build and solve POMDP problems. Documentation: https://h2r.github.io/pomdp-py/

MIT License


General question about goal and failure states? #36

Status: Closed (troiwill closed 1 year ago)

troiwill commented 1 year ago

Hi, I have a general question. How can I explicitly specify goal and failure states using this package?

zkytony commented 1 year ago

Hi @troiwill, what do you expect to happen at goal and failure states? In general, you should be able to define such states through how the transition and reward functions behave at them.

troiwill commented 1 year ago

Hey @zkytony, thanks for the quick follow-up. I will focus on failure, but the concept applies to goal states as well. Essentially, I am looking for a way to force a rollout to stop once a failure state is entered; this would save unnecessary computation, because I (or the system) would know that no further transitions are possible.

I added a failure state to my transition model, where the state simply transitions to itself. But I do not think this stops the rollout from continuing to propagate the state.

zkytony commented 1 year ago

You are correct that rollout in POMCP/POUCT doesn't terminate until it reaches a given depth. For simplicity, pomdp-py treats final/goal states the same as any other state at the library level. For planning, you get the same effect by making goal and failure states transition to themselves and setting the reward to 0 once such a state has been reached. You can use max_depth to limit the rollout depth.
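To illustrate why the self-loop plus zero-reward setup is equivalent to stopping early, here is a minimal sketch in plain Python. It does not use pomdp-py at all: the toy chain, the `transition`, `reward`, and `rollout` functions, and the `stop_at_terminal` flag are all hypothetical names made up for this example, not part of the library.

```python
# Toy 1-D chain (illustrative only): states 0..3, with 0 = failure, 3 = goal.
FAIL, GOAL = 0, 3
TERMINAL = {FAIL, GOAL}


def transition(state, action):
    """Terminal states are absorbing: they always transition to themselves."""
    if state in TERMINAL:
        return state
    return state + 1 if action == "right" else state - 1


def reward(state, action, next_state):
    """Zero reward once a terminal state has already been reached."""
    if state in TERMINAL:
        return 0.0
    if next_state == GOAL:
        return 10.0
    if next_state == FAIL:
        return -10.0
    return -1.0  # step cost


def rollout(state, policy, max_depth, discount=0.95, stop_at_terminal=False):
    """Discounted return of a fixed-depth rollout.

    With stop_at_terminal=False the loop keeps stepping through the
    absorbing state, but every extra step adds exactly 0 to the return.
    """
    total, factor = 0.0, 1.0
    for _ in range(max_depth):
        if stop_at_terminal and state in TERMINAL:
            break
        action = policy(state)
        next_state = transition(state, action)
        total += factor * reward(state, action, next_state)
        factor *= discount
        state = next_state
    return total


def always_right(state):
    return "right"


full = rollout(1, always_right, max_depth=20)
early = rollout(1, always_right, max_depth=20, stop_at_terminal=True)
print(full == early)
```

Both calls return the same value, so the only cost of not stopping early is the extra loop iterations, which are bounded by the depth limit.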

Are you implementing your own rollout function? If not, I think the default rollout is not computationally expensive. If you are worried that rollout eats up planning time, you can always set num_sims instead of planning_time.