h2r / pomdp-py

A framework to build and solve POMDP problems. Documentation: https://h2r.github.io/pomdp-py/
MIT License

POMCP with Blackbox model #73

Closed: GijsMargadant closed this 2 months ago

GijsMargadant commented 2 months ago

I'm experimenting with POMDPs to plan maintenance actions based on noisy sensor data. For this purpose, I would like to use a Blackbox model to determine successor states. From the examples, the code and the documentation, I can't quite figure out how I should implement this.

I have implemented the sample method from the pomdp_py.BlackboxModel interface. It returns a tuple containing the successor state, successor observation and reward. I have set the blackbox_model parameter for both the environment and the agent and left the other models out. When I call plan on my POMCP object, a ValueError is raised when the _rollout function is called in the POUCT class. The full traceback is listed below.

Does anyone know why this bug occurs? And is there perhaps a working example that uses a Blackbox model that I could use as a guide?

Thanks in advance for any help!

  action = planner.plan(problem.agent)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pomdp_py\\algorithms\\pomcp.pyx", line 94, in pomdp_py.algorithms.pomcp.POMCP.plan
  File "pomdp_py\\algorithms\\po_uct.pyx", line 261, in pomdp_py.algorithms.po_uct.POUCT.plan
  File "pomdp_py\\algorithms\\po_uct.pyx", line 327, in pomdp_py.algorithms.po_uct.POUCT._search
  File "pomdp_py\\algorithms\\po_uct.pyx", line 342, in pomdp_py.algorithms.po_uct.POUCT._perform_simulation
  File "pomdp_py\\algorithms\\pomcp.pyx", line 131, in pomdp_py.algorithms.pomcp.POMCP._simulate
  File "pomdp_py\\algorithms\\po_uct.pyx", line 376, in pomdp_py.algorithms.po_uct.POUCT._simulate
  File "pomdp_py\\algorithms\\po_uct.pyx", line 409, in pomdp_py.algorithms.po_uct.POUCT._rollout
ValueError: need more than 3 values to unpack

zkytony commented 2 months ago

Thanks for the report. What likely happened is that agent.generative_model.sample (i.e. the blackbox model's sample method that you implemented) returns a 3-tuple, but sample_generative_model and line 409 in _rollout expect a 4-tuple.
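The mismatch can be reproduced in plain Python without pomdp_py. In this minimal sketch, `rollout_step` and `try_unpack` are hypothetical stand-ins for the unpacking described above, not the library's actual code. The compiled Cython module words the error slightly differently from the interpreter, but the cause is the same.

```python
def rollout_step(sample_result):
    # Hypothetical stand-in: unpack four values, as the planner is
    # described as doing with the result of sample().
    next_state, observation, reward, n_steps = sample_result
    return next_state, observation, reward, n_steps


def try_unpack(sample_result):
    """Return True if the result unpacks into four values, False otherwise."""
    try:
        rollout_step(sample_result)
        return True
    except ValueError:
        # Raised when the tuple has too few (or too many) elements.
        return False


three_tuple_ok = try_unpack(("s1", "o1", -1.0))      # (s', o, r)
four_tuple_ok = try_unpack(("s1", "o1", -1.0, 1))    # (s', o, r, n_steps)
print(three_tuple_ok, four_tuple_ok)
```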

You can work around this by returning a 4-tuple (s', o, r, n_steps) from your blackbox model's sample function, where n_steps can simply be 1. I'll make a note about this -- should be a minor fix.
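A rough sketch of that workaround, assuming a toy maintenance problem. The class name `MaintenanceBlackbox`, the string-valued states, actions and observations, and the probabilities are all invented for illustration. In real code the class would subclass pomdp_py.BlackboxModel and return the library's state and observation objects rather than strings.

```python
import random


class MaintenanceBlackbox:
    """Hypothetical stand-in; in real code, subclass pomdp_py.BlackboxModel."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def sample(self, state, action):
        # Toy dynamics (assumed): repairing restores the machine at a cost,
        # otherwise a healthy machine wears out with probability 0.2.
        if action == "repair":
            next_state, reward = "ok", -5.0
        elif state == "ok" and self._rng.random() < 0.2:
            next_state, reward = "worn", 0.0
        else:
            next_state, reward = state, 0.0

        # Noisy sensor (assumed): reports the true state 80% of the time.
        flipped = {"ok": "worn", "worn": "ok"}
        if self._rng.random() < 0.8:
            observation = next_state
        else:
            observation = flipped[next_state]

        # The fourth element is the workaround: one simulated step per call.
        n_steps = 1
        return next_state, observation, reward, n_steps


model = MaintenanceBlackbox(seed=0)
result = model.sample("ok", "repair")
print(len(result))
```

The only change relative to a 3-tuple implementation is the extra `n_steps` value at the end of the return statement.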

GijsMargadant commented 2 months ago

This fixed my problem indeed, thanks!