Potential issue in value iteration algorithm

h2r / pomdp-py

A framework to build and solve POMDP problems. Documentation: https://h2r.github.io/pomdp-py/

MIT License

210 stars 49 forks source link

Potential issue in value iteration algorithm #20

Closed racheltrimble closed 1 year ago

racheltrimble commented 2 years ago

Hi h2r,

I'd like to query whether the state used to calculate the value of the subtrees in the value iteration algorithm should be sp rather than s: https://github.com/h2r/pomdp-py/blob/df59f2690b6e8c0079621ad823765b3ddf6b905f/pomdp_py/algorithms/value_iteration.pyx#L48

So: subtreevalue = self.children[o].values[sp] # corresponds to V{oi(p)} in paper

Rather than: subtreevalue = self.children[o].values[s] # corresponds to V{oi(p)} in paper

The source paper uses s' when calculating V_{oi(p)} and this is also my understanding of the child nodes starting from "the state that this node brought you to".

I'm new to the area, to the library and to the etiquette of issues on GitHub so apologies in advance if this is my misunderstanding.

Many thanks for your useful and clearly structured library.

Rachel

zkytony commented 2 years ago

Thanks for the report! You are correct. It should be the value at the next state.

Will fix this. In the mean time, if you need to run value iteration, you can:

use the "vi_pruning" algorithm listed here. This is an interface for the pomdp-solve package in C++; You can define the POMDP in pomdp-py, and this function converts it to the POMDP file format and feeds it to pomdp-solve, and then parses the output policy. The documentation contains an example. You can also find an example in this test case.
use the qvalue function in value_function.py to all Q(b,a) and select the action with highest value.

The ValueIteration planner that was implemented in pomdp-py is quite naive as there is no pruning. It only works for super small problems. I implemented it as an exercise when reading the '98 paper, but never really used it in practice.

Hope this helps!

racheltrimble commented 2 years ago

Great - thanks for the quick confirmation. I realise the unpruned algorithm isn't going to scale far but I was using it to hunt for mistakes in my transition matrix setup before looking at any of the more advanced algorithms. I'm looking at the link to the Cassandra stuff now so I can still progress.

Cheers!

Rachel

zkytony commented 2 years ago

Interesting, if it is about debugging transition matrix, I think you can just verify if the transitions are correct (like assert if the next state is what you expect given state and action, for many samples), without having to run a solver?

zkytony commented 2 years ago

Also, if it is explicit matrix definition of the POMDP, pomdp-py has some convenient classes like TabularTransitionModel.

This gist is an example of using these tabular classes for the crying-baby problem https://gist.github.com/zkytony/51d43ee6818375434eb3b84a77a47a5c

racheltrimble commented 2 years ago

I meant more wrt the integration into the framework. I basically just hacked your tiger example to match my problem to see if I could get something to run to get started with the tools.

Thanks for the links to the templates though - I hadn't spotted those.