Closed racheltrimble closed 1 year ago
Thanks for the report! You are correct. It should be the value at the next state.
Will fix this. In the mean time, if you need to run value iteration, you can:
qvalue
function in value_function.py to all Q(b,a) and select the action with highest value. The ValueIteration planner that was implemented in pomdp-py is quite naive as there is no pruning. It only works for super small problems. I implemented it as an exercise when reading the '98 paper, but never really used it in practice.
Hope this helps!
Great - thanks for the quick confirmation. I realise the unpruned algorithm isn't going to scale far but I was using it to hunt for mistakes in my transition matrix setup before looking at any of the more advanced algorithms. I'm looking at the link to the Cassandra stuff now so I can still progress.
Cheers!
Rachel
Interesting, if it is about debugging transition matrix, I think you can just verify if the transitions are correct (like assert if the next state is what you expect given state and action, for many samples), without having to run a solver?
Also, if it is explicit matrix definition of the POMDP, pomdp-py has some convenient classes like TabularTransitionModel.
This gist is an example of using these tabular classes for the crying-baby problem https://gist.github.com/zkytony/51d43ee6818375434eb3b84a77a47a5c
I meant more wrt the integration into the framework. I basically just hacked your tiger example to match my problem to see if I could get something to run to get started with the tools.
Thanks for the links to the templates though - I hadn't spotted those.
Hi h2r,
I'd like to query whether the state used to calculate the value of the subtrees in the value iteration algorithm should be sp rather than s: https://github.com/h2r/pomdp-py/blob/df59f2690b6e8c0079621ad823765b3ddf6b905f/pomdp_py/algorithms/value_iteration.pyx#L48
So: subtreevalue = self.children[o].values[sp] # corresponds to V{oi(p)} in paper
Rather than: subtreevalue = self.children[o].values[s] # corresponds to V{oi(p)} in paper
The source paper uses s' when calculating V_{oi(p)} and this is also my understanding of the child nodes starting from "the state that this node brought you to".
I'm new to the area, to the library and to the etiquette of issues on GitHub so apologies in advance if this is my misunderstanding.
Many thanks for your useful and clearly structured library.
Rachel