Closed cmbowyer13 closed 4 years ago

Hello, I am trying to solve a POMDP that is two-dimensional. I am modeling the code after the Tiger problem setup, but there the state was one-dimensional, so I thought it would be easy to pass a tuple to create and define my State class for two dimensions. However, the framework does not allow that. What is the correct way to form a vector for the state when passing it to the POMDP instance environment from the main function?
Could you paste some code showing how you are doing this? From the error message, it seems like you are directly passing a tuple into a transition function. If that's the case, you would need to wrap the tuple data structure in a class that inherits the State class.
You can check out the rocksample and multi-object-search domains, which have state variables that are two-dimensional.
Thanks for the suggestion; I started looking at the other examples yesterday, including light-dark and MOS. However, I think I am running into a few new problems, which I'll explain next, and I'll share code where I think the errors are occurring.

1) Some of the difficulty is in taking my state-dependent action sets and turning them into an Action class.
2) Understanding how the planner works. Right now I think the actions are failing because the planner expects one argument, but because of 1) I somehow need to specify a state for the planner step as well.
3) How do I best get access to the belief state of the agent/planner at each iteration?
4) I have a finite number of states, actions, and observations, and I have the transitions stored in numpy arrays that I pass around in the transition and observation models. I haven't seen this limit me yet or cause an error. The tiger problem specified deeply nested dictionaries for the transitions; do I need to do that too? Since most things seem to have to be dictionaries, maybe it will eventually cause an error.
5) How can I maintain a belief for one state component and not the other? I'm not sure how the Histogram works, but it seems to expect a belief vector for the whole state. In my case, one state component is deterministically controlled while the other evolves stochastically and is not controlled. So for the first component I have a control rule, and for the second a transition matrix.
The tiger problem is solved with VI, POMCP, and POUCT for testing. If I can get VI working for my problem, it should work for the other two solvers, right?
Check out the documentation for PolicyModel. The job of a PolicyModel is to (1) determine the set of actions that the robot can take at a given state (and/or history), and (2) sample an action from this set according to some probability distribution.
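As a minimal sketch of those two jobs (the `available_actions` lookup below is a made-up placeholder for however you store your state-dependent action sets, not part of pomdp_py):

```python
import random
import pomdp_py

class MyPolicyModel(pomdp_py.RandomRollout):
    """Sketch: restrict the action set per state and sample uniformly."""
    def get_all_actions(self, state=None, **kwargs):
        # (1) the set of actions the robot can take at this state
        return available_actions(state)  # placeholder: your own lookup

    def sample(self, state, **kwargs):
        # (2) sample one of them, here uniformly at random
        return random.choice(self.get_all_actions(state=state))
```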
I don't quite understand what your issue is. Does my answer above help you resolve this issue?
You can just do agent.belief. You can also check out pomdp_py/framework/basics.pyx to see what functions Agent provides by default.
No, you don't have to use dictionaries. pomdp_py just provides an interface for the TransitionModel. To simulate a transition, you just need to implement the sample(state, action) function; it doesn't matter what data structure or algorithm you use underneath to implement it.
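For instance, a rough sketch of a TransitionModel backed by a numpy array (the matrix layout, the IndexState/IndexAction placeholder classes, and the integer indexing are assumptions for illustration, not from this thread):

```python
import numpy as np
import pomdp_py

class IndexState(pomdp_py.State):
    """Placeholder state identified by an integer index."""
    def __init__(self, index):
        self.index = index
    def __hash__(self):
        return hash(self.index)
    def __eq__(self, other):
        return isinstance(other, IndexState) and self.index == other.index

class IndexAction(pomdp_py.Action):
    """Placeholder action identified by an integer index."""
    def __init__(self, index):
        self.index = index
    def __hash__(self):
        return hash(self.index)
    def __eq__(self, other):
        return isinstance(other, IndexAction) and self.index == other.index

class MatrixTransitionModel(pomdp_py.TransitionModel):
    """Sketch: a TransitionModel backed by a numpy array T,
    where T[s, a, s'] = Pr(s' | s, a)."""
    def __init__(self, T):
        self._T = T  # shape: (num_states, num_actions, num_states)

    def probability(self, next_state, state, action):
        return self._T[state.index, action.index, next_state.index]

    def sample(self, state, action):
        # Draw the next state index according to the corresponding row of T.
        probs = self._T[state.index, action.index, :]
        return IndexState(np.random.choice(len(probs), p=probs))
```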
In that case it sounds like your state space is factored. The MOS domain has essentially the same situation (the robot can observe its own state, but not the targets'). You can have a Histogram for the observable state, where the probability is 1.0 at some state and 0.0 everywhere else, and another Histogram for the unobservable state. You can then use the OOBelief interface, or code up something similar yourself (since you may not need the concept of objects) that lets you sample from the joint belief distribution.
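As a rough sketch of the two-Histogram idea (the concrete values for the observable component and the size of the s_k domain below are illustrative assumptions):

```python
import pomdp_py

# Illustrative values; substitute your real x_k ground truth and s_k domain.
x_true = 0
all_s_values = range(360)

# Observable component: all probability mass on the known ground-truth value.
x_belief = pomdp_py.Histogram({x_true: 1.0})

# Unobservable component: uniform over all possible s_k values.
s_belief = pomdp_py.Histogram({s: 1.0 / len(all_s_values) for s in all_s_values})

# Sampling from the joint belief is then just sampling each factor and
# combining the two pieces into one joint state (here, a plain tuple).
joint_sample = (x_belief.random(), s_belief.random())
```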
By the way, the VI algorithm is quite primitive. Unless your POMDP is really small (like tiger), the VI algorithm probably won't scale to it. Currently your best bet is to just try POMCP/POUCT directly.
Thanks for the extra insight on PolicyModel; I'll try that first. Instead of trying to specify the action constraints in the Action class, I'll just union together all possible actions and check that the chosen one is in that union. I'll keep the transitions as matrices, and I may just need to override some behavior in the Environment and Agent classes to get the belief handling I need. I'll let you know how it goes.
From my main program, the code is getting stuck or failing at this point:
```python
def test_planner(tiger_problem, planner, nsteps=3):
    """
    Runs the action-feedback loop of Tiger problem POMDP

    Args:
        tiger_problem (TigerProblem): an instance of the tiger problem.
        planner (Planner): a planner
        nsteps (int): Maximum number of steps to run this loop.
    """
    for i in range(nsteps):
        action = planner.plan(tiger_problem.agent)
        print("==== Step %d ====" % (i+1))
        print("True state: %s" % tiger_problem.env.state)
        print("Belief: %s" % str(tiger_problem.agent.cur_belief))
        print("Action: %s" % str(action))
        print("Reward: %s" % str(tiger_problem.env.reward_model.sample(tiger_problem.env.state, action, None)))
```
It fails at this line:

```python
action = planner.plan(tiger_problem.agent)
```

I'm just copying here the same part of the code that would be executed for Tiger, but it raises an error for my problem description. I explain more below and in the next comments.
The error says:

```
File "pomdp_py/algorithms/pomcp.pyx", line 92, in pomdp_py.algorithms.pomcp.POMCP.plan
File "pomdp_py/algorithms/po_uct.pyx", line 252, in pomdp_py.algorithms.po_uct.POUCT.plan
File "pomdp_py/algorithms/po_uct.pyx", line 314, in pomdp_py.algorithms.po_uct.POUCT._search
TypeError: Cannot convert int to pomdp_py.framework.basics.State
```
To simplify things, I'm only going to test with POMCP and leave the other two solvers alone. I have another issue with how to handle the belief-state maintenance with Histogram; I'll show you the code I'm using to set that up and the issue I ran into.

For the belief state, since I only want to maintain one for the second state component, I would like to try:
```python
# define the initial belief-state for the second state s_k:
card_X = 360
belief_dict = {}
for i in range(360):
    belief_dict[i] = 1/card_X
#state_prob_dict = {key:value for (key,value) in belief_dict.items()}
state_prob_dict = {State(key):value for (key,value) in belief_dict.items()}
init_belief = pomdp_py.Histogram(state_prob_dict)
```
But this fails because my State() expects a 2D object like a numpy array or tuple. I do want to keep State as it is defined, i.e., keep the state info grouped together. Maybe I don't need to wrap State() around each key, but following the tiger example worked nicely there because it only has one state variable. So it's a circular issue, and I don't know what the easiest fix for this situation would be.
Here's my state class:
```python
import pomdp_py
import numpy as np

class State(pomdp_py.State):
    """The state of the two positions of x_k and s_k"""
    def __init__(self, position):
        """
        Initializes a state in the tracking domain.

        Args:
            position (tuple or np.array): positions of x_k and s_k.
        """
        if len(position) != 2:
            raise ValueError("State position must be a vector of length 2")
        self.x_k = position[0]
        self.s_k = position[1]

    def __hash__(self):
        # hash() takes a single argument, so hash the tuple of both components
        return hash((self.x_k, self.s_k))

    def __eq__(self, other):
        if isinstance(other, State):
            return (self.x_k == other.x_k) and (self.s_k == other.s_k)
        else:
            return False

    def __str__(self):
        return self.__repr__()

    def __repr__(self):
        return "X_k = %s, S_k = %s" % (str(self.x_k), str(self.s_k))
```
From the error message `TypeError: Cannot convert int to pomdp_py.framework.basics.State`, are you trying to feed an int into a function that requires a State as the argument? Because that seems like what's happening.
So my understanding is that you want one of x_k and s_k to be fully observable and the other to be unobservable, but you don't want to separate them into two classes. Well, the easiest thing you can do is to have a histogram over the joint space of all x_k and s_k values, and set the probability to zero for every case where the observable variable doesn't have its ground-truth value.
Hello, yes, I think that error going from int to State is coming from the belief-state issues. I'm not opposed to splitting the state into separate classes, but I think it makes more sense to keep them together; if I separated them, I'm not sure how I would use them based on the example or with the other objects. How do I set up the histogram over the joint space as you mentioned? Do you have an example of doing that? I did manage to get around the Action issues through PolicyModel(), so that's fixed, and it should just randomly select an action based on the current state once I get past this belief-state setup.
Glad you fixed it. I imagine you could do something like this to initialize a belief distribution for your case. I am going to assume x_k is observable and s_k is uncertain:
```python
prob = {}  # maps from state to probability
x_k = some_particular_value
for s_k in S_k:  # S_k is the set of all possible s_k values
    state = State((x_k, s_k))
    prob[state] = 1.0 / len(S_k)  # uniform over the unobserved component
belief = pomdp_py.Histogram(prob)
```
I think you would have to look closer into the "int" issue. I'm not sure why it is happening, but I think it should be a simple fix somewhere in your code.
OK, I have redefined the initial belief state to be the joint one, but the PolicyModel may not be completely correct. I tried to change a few things in PolicyModel to sample the action. I even had to add indexing to my State class to get the behavior I wanted, but adding the __getitem__ dunder isn't that costly. The error I'm getting now is

```
TypeError: Cannot convert int to pomdp_py.framework.basics.Action
```

It's getting hung up on the planner.plan() call.
File "pomdp_py/algorithms/pomcp.pyx", line 92, in pomdp_py.algorithms.pomcp.POMCP.plan
File "pomdp_py/algorithms/po_uct.pyx", line 252, in pomdp_py.algorithms.po_uct.POUCT.plan
File "pomdp_py/algorithms/po_uct.pyx", line 315, in pomdp_py.algorithms.po_uct.POUCT._search
File "pomdp_py/algorithms/pomcp.pyx", line 129, in pomdp_py.algorithms.pomcp.POMCP._simulate
File "pomdp_py/algorithms/po_uct.pyx", line 342, in pomdp_py.algorithms.po_uct.POUCT._simulate
File "pomdp_py/algorithms/po_uct.pyx", line 287, in pomdp_py.algorithms.po_uct.POUCT._expand_vnode
TypeError: Cannot convert int to pomdp_py.framework.basics.Action
Looks like your policy model is returning an int instead of an Action. You need to debug it according to the stack trace.

I don't see how, because I'm wrapping an Action() around the sample returned from my list of available actions at the current state:
```python
import random

# Policy Model
class PolicyModel(pomdp_py.RandomRollout):
    """This is an extremely dumb policy model; to keep consistent
    with the framework."""
    def probability(self, action, state, normalized=False, **kwargs):
        raise NotImplementedError  # Never used

    def sample(self, state, normalized=False, **kwargs):
        # need to take the int action returned from random() and cast it as an Action() object!
        return Action(random.choice(self.get_all_actions(state)))

    def argmax(self, state, normalized=False, **kwargs):
        """Returns the most likely reward"""
        raise NotImplementedError

    def get_all_actions(self, state, **kwargs):
        # Get all actions available from the current state_1, i.e. x_k,
        # and convert to list from set
        return list(TrackingProblem.ACTIONS[state[0]])
```
Could you try making the actions returned by TrackingProblem.ACTIONS[state[0]] a list of Actions instead of a list of ints? The POUCT code is trying to create a tree node with all actions as the children, and it calls the get_all_actions function to do that.
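For instance, a minimal sketch of that change (assuming TrackingProblem.ACTIONS[x_k] currently holds raw ints):

```python
def get_all_actions(self, state, **kwargs):
    # Wrap each raw integer action in an Action object so that POUCT
    # can use them as children of the tree node.
    return [Action(a) for a in TrackingProblem.ACTIONS[state[0]]]
```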
OK, I think I've moved past the PolicyModel issues and have my State and Action domains set up properly. Now I'm getting an error saying, I think, that something is wrong with my transition model's sample method:
File "pomdp_py/algorithms/pomcp.pyx", line 92, in pomdp_py.algorithms.pomcp.POMCP.plan
File "pomdp_py/algorithms/po_uct.pyx", line 252, in pomdp_py.algorithms.po_uct.POUCT.plan
File "pomdp_py/algorithms/po_uct.pyx", line 315, in pomdp_py.algorithms.po_uct.POUCT._search
File "pomdp_py/algorithms/pomcp.pyx", line 129, in pomdp_py.algorithms.pomcp.POMCP._simulate
File "pomdp_py/algorithms/po_uct.pyx", line 343, in pomdp_py.algorithms.po_uct.POUCT._simulate
File "pomdp_py/algorithms/po_uct.pyx", line 377, in pomdp_py.algorithms.po_uct.POUCT._rollout
File "pomdp_py/framework/basics.pyx", line 612, in pomdp_py.framework.basics.sample_generative_model
File "pomdp_py/framework/basics.pyx", line 661, in pomdp_py.framework.basics.sample_explict_models
File "pomdp_py/framework/basics.pyx", line 132, in pomdp_py.framework.basics.TransitionModel.sample
NotImplementedError
Do the method names in the TransitionModel class have to be exactly the same, or can I change their names? That may have broken this class. The base Agent class must expect a .sample method. Instead of .sample I have sample_j2, and instead of probability I use probability_j1 and probability_j2 to handle the transitions of each state component individually.

I'm sure you know that in Python you call a function by giving it the exact name of the function. The sample method is part of TransitionModel's definition. You can implement it however you want, by breaking it down into what you described, but ultimately you're implementing the sample function.
I did rename the methods and use the new names consistently in my redefined TransitionModel, but I'm saying this broke something in the .pyx files' expectations, so the planner must expect them to be named .sample. I'm not familiar with the conversions between C and Python and what that entails, but I'll stick with the same naming conventions until I understand the framework better.

Sounds good. I'm still unsure about your motivation for not using the function names defined in the framework.
This might help you see why sample is needed: the sample_generative_model function in basics.pyx uses the sample function to simulate the next state.
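For example, keeping the per-component helpers you described but exposing them through the required sample name might look roughly like this (a sketch; sample_j1/sample_j2 are your own helper names and State is your class from above):

```python
import pomdp_py

class TrackingTransitionModel(pomdp_py.TransitionModel):
    """Sketch: keep the per-component helpers, but expose them
    through the sample() name that the framework calls."""
    def sample(self, state, action):
        next_x = self.sample_j1(state, action)  # controlled component x_k
        next_s = self.sample_j2(state, action)  # stochastic component s_k
        return State((next_x, next_s))

    def sample_j1(self, state, action):
        ...  # your deterministic control rule for x_k

    def sample_j2(self, state, action):
        ...  # your transition-matrix draw for s_k
```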
Thanks for the suggestion. In terms of understanding the solvers, do you recommend reading the papers in any particular order?

OK, I think I've resolved the sample issues, and I see from the basics.pyx file that certain things need to be named a certain way in my own classes even though it's possible to add extra features; I'm starting to see that certain base methods need to keep their names because of how the solvers are defined, which I haven't gotten to understanding yet. I have some new questions about the Histogram representation: what is the idea or purpose behind this class? I fixed some bugs in my RewardModel; that was easy once I fixed my transition model's sample issues. Now I'm stuck on an observation model issue. I got this new error about representing my pmf as an array instead of a Histogram:
```
line 49, in get_distribution
    return pomdp_py.Histogram(self._probs[j_2,:])
File "pomdp_py/representations/distribution/histogram.pyx", line 21, in pomdp_py.representations.distribution.histogram.Histogram.__init__
ValueError: Unsupported histogram representation! <class 'numpy.ndarray'>
```
I would like to use numpy arrays for maintaining the belief vectors if that's possible, but I would like to structure it to have the most flexibility.
Histogram is just one belief representation. Technically you could define your own that uses numpy arrays. As long as your belief distribution implements the GenerativeDistribution base class, you can use it as the agent's belief. You can read the POMCP paper. There is also a paper for this library that talks about high-level design principles.
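As a rough sketch of the idea of a numpy-backed belief (the exact set of methods a belief distribution must implement is an assumption here, based on what Histogram provides; check basics.pyx and histogram.pyx for the real interface before relying on this):

```python
import numpy as np
import pomdp_py

class ArrayBelief(pomdp_py.GenerativeDistribution):
    """Sketch: a belief over the s_k component stored as a numpy vector.
    Assumes states expose an integer s_k; the placeholder x_k = 0 below
    is illustrative only."""
    def __init__(self, probs):
        self._probs = np.asarray(probs, dtype=float)
        self._probs /= self._probs.sum()  # keep it normalized

    def __getitem__(self, state):
        return self._probs[state.s_k]

    def __setitem__(self, state, value):
        self._probs[state.s_k] = value

    def random(self):
        # Sample an s_k index according to the belief vector, then wrap it.
        s_k = np.random.choice(len(self._probs), p=self._probs)
        return State((0, s_k))

    def mpe(self):
        # Most probable state under the belief.
        return State((0, int(np.argmax(self._probs))))
```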
Thanks, I read the second paper about the framework; it's what turned me on to this framework over another one in Python. I thought the concepts laid out in that paper were general enough, yet clear enough, to solve difficult problems. I've read about two-thirds of David Silver's 2010 paper on POMCP; I'll finish it tomorrow. pomdp.org is also a great resource, and I plan to read through their paper list next. Do you have any other paper recommendations for understanding this framework better, or some of the solvers/techniques?

I'm finally able to have it print the initial belief and get past the first action-planning step, so I think I almost have POMCP set up! :) I'm getting the following error in the loop running the test POMCP functionality based on rocksample. I have no idea what it means yet:
File "pomdp_py/algorithms/pomcp.pyx", line 94, in pomdp_py.algorithms.pomcp.POMCP.update
File "pomdp_py/algorithms/pomcp.pyx", line 112, in pomdp_py.algorithms.pomcp.POMCP.update
ValueError: Particle deprivation.
A lot of progress! The POMCP paper mentions particle deprivation in section 3.2. Regarding references, I think it depends on what you plan to do with the POMDP library. If you're new to POMDPs, then definitely start by reading this '98 paper. I think pomdp.org is a bit old. There's an OO-POMDP framework in pomdp_py as well and that comes from this paper.
Thanks for trying this out! Feel free to contribute something as you get more into it.
Awesome, I'll read that '98 paper next then. After finishing the POMCP paper, it gives pointers to the original MCTS and UCT papers too. With particle reinvigoration in pomdp_py, what are my options? Or should I try value iteration or a different solver entirely?

When testing VI, this error came up:

```
File "pomdp_py/algorithms/value_iteration.pyx", line 81, in pomdp_py.algorithms.value_iteration.ValueIteration.plan
File "pomdp_py/algorithms/value_iteration.pyx", line 108, in pomdp_py.algorithms.value_iteration.ValueIteration._build_policy_trees
File "pomdp_py/framework/basics.pyx", line 444, in pomdp_py.framework.basics.Agent.all_actions.get
TypeError: get_all_actions() missing 1 required positional argument: 'state'
```

My get_all_actions function takes in a state, but the one in basics.pyx must not. Is it safe or a good idea to change the argument lists of functions in the .pyx files, or are those generally not touched? This get_all_actions function is defined in the PolicyModel class.
And I'd love to slowly start contributing to the project over time.
The get_all_actions in basics.pyx is defined as:

```python
def get_all_actions(self, *args, **kwargs):
```

So you can pass in whatever arguments you want. I'm not sure why you get that error; I don't think the issue is on the pomdp_py side.
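One possible way to satisfy both callers on your side (a sketch, not from this thread; it assumes TrackingProblem.ACTIONS maps each x_k value to a set of actions) is to give state a default, so the planner can also call get_all_actions without one:

```python
def get_all_actions(self, state=None, **kwargs):
    if state is None:
        # Fall back to the union of all state-dependent action sets
        # (e.g. when ValueIteration enumerates the full action space).
        return list(set().union(*TrackingProblem.ACTIONS.values()))
    return list(TrackingProblem.ACTIONS[state[0]])
```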
Generally, it's better not to change the framework unless it's really necessary (e.g., there's some design flaw), because if you change the framework, all code that uses it needs to change too.
Feel free to reopen when there's any update.