Closed Hororohoruru closed 10 months ago
Hi, in this case, I recommend forking the repo and develop the feature as a subclass of POMCP or POUCT in a separate file, instead of directly changing the source code for them. If you'd like, feel free to open a PR then. It is unclear to me how broadly useful this is and how someone else could use this for their problem. If you make a PR, would be great to include one or more domains (in pomdp_problems
) to illustrate its appeal. Thank you for considering contributing!
Thank you very much! Would this new feature and subclass need to follow the cython syntax? If yes, I would like to know if you could point me to some documentation / guidelines since I've never worked with Cython personally.
Ideally yes, but you should be able to inherit and implement your subclasses in Python. I mostly learned Cython through reading their documentation or other websites. Also, let's not worry about adding this into pomdp-py for now. I hope you can focus on solving your problem.
Sounds good to me! Thank you for your help. I'm closing this for the moment. Will re-open if there is anything else to discuss.
I started to implement a class that inherits from POMCP as you suggested:
class POMCP2(POMCP):
"""
Implements some methods on top of the POMCP implementation.
Attributes
----------
_exploration_const_list: list or None, default=None
List of exploration constants to be used for each depth of the tree.
If None, all methods use the original POMCP implementation
"""
def __init__(self,
max_depth=5, planning_time=-1., num_sims=-1,
discount_factor=0.9, exploration_const=math.sqrt(2),
num_visits_init=0, value_init=0,
rollout_policy=None, action_prior=None,
show_progress=False, pbar_update_interval=5,
exploration_const_list=None):
super().__init__(max_depth=max_depth,
planning_time=planning_time,
num_sims=num_sims,
discount_factor=discount_factor,
exploration_const=exploration_const,
num_visits_init=num_visits_init,
value_init=value_init,
rollout_policy=rollout_policy,
action_prior=action_prior,
show_progress=show_progress,
pbar_update_interval=pbar_update_interval)
self._exploration_const_list = exploration_const_list
def _simulate(self,
state, history, root, parent,
observation, depth):
if self._exploration_const_list is not None:
breakpoint()
if depth > self._max_depth:
return 0
if root is None:
if self._agent.tree is None:
root = self._VNode(agent=self._agent, root=True)
self._agent.tree = root
if self._agent.tree.history != self._agent.history:
raise ValueError("Unable to plan for the given history.")
else:
root = self._VNode()
if parent is not None:
parent[observation] = root
self._expand_vnode(root, history, state=state)
rollout_reward = self._rollout(state, history, root, depth)
return rollout_reward
action = self._ucb(root, state)
next_state, observation, reward, nsteps = sample_generative_model(self._agent, state, action)
if nsteps == 0:
# This indicates the provided action didn't lead to transition
# Perhaps the action is not allowed to be performed for the given state
# (for example, the state is not in the initiation set of the option,
# or the state is a terminal state)
return reward
total_reward = reward + (self._discount_factor**nsteps)*self._simulate(next_state,
history + ((action, observation),),
root[action][observation],
root[action],
observation,
depth+nsteps)
root.num_visits += 1
root[action].num_visits += 1
root[action].value = root[action].value + (total_reward - root[action].value) / (root[action].num_visits)
return total_reward
# POMCP belief update
if depth == 1 and root is not None:
root.belief.add(state)
else:
return super()._simulate(self, state, history, root, parent, observation, depth)
It is very much identical to POUCT
's _simulate()
method, with the only difference being that the function call for self._ucb
includes state
. However, I am getting the following error:
Traceback (most recent call last):
File "ssvep_exp.py", line 491, in <module>
exp.main()
File "/home/jtorretr/gitrepos/phd_pomdp/bci_pomdp/experiment.py", line 1186, in main
action = policy.plan(bci_problem.agent)
File "pomdp_py/algorithms/pomcp.pyx", line 94, in pomdp_py.algorithms.pomcp.POMCP.plan
File "pomdp_py/algorithms/po_uct.pyx", line 233, in pomdp_py.algorithms.po_uct.POUCT.plan
File "pomdp_py/algorithms/po_uct.pyx", line 306, in pomdp_py.algorithms.po_uct.POUCT._search
File "pomdp_py/algorithms/pomcp.pyx", line 128, in pomdp_py.algorithms.pomcp.POMCP._simulate
File "/home/jtorretr/gitrepos/phd_pomdp/bci_pomdp/planner/POMCP2.py", line 52, in _simulate
if depth > self._max_depth:
AttributeError: 'POMCP2' object has no attribute '_max_depth'
But the attribute should be correctly assigned from the different __init__()
methods. Any idea why this is happening?
This happens because _max_depth
is a private member of POUCT (you can't access it through Python; you could if you're coding the subclass in Cython). My suggestion for now is, fork the pomdp-py repo, add property getters for private members you need (similar to this), and then build your Python class by inheritance.
Thank you for your answer. I did what you suggested but the error persists... here is my code for POUCT
on the fork:
cdef class POUCT(Planner):
""" POUCT (Partially Observable UCT) :cite:`silver2010monte` is presented in the POMCP
paper as an extension of the UCT algorithm to partially-observable domains
that combines MCTS and UCB1 for action selection.
POUCT only works for problems with action space that can be enumerated.
__init__(self,
max_depth=5, planning_time=1., num_sims=-1,
discount_factor=0.9, exploration_const=math.sqrt(2),
num_visits_init=1, value_init=0,
rollout_policy=RandomRollout(),
action_prior=None, show_progress=False, pbar_update_interval=5)
Args:
max_depth (int): Depth of the MCTS tree. Default: 5.
planning_time (float), amount of time given to each planning step (seconds). Default: -1.
if negative, then planning terminates when number of simulations `num_sims` reached.
If both `num_sims` and `planning_time` are negative, then the planner will run for 1 second.
num_sims (int): Number of simulations for each planning step. If negative,
then will terminate when planning_time is reached.
If both `num_sims` and `planning_time` are negative, then the planner will run for 1 second.
rollout_policy (RolloutPolicy): rollout policy. Default: RandomRollout.
action_prior (ActionPrior): a prior over preferred actions given state and history.
show_progress (bool): True if print a progress bar for simulations.
pbar_update_interval (int): The number of simulations to run after each update of the progress bar,
Only useful if show_progress is True; You can set this parameter even if your stopping criteria
is time.
"""
def __init__(self,
max_depth=5, planning_time=-1., num_sims=-1,
discount_factor=0.9, exploration_const=math.sqrt(2),
num_visits_init=0, value_init=0,
rollout_policy=RandomRollout(),
action_prior=None, show_progress=False, pbar_update_interval=5):
self._max_depth = max_depth
self._planning_time = planning_time
self._num_sims = num_sims
if self._num_sims < 0 and self._planning_time < 0:
self._planning_time = 1.
self._num_visits_init = num_visits_init
self._value_init = value_init
self._rollout_policy = rollout_policy
self._discount_factor = discount_factor
self._exploration_const = exploration_const
self._action_prior = action_prior
self._show_progress = show_progress
self._pbar_update_interval = pbar_update_interval
# to simplify function calls; plan only for one agent at a time
self._agent = None
self._last_num_sims = -1
self._last_planning_time = -1
@property
def updates_agent_belief(self):
return False
@property
def last_num_sims(self):
"""Returns the number of simulations ran for the last `plan` call."""
return self._last_num_sims
@property
def last_planning_time(self):
"""Returns the amount of time (seconds) ran for the last `plan` call."""
return self._last_planning_time
@property
def max_depth(self):
"""Returns the maximum depth of the MCTS tree."""
return self._max_depth
@property
def agent(self):
"""Returns the agent that the planner is planning for."""
return self._agent
I also checked whether self.max_depth
existed (without the underscore), but no. The attribute cannot be accessed.
UPDATE: I was wondering whether cython-built classes would reflect changes in the code as Python code when installed with the -e flag via pip. I remembered that when I got errors in the past the Traceback mentioned it came from a .so file. I imagine that the C code is compiled when you install the package, so I tried re-installing via pip and now it works. Is there a different way for changes to be applied?
After you modify the Cython files, you need to rebuild. Cython is a compiled language. You can run make build
to rebuild (see installation as developer).
Closing this now. Feel free to reopen if comes up again.
Hello! I want to implement an adapive exploration constant for POMCP as a function of the uncertainty over different states, as described in this paper. For that, I intend to do the following:
exploration_const
as a list instead of a single value_exploration_const
by the index corresponding to the stateWould it be a way of implementing this without modifying the source code? I would prefer to not have to modify the library if possible. Also, I don't know if you would consider this interesting for a pull request, in which case I'd gladly discuss how to implement it in the library.