How to correctly modify a planner object

Hororohoruru commented 1 year ago

Hello! I want to implement an adapive exploration constant for POMCP as a function of the uncertainty over different states, as described in this paper. For that, I intend to do the following:

As the different entropy-based exploration constant parameters depend on the observation model, they can be calculated outside of the planner
There is one value per state, so it can be pased to the POMCP object's exploration_const as a list instead of a single value
In the ucb function, modify the function call to take the state being simulated by the simulation, and index _exploration_const by the index corresponding to the state

Would it be a way of implementing this without modifying the source code? I would prefer to not have to modify the library if possible. Also, I don't know if you would consider this interesting for a pull request, in which case I'd gladly discuss how to implement it in the library.

zkytony commented 1 year ago

Hi, in this case, I recommend forking the repo and develop the feature as a subclass of POMCP or POUCT in a separate file, instead of directly changing the source code for them. If you'd like, feel free to open a PR then. It is unclear to me how broadly useful this is and how someone else could use this for their problem. If you make a PR, would be great to include one or more domains (in pomdp_problems) to illustrate its appeal. Thank you for considering contributing!

Hororohoruru commented 1 year ago

Thank you very much! Would this new feature and subclass need to follow the cython syntax? If yes, I would like to know if you could point me to some documentation / guidelines since I've never worked with Cython personally.

zkytony commented 1 year ago

Ideally yes, but you should be able to inherit and implement your subclasses in Python. I mostly learned Cython through reading their documentation or other websites. Also, let's not worry about adding this into pomdp-py for now. I hope you can focus on solving your problem.

Hororohoruru commented 1 year ago

Sounds good to me! Thank you for your help. I'm closing this for the moment. Will re-open if there is anything else to discuss.

Hororohoruru commented 1 year ago

I started to implement a class that inherits from POMCP as you suggested:

class POMCP2(POMCP):
    """
    Implements some methods on top of the POMCP implementation.

    Attributes
    ----------

    _exploration_const_list: list or None, default=None
        List of exploration constants to be used for each depth of the tree.
        If None, all methods use the original POMCP implementation
    """

    def __init__(self,
                 max_depth=5, planning_time=-1., num_sims=-1,
                 discount_factor=0.9, exploration_const=math.sqrt(2),
                 num_visits_init=0, value_init=0,
                 rollout_policy=None, action_prior=None,
                 show_progress=False, pbar_update_interval=5,
                 exploration_const_list=None):

        super().__init__(max_depth=max_depth,
                         planning_time=planning_time,
                         num_sims=num_sims,
                         discount_factor=discount_factor,
                         exploration_const=exploration_const,
                         num_visits_init=num_visits_init,
                         value_init=value_init,
                         rollout_policy=rollout_policy,
                         action_prior=action_prior,
                         show_progress=show_progress,
                         pbar_update_interval=pbar_update_interval)

        self._exploration_const_list = exploration_const_list

    def _simulate(self,
                  state, history, root, parent,
                  observation, depth):
        if self._exploration_const_list is not None:
            breakpoint()
            if depth > self._max_depth:
                return 0
            if root is None:
                if self._agent.tree is None:
                    root = self._VNode(agent=self._agent, root=True)
                    self._agent.tree = root
                    if self._agent.tree.history != self._agent.history:
                        raise ValueError("Unable to plan for the given history.")
                else:
                    root = self._VNode()
                if parent is not None:
                    parent[observation] = root
                self._expand_vnode(root, history, state=state)
                rollout_reward = self._rollout(state, history, root, depth)
                return rollout_reward
            action = self._ucb(root, state)
            next_state, observation, reward, nsteps = sample_generative_model(self._agent, state, action)

            if nsteps == 0:
                # This indicates the provided action didn't lead to transition
                # Perhaps the action is not allowed to be performed for the given state
                # (for example, the state is not in the initiation set of the option,
                # or the state is a terminal state)
                return reward

            total_reward = reward + (self._discount_factor**nsteps)*self._simulate(next_state,
                                                                                   history + ((action, observation),),
                                                                                   root[action][observation],
                                                                                   root[action],
                                                                                   observation,
                                                                                   depth+nsteps)
            root.num_visits += 1
            root[action].num_visits += 1
            root[action].value = root[action].value + (total_reward - root[action].value) / (root[action].num_visits)
            return total_reward

            # POMCP belief update
            if depth == 1 and root is not None:
                root.belief.add(state)
        else:
            return super()._simulate(self, state, history, root, parent, observation, depth)

It is very much identical to POUCT's _simulate() method, with the only difference being that the function call for self._ucb includes state. However, I am getting the following error:

Traceback (most recent call last):
  File "ssvep_exp.py", line 491, in <module>
    exp.main()
  File "/home/jtorretr/gitrepos/phd_pomdp/bci_pomdp/experiment.py", line 1186, in main
    action = policy.plan(bci_problem.agent)
  File "pomdp_py/algorithms/pomcp.pyx", line 94, in pomdp_py.algorithms.pomcp.POMCP.plan
  File "pomdp_py/algorithms/po_uct.pyx", line 233, in pomdp_py.algorithms.po_uct.POUCT.plan
  File "pomdp_py/algorithms/po_uct.pyx", line 306, in pomdp_py.algorithms.po_uct.POUCT._search
  File "pomdp_py/algorithms/pomcp.pyx", line 128, in pomdp_py.algorithms.pomcp.POMCP._simulate
  File "/home/jtorretr/gitrepos/phd_pomdp/bci_pomdp/planner/POMCP2.py", line 52, in _simulate
    if depth > self._max_depth:
AttributeError: 'POMCP2' object has no attribute '_max_depth'

But the attribute should be correctly assigned from the different __init__() methods. Any idea why this is happening?

zkytony commented 1 year ago

This happens because _max_depth is a private member of POUCT (you can't access it through Python; you could if you're coding the subclass in Cython). My suggestion for now is, fork the pomdp-py repo, add property getters for private members you need (similar to this), and then build your Python class by inheritance.

Hororohoruru commented 1 year ago

Thank you for your answer. I did what you suggested but the error persists... here is my code for POUCT on the fork:

cdef class POUCT(Planner):

    """ POUCT (Partially Observable UCT) :cite:`silver2010monte` is presented in the POMCP
    paper as an extension of the UCT algorithm to partially-observable domains
    that combines MCTS and UCB1 for action selection.

    POUCT only works for problems with action space that can be enumerated.

    __init__(self,
             max_depth=5, planning_time=1., num_sims=-1,
             discount_factor=0.9, exploration_const=math.sqrt(2),
             num_visits_init=1, value_init=0,
             rollout_policy=RandomRollout(),
             action_prior=None, show_progress=False, pbar_update_interval=5)

    Args:
        max_depth (int): Depth of the MCTS tree. Default: 5.
        planning_time (float), amount of time given to each planning step (seconds). Default: -1.
            if negative, then planning terminates when number of simulations `num_sims` reached.
            If both `num_sims` and `planning_time` are negative, then the planner will run for 1 second.
        num_sims (int): Number of simulations for each planning step. If negative,
            then will terminate when planning_time is reached.
            If both `num_sims` and `planning_time` are negative, then the planner will run for 1 second.
        rollout_policy (RolloutPolicy): rollout policy. Default: RandomRollout.
        action_prior (ActionPrior): a prior over preferred actions given state and history.
        show_progress (bool): True if print a progress bar for simulations.
        pbar_update_interval (int): The number of simulations to run after each update of the progress bar,
            Only useful if show_progress is True; You can set this parameter even if your stopping criteria
            is time.
    """

    def __init__(self,
                 max_depth=5, planning_time=-1., num_sims=-1,
                 discount_factor=0.9, exploration_const=math.sqrt(2),
                 num_visits_init=0, value_init=0,
                 rollout_policy=RandomRollout(),
                 action_prior=None, show_progress=False, pbar_update_interval=5):
        self._max_depth = max_depth
        self._planning_time = planning_time
        self._num_sims = num_sims
        if self._num_sims < 0 and self._planning_time < 0:
            self._planning_time = 1.
        self._num_visits_init = num_visits_init
        self._value_init = value_init
        self._rollout_policy = rollout_policy
        self._discount_factor = discount_factor
        self._exploration_const = exploration_const
        self._action_prior = action_prior

        self._show_progress = show_progress
        self._pbar_update_interval = pbar_update_interval

        # to simplify function calls; plan only for one agent at a time
        self._agent = None
        self._last_num_sims = -1
        self._last_planning_time = -1

    @property
    def updates_agent_belief(self):
        return False

    @property
    def last_num_sims(self):
        """Returns the number of simulations ran for the last `plan` call."""
        return self._last_num_sims

    @property
    def last_planning_time(self):
        """Returns the amount of time (seconds) ran for the last `plan` call."""
        return self._last_planning_time

    @property
    def max_depth(self):
        """Returns the maximum depth of the MCTS tree."""
        return self._max_depth

    @property
    def agent(self):
        """Returns the agent that the planner is planning for."""
        return self._agent

I also checked whether self.max_depth existed (without the underscore), but no. The attribute cannot be accessed.

UPDATE: I was wondering whether cython-built classes would reflect changes in the code as Python code when installed with the -e flag via pip. I remembered that when I got errors in the past the Traceback mentioned it came from a .so file. I imagine that the C code is compiled when you install the package, so I tried re-installing via pip and now it works. Is there a different way for changes to be applied?

zkytony commented 1 year ago

After you modify the Cython files, you need to rebuild. Cython is a compiled language. You can run make build to rebuild (see installation as developer).

zkytony commented 10 months ago

Closing this now. Feel free to reopen if comes up again.

h2r / pomdp-py

How to correctly modify a planner object #35