h2r / pomdp-py

A framework to build and solve POMDP problems. Documentation: https://h2r.github.io/pomdp-py/
MIT License
209 stars 49 forks source link

How to correctly modify a planner object #35

Closed Hororohoruru closed 10 months ago

Hororohoruru commented 1 year ago

Hello! I want to implement an adapive exploration constant for POMCP as a function of the uncertainty over different states, as described in this paper. For that, I intend to do the following:

Would it be a way of implementing this without modifying the source code? I would prefer to not have to modify the library if possible. Also, I don't know if you would consider this interesting for a pull request, in which case I'd gladly discuss how to implement it in the library.

zkytony commented 1 year ago

Hi, in this case, I recommend forking the repo and develop the feature as a subclass of POMCP or POUCT in a separate file, instead of directly changing the source code for them. If you'd like, feel free to open a PR then. It is unclear to me how broadly useful this is and how someone else could use this for their problem. If you make a PR, would be great to include one or more domains (in pomdp_problems) to illustrate its appeal. Thank you for considering contributing!

Hororohoruru commented 1 year ago

Thank you very much! Would this new feature and subclass need to follow the cython syntax? If yes, I would like to know if you could point me to some documentation / guidelines since I've never worked with Cython personally.

zkytony commented 1 year ago

Ideally yes, but you should be able to inherit and implement your subclasses in Python. I mostly learned Cython through reading their documentation or other websites. Also, let's not worry about adding this into pomdp-py for now. I hope you can focus on solving your problem.

Hororohoruru commented 1 year ago

Sounds good to me! Thank you for your help. I'm closing this for the moment. Will re-open if there is anything else to discuss.

Hororohoruru commented 1 year ago

I started to implement a class that inherits from POMCP as you suggested:

class POMCP2(POMCP):
    """
    Implements some methods on top of the POMCP implementation.

    Attributes
    ----------

    _exploration_const_list: list or None, default=None
        List of exploration constants to be used for each depth of the tree.
        If None, all methods use the original POMCP implementation
    """

    def __init__(self,
                 max_depth=5, planning_time=-1., num_sims=-1,
                 discount_factor=0.9, exploration_const=math.sqrt(2),
                 num_visits_init=0, value_init=0,
                 rollout_policy=None, action_prior=None,
                 show_progress=False, pbar_update_interval=5,
                 exploration_const_list=None):

        super().__init__(max_depth=max_depth,
                         planning_time=planning_time,
                         num_sims=num_sims,
                         discount_factor=discount_factor,
                         exploration_const=exploration_const,
                         num_visits_init=num_visits_init,
                         value_init=value_init,
                         rollout_policy=rollout_policy,
                         action_prior=action_prior,
                         show_progress=show_progress,
                         pbar_update_interval=pbar_update_interval)

        self._exploration_const_list = exploration_const_list

    def _simulate(self,
                  state, history, root, parent,
                  observation, depth):
        if self._exploration_const_list is not None:
            breakpoint()
            if depth > self._max_depth:
                return 0
            if root is None:
                if self._agent.tree is None:
                    root = self._VNode(agent=self._agent, root=True)
                    self._agent.tree = root
                    if self._agent.tree.history != self._agent.history:
                        raise ValueError("Unable to plan for the given history.")
                else:
                    root = self._VNode()
                if parent is not None:
                    parent[observation] = root
                self._expand_vnode(root, history, state=state)
                rollout_reward = self._rollout(state, history, root, depth)
                return rollout_reward
            action = self._ucb(root, state)
            next_state, observation, reward, nsteps = sample_generative_model(self._agent, state, action)

            if nsteps == 0:
                # This indicates the provided action didn't lead to transition
                # Perhaps the action is not allowed to be performed for the given state
                # (for example, the state is not in the initiation set of the option,
                # or the state is a terminal state)
                return reward

            total_reward = reward + (self._discount_factor**nsteps)*self._simulate(next_state,
                                                                                   history + ((action, observation),),
                                                                                   root[action][observation],
                                                                                   root[action],
                                                                                   observation,
                                                                                   depth+nsteps)
            root.num_visits += 1
            root[action].num_visits += 1
            root[action].value = root[action].value + (total_reward - root[action].value) / (root[action].num_visits)
            return total_reward

            # POMCP belief update
            if depth == 1 and root is not None:
                root.belief.add(state)
        else:
            return super()._simulate(self, state, history, root, parent, observation, depth)

It is very much identical to POUCT's _simulate() method, with the only difference being that the function call for self._ucb includes state. However, I am getting the following error:

Traceback (most recent call last):
  File "ssvep_exp.py", line 491, in <module>
    exp.main()
  File "/home/jtorretr/gitrepos/phd_pomdp/bci_pomdp/experiment.py", line 1186, in main
    action = policy.plan(bci_problem.agent)
  File "pomdp_py/algorithms/pomcp.pyx", line 94, in pomdp_py.algorithms.pomcp.POMCP.plan
  File "pomdp_py/algorithms/po_uct.pyx", line 233, in pomdp_py.algorithms.po_uct.POUCT.plan
  File "pomdp_py/algorithms/po_uct.pyx", line 306, in pomdp_py.algorithms.po_uct.POUCT._search
  File "pomdp_py/algorithms/pomcp.pyx", line 128, in pomdp_py.algorithms.pomcp.POMCP._simulate
  File "/home/jtorretr/gitrepos/phd_pomdp/bci_pomdp/planner/POMCP2.py", line 52, in _simulate
    if depth > self._max_depth:
AttributeError: 'POMCP2' object has no attribute '_max_depth'

But the attribute should be correctly assigned from the different __init__() methods. Any idea why this is happening?

zkytony commented 1 year ago

This happens because _max_depth is a private member of POUCT (you can't access it through Python; you could if you're coding the subclass in Cython). My suggestion for now is, fork the pomdp-py repo, add property getters for private members you need (similar to this), and then build your Python class by inheritance.

Hororohoruru commented 1 year ago

Thank you for your answer. I did what you suggested but the error persists... here is my code for POUCT on the fork:

cdef class POUCT(Planner):

    """ POUCT (Partially Observable UCT) :cite:`silver2010monte` is presented in the POMCP
    paper as an extension of the UCT algorithm to partially-observable domains
    that combines MCTS and UCB1 for action selection.

    POUCT only works for problems with action space that can be enumerated.

    __init__(self,
             max_depth=5, planning_time=1., num_sims=-1,
             discount_factor=0.9, exploration_const=math.sqrt(2),
             num_visits_init=1, value_init=0,
             rollout_policy=RandomRollout(),
             action_prior=None, show_progress=False, pbar_update_interval=5)

    Args:
        max_depth (int): Depth of the MCTS tree. Default: 5.
        planning_time (float), amount of time given to each planning step (seconds). Default: -1.
            if negative, then planning terminates when number of simulations `num_sims` reached.
            If both `num_sims` and `planning_time` are negative, then the planner will run for 1 second.
        num_sims (int): Number of simulations for each planning step. If negative,
            then will terminate when planning_time is reached.
            If both `num_sims` and `planning_time` are negative, then the planner will run for 1 second.
        rollout_policy (RolloutPolicy): rollout policy. Default: RandomRollout.
        action_prior (ActionPrior): a prior over preferred actions given state and history.
        show_progress (bool): True if print a progress bar for simulations.
        pbar_update_interval (int): The number of simulations to run after each update of the progress bar,
            Only useful if show_progress is True; You can set this parameter even if your stopping criteria
            is time.
    """

    def __init__(self,
                 max_depth=5, planning_time=-1., num_sims=-1,
                 discount_factor=0.9, exploration_const=math.sqrt(2),
                 num_visits_init=0, value_init=0,
                 rollout_policy=RandomRollout(),
                 action_prior=None, show_progress=False, pbar_update_interval=5):
        self._max_depth = max_depth
        self._planning_time = planning_time
        self._num_sims = num_sims
        if self._num_sims < 0 and self._planning_time < 0:
            self._planning_time = 1.
        self._num_visits_init = num_visits_init
        self._value_init = value_init
        self._rollout_policy = rollout_policy
        self._discount_factor = discount_factor
        self._exploration_const = exploration_const
        self._action_prior = action_prior

        self._show_progress = show_progress
        self._pbar_update_interval = pbar_update_interval

        # to simplify function calls; plan only for one agent at a time
        self._agent = None
        self._last_num_sims = -1
        self._last_planning_time = -1

    @property
    def updates_agent_belief(self):
        return False

    @property
    def last_num_sims(self):
        """Returns the number of simulations ran for the last `plan` call."""
        return self._last_num_sims

    @property
    def last_planning_time(self):
        """Returns the amount of time (seconds) ran for the last `plan` call."""
        return self._last_planning_time

    @property
    def max_depth(self):
        """Returns the maximum depth of the MCTS tree."""
        return self._max_depth

    @property
    def agent(self):
        """Returns the agent that the planner is planning for."""
        return self._agent

I also checked whether self.max_depth existed (without the underscore), but no. The attribute cannot be accessed.

UPDATE: I was wondering whether cython-built classes would reflect changes in the code as Python code when installed with the -e flag via pip. I remembered that when I got errors in the past the Traceback mentioned it came from a .so file. I imagine that the C code is compiled when you install the package, so I tried re-installing via pip and now it works. Is there a different way for changes to be applied?

zkytony commented 1 year ago

After you modify the Cython files, you need to rebuild. Cython is a compiled language. You can run make build to rebuild (see installation as developer).

zkytony commented 10 months ago

Closing this now. Feel free to reopen if comes up again.