h2r / pomdp-py

A framework to build and solve POMDP problems. Documentation: https://h2r.github.io/pomdp-py/
MIT License

MOMDP representation? #67

Open Hororohoruru opened 6 months ago

Hororohoruru commented 6 months ago

Hello,

I would like to know whether it is currently possible to create a problem with fully observable state variables and solve it with SARSOP through a .pomdpx file.

zkytony commented 6 months ago

Yes, it is possible. Check out this documentation page.

Regarding fully observable state variables, you can achieve that by having an observation model that simply returns the state.

Hororohoruru commented 6 months ago

Sorry, I did not formulate my question well. What I meant to ask is how I should write my code so that, when converted to .pomdpx, it represents some state variables as fully observable.

Right now I am working with a model that has access to the time that has passed since the beginning of the operation, and has a maximum number of time steps to act (finite horizon). My approach is to create states that, on top of their ID (either an int or 'term' for the terminal state), also have a property t:

class TDState(pomdp_py.State):
    def __init__(self, state_id, time_step):
        self.id = state_id
        self.t = time_step
        self.name = f"s_{state_id}-t_{time_step}"

The methods for __hash__, __eq__, __str__ and __repr__ are defined similarly to the Tiger example in the documentation. However, observations only have an id property. When the observation depends on t, I retrieve it directly from the TDState object in my ObservationModel:

class TDObservationModel(pomdp_py.ObservationModel):
    def __init__(self, conf_matrix):
        self.observation_matrix = conf_matrix
        self.n_steps, self.n_states, self.n_obs = self.observation_matrix.shape

    def probability(self, observation, next_state, action):
        obs_idx = observation.id
        state_idx = next_state.id
        state_step = next_state.t

        return self.observation_matrix[state_step][state_idx][obs_idx]

The transition model includes the parameter t_max in order to generate a list of all possible states up to the maximum t. Explained in Tiger terms, Action('listen') uses the ObservationModel to provide an observation, and the state transitions deterministically such that T(s, a_listen, s') = 1 if s.id == s'.id and s'.t == s.t + 1. If s.t == t_max, the state transitions to the terminal state no matter which action is selected (and provides the corresponding terminal observation deterministically). If any action other than listen is selected, the model also transitions to the terminal state. As in the Tiger problem, the state stays the same (here the s.id), but the time advances until a horizon t_max.
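The transition rule described here can be sketched in plain Python, independent of the pomdp_py API. The names `S`, `TERM` and `transition_prob`, the action label "open", the choice of t = 0 for the terminal state, and treating the terminal state as absorbing are all assumptions made for illustration; none of them come from the original model.

```python
from typing import NamedTuple, Union


class S(NamedTuple):
    """A state with a hidden id and a time step (hypothetical stand-in for TDState)."""
    id: Union[int, str]
    t: int


# Assumption: the terminal state carries t = 0; the thread does not say which t it uses.
TERM = S("term", 0)


def transition_prob(s: S, action: str, s_next: S, t_max: int) -> float:
    """Return T(s, action, s_next) for the deterministic rule described in the post."""
    # Assumption: once terminal, the process stays terminal.
    if s.id == "term":
        return 1.0 if s_next == TERM else 0.0
    # Any action other than 'listen', or reaching the horizon, ends the episode.
    if action != "listen" or s.t == t_max:
        return 1.0 if s_next == TERM else 0.0
    # 'listen' before the horizon: same id, time advances by exactly one step.
    return 1.0 if (s_next.id == s.id and s_next.t == s.t + 1) else 0.0
```

For example, `transition_prob(S(2, 3), "listen", S(2, 4), t_max=7)` evaluates to 1.0, while the same call with `S(1, 4)` as the next state evaluates to 0.0 because the id changed.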

I would like the time to be fully observable in the produced .pomdpx file, but since you commented:

Regarding fully observable state variables, you can achieve that by having an observation model that simply returns the state.

I think the way I am handling it would not accomplish the MOMDP representation. How should I do it instead?

Hororohoruru commented 6 months ago

Follow-up: I tried to convert to .pomdpx with my current problem definition and the file reflects only one state variable, which has a number of states equal to the number of possible state IDs times the possible values of t. In the case of 5 targets and 8 time-steps, I get 41 states of a single state variable (the extra state is the terminal state).

I would like to know how to define my model to have a state variable with 5 values (ID), which is not fully observable, and another state variable with 8 values (time), which would be fully observable.
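The count of 41 follows from enumerating every (ID, time) pair and adding the terminal state. The sketch below reproduces that arithmetic and contrasts it with the factored layout being asked for. It is plain Python and says nothing about what the .pomdpx converter emits; the label "term" and the dictionary keys are hypothetical.

```python
from itertools import product

n_targets, n_steps = 5, 8

# Flat representation: one state variable whose values are all (id, t) pairs,
# named like TDState.name, plus one terminal state.
flat_states = [f"s_{i}-t_{t}" for i, t in product(range(n_targets), range(n_steps))]
flat_states.append("term")  # hypothetical label for the terminal state

# Factored representation: two separate state variables.
factored = {
    "target": list(range(n_targets)),  # hidden
    "time": list(range(n_steps)),      # intended to be fully observable
}

print(len(flat_states))                              # 5 * 8 + 1 = 41
print(len(factored["target"]), len(factored["time"]))  # 5 8
```

How the terminal state should be encoded in the factored form is left open here, since it depends on the target file format.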

zkytony commented 6 months ago

I will provide a sketch for the idea.

class State(pomdp_py.SimpleState):
    def __init__(self, target, time_step):
        super().__init__(data=(target, time_step))

class ObservationModel(pomdp_py.ObservationModel):
   def sample(self, next_state, action):
       time_step = next_state.data[1]
       return pomdp_py.SimpleObservation(data=time_step)

This makes time_step observable, but not the target.

Hororohoruru commented 6 months ago

That makes it clear, thank you! I imagine the ObservationModel needs to know t_max in order to create a list of all the possible observations.

What I mean is that the target needs to be part of the observation as well.
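One way to read this is that each observation carries two parts: the time step, reported exactly, and a noisy reading of the target drawn from the confusion matrix. The following is a minimal plain-Python sketch of that idea, not pomdp_py code. `State`, `Obs`, `observation_prob`, `all_observations` and the toy matrix are hypothetical, and it assumes time steps run from 0 to n_steps - 1 with the matrix indexed as [time][target][reading], matching the shape used earlier in the thread.

```python
from itertools import product
from typing import List, NamedTuple, Sequence


class State(NamedTuple):
    target: int  # hidden component
    t: int       # component meant to be fully observable


class Obs(NamedTuple):
    reading: int  # noisy reading of the target
    t: int        # exact copy of the time step


def observation_prob(obs: Obs, next_state: State,
                     conf_matrix: Sequence[Sequence[Sequence[float]]]) -> float:
    """P(obs | next_state): time must match exactly, the reading follows the matrix."""
    if obs.t != next_state.t:
        return 0.0  # the time step is observed without noise
    return conf_matrix[next_state.t][next_state.target][obs.reading]


def all_observations(n_obs: int, n_steps: int) -> List[Obs]:
    """Enumerate every (reading, t) pair; n_steps is assumed to be derived from t_max."""
    return [Obs(r, t) for t, r in product(range(n_steps), range(n_obs))]


# Toy confusion matrix: 2 time steps, 2 targets, 2 readings (made-up numbers).
conf = [
    [[0.6, 0.4], [0.4, 0.6]],
    [[0.9, 0.1], [0.1, 0.9]],
]
```

With this layout, the probabilities over `all_observations(2, 2)` for any fixed state sum to 1, and every observation whose `t` differs from the state's `t` has probability 0.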