In the frozen lake problem, I am getting an assertion error about an invalid action while computing the state values. My code uses the `mdp.py` file to get the next states and the possible actions for a given state. The error, `AssertionError: cannot do action left from state (0, 0)`, says that the action `left` cannot be taken from the state `(0, 0)`, but when I printed `mdp.get_possible_actions((0,0))`, the output was `('left', 'down', 'right', 'up')`. The transition probability is zero for this transition, but I can't use that as a condition either, because checking it raises the same assertion error. My other functions seem correct, since I get the correct results for the above MDP. Can anyone please tell me what needs to be done, or whether this is an issue in the frozen lake problem itself?
I have attached the log as well:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-49-2df897c74778> in <module>()
----> 1 state_values = value_iteration(mdp)
<ipython-input-41-b1cacaad4da4> in value_iteration(mdp, state_values, gamma, num_iter, min_difference)
7 new_state_values = {}
8 for state in mdp.get_all_states():
----> 9 new_state_values[state] = get_new_state_value(mdp, state_values, state, gamma)
10
11 assert isinstance(new_state_values, dict)
<ipython-input-48-7fb51cc228af> in get_new_state_value(mdp, state_values, state, gamma)
5 state_value = -np.inf
6 for action in mdp.get_possible_actions(state):
----> 7 state_value = max(state_value, get_action_value(mdp, state_values, state, action, gamma))
8 print(state_value)
9 return state_value
<ipython-input-46-4570b47c3b7c> in get_action_value(mdp, state_values, state, action, gamma)
5 qsa = 0
6 for s in mdp.get_next_states(state, action):
----> 7 print(mdp.get_transition_prob(str(state), action, str(s)))
8 if mdp.get_transition_prob(str(state), action, str(s)):
9 qsa += mdp.get_transition_prob(str(state), action, str(s)) * (mdp.get_reward(str(state), action, str(s)) + gamma*state_values[s])
/content/mdp.py in get_transition_prob(self, state, action, next_state)
76 def get_transition_prob(self, state, action, next_state):
77 """ return P(next_state | state, action) """
---> 78 return self.get_next_states(state, action).get(next_state, 0.0)
79
80 def get_reward(self, state, action, next_state):
/content/mdp.py in get_next_states(self, state, action)
71 def get_next_states(self, state, action):
72 """ return a dictionary of {next_state1 : P(next_state1 | state, action), next_state2: ...} """
---> 73 assert action in self.get_possible_actions(state), "cannot do action %s from state %s" % (action, state)
74 return self._transition_probs[state][action]
75
AssertionError: cannot do action left from state (0, 0)
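One detail in the log may be worth checking: `get_action_value` passes `str(state)` and `str(s)` to `get_transition_prob`, while `get_possible_actions((0,0))` was called with a tuple. Below is a minimal sketch of how that mismatch could produce exactly this message. `TinyMDP` is a hypothetical stand-in, not the real `mdp.py`; in particular, the body of `get_possible_actions` is an assumption, since it does not appear in the traceback.

```python
class TinyMDP:
    """Hypothetical stand-in for the class in mdp.py; states are tuple keys."""

    def __init__(self, transition_probs):
        self._transition_probs = transition_probs

    def get_possible_actions(self, state):
        # Assumption: a state that is not a known key has no actions.
        return tuple(self._transition_probs.get(state, {}).keys())

    def get_next_states(self, state, action):
        # Same assertion as the one shown in the traceback.
        assert action in self.get_possible_actions(state), \
            "cannot do action %s from state %s" % (action, state)
        return self._transition_probs[state][action]

    def get_transition_prob(self, state, action, next_state):
        return self.get_next_states(state, action).get(next_state, 0.0)


mdp = TinyMDP({(0, 0): {"left": {(0, 0): 1.0}, "right": {(0, 1): 1.0}}})

# Tuple state: the lookup succeeds.
ok_prob = mdp.get_transition_prob((0, 0), "left", (0, 0))

# String state: "(0, 0)" is not a key, so the assertion fires. The message
# looks the same because the string prints exactly like the tuple.
try:
    mdp.get_transition_prob(str((0, 0)), "left", str((0, 0)))
    error_message = None
except AssertionError as exc:
    error_message = str(exc)

print(ok_prob)
print(error_message)
```

If the real `mdp.py` behaves like this sketch, passing `state` and `s` directly instead of wrapping them in `str(...)` should avoid the assertion.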