I added a simple 4x4 gridworld example that can be solved with discount = 1.0. Two of the implemented algorithms fail on this problem: PolicyIteration raises a LinAlgError ("Singular matrix") and QLearning returns a policy that does not match the expected one. The relevant test output is below. Please let me know if I can help track down the problems.
tests.test_PolicyIteration.test_PolicyIteration_gridworld ... ERROR
tests.test_PolicyIterationModified.TestPolicyIterationModified.test_gridworld ... ok
tests.test_QLearning.test_QLearning_gridworld ... FAIL
tests.test_ValueIteration.test_ValueIteration_gridworld ... ok
tests.test_ValueIterationGS.test_ValueIterationGS_gridworld ... ok
======================================================================
ERROR: tests.test_PolicyIteration.test_PolicyIteration_gridworld
----------------------------------------------------------------------
Traceback (most recent call last):
File "nose/case.py", line 198, in runTest
self.test(*self.arg)
File "pymdptoolbox/src/tests/test_PolicyIteration.py", line 129, in test_PolicyIteration_gridworld
pi.run()
File "pymdptoolbox/src/mdptoolbox/mdp.py", line 810, in run
self._evalPolicyMatrix()
File "pymdptoolbox/src/mdptoolbox/mdp.py", line 799, in _evalPolicyMatrix
(_sp.eye(self.S, self.S) - self.discount * Ppolicy), Rpolicy)
File "numpy/linalg/linalg.py", line 381, in solve
r = gufunc(a, b, signature=signature, extobj=extobj)
File "numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
raise LinAlgError("Singular matrix")
nose.proxy.LinAlgError: Singular matrix
-------------------- >> begin captured stdout << ---------------------
WARNING: check conditions of convergence. With no discount, convergence can not be assumed.
--------------------- >> end captured stdout << ----------------------
======================================================================
FAIL: tests.test_QLearning.test_QLearning_gridworld
----------------------------------------------------------------------
Traceback (most recent call last):
File "nose/case.py", line 198, in runTest
self.test(*self.arg)
File "pymdptoolbox/src/tests/test_QLearning.py", line 63, in test_QLearning_gridworld
assert qlearning.policy == policy_gridworld
AssertionError
----------------------------------------------------------------------
Ran 128 tests in 5.723s
FAILED (errors=1, failures=1)
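A possible explanation for the "Singular matrix" error, offered as a sketch and not as a diagnosis of the mdptoolbox code: every row of a policy transition matrix sums to 1, so with discount = 1.0 the all-ones vector lies in the null space of (I - discount * P) and the linear system in _evalPolicyMatrix has no unique solution. The 3-state matrix and the helper name below are hypothetical and only illustrate the effect; they are not the gridworld from this pull request.

```python
import numpy as np

# Hypothetical 3-state transition matrix under a fixed policy.
# Each row sums to 1; state 2 is absorbing.
P_policy = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
])


def evaluation_matrix(discount):
    """Return I - discount * P, the matrix solved during policy evaluation."""
    n = P_policy.shape[0]
    return np.eye(n) - discount * P_policy


def is_singular(discount):
    """True if the policy evaluation matrix is rank deficient."""
    A = evaluation_matrix(discount)
    return np.linalg.matrix_rank(A) < A.shape[0]


# With discount = 1.0 the all-ones vector is mapped to zero,
# because each row of P_policy sums to 1.
ones = np.ones(P_policy.shape[0])
residual = evaluation_matrix(1.0) @ ones
```

If this is what happens here, an undiscounted policy evaluation would need a different formulation, for example solving only over the non-terminal states, or an iterative evaluation as in PolicyIterationModified, which passes the same test.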
This pull request is related to issue #6.