Tests for undiscounted MDPs

This pull request is related to issue #6.

I added a simple 4x4 gridworld example that can be solved with discount = 1.0. Some of the implemented algorithms fail on this problem: PolicyIteration raises a LinAlgError, and QLearning returns a policy that differs from the expected one. The relevant test output is below. Please let me know if I can help find the problems.
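For reference, here is a sketch of the kind of problem involved. It assumes the classic 4x4 gridworld layout (Sutton and Barto): states 0 and 15 are absorbing terminals, every move from a non-terminal state earns -1, and moves off the grid leave the agent in place. That layout is an assumption, not a copy of the test file. `build_gridworld` and `value_iteration` are hypothetical helpers written in plain NumPy, not pymdptoolbox code. They use pymdptoolbox's array shapes (P as A×S×S, R as S×A). Undiscounted value iteration converges here because every optimal path reaches a terminal state.

```python
import numpy as np

# Hypothetical 4x4 gridworld (assumed layout, not taken from the test file).
# States are numbered row-major 0..15; 0 and 15 are absorbing terminals.
N = 4
S = N * N
TERMINALS = (0, S - 1)
# Actions: up, right, down, left as (row delta, col delta).
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]
A = len(MOVES)


def build_gridworld():
    """Return P with shape (A, S, S) and R with shape (S, A)."""
    P = np.zeros((A, S, S))
    R = np.zeros((S, A))
    for s in range(S):
        r, c = divmod(s, N)
        for a, (dr, dc) in enumerate(MOVES):
            if s in TERMINALS:
                P[a, s, s] = 1.0  # terminal: stay put with reward 0
                continue
            # Moving off the grid leaves the agent where it is.
            nr = min(max(r + dr, 0), N - 1)
            nc = min(max(c + dc, 0), N - 1)
            P[a, s, nr * N + nc] = 1.0
            R[s, a] = -1.0
    return P, R


def value_iteration(P, R, discount=1.0, tol=1e-10, max_iter=1000):
    """Plain Bellman-optimality iteration; returns (V, greedy policy)."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + discount * sum_t P[a, s, t] * V[t]
        Q = R + discount * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)


P, R = build_gridworld()
V, policy = value_iteration(P, R, discount=1.0)
# V[s] is minus the number of steps from s to the nearest terminal.
print(V.reshape(N, N))
```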

tests.test_PolicyIteration.test_PolicyIteration_gridworld ... ERROR
tests.test_PolicyIterationModified.TestPolicyIterationModified.test_gridworld ... ok
tests.test_QLearning.test_QLearning_gridworld ... FAIL
tests.test_ValueIteration.test_ValueIteration_gridworld ... ok
tests.test_ValueIterationGS.test_ValueIterationGS_gridworld ... ok

======================================================================
ERROR: tests.test_PolicyIteration.test_PolicyIteration_gridworld
----------------------------------------------------------------------
Traceback (most recent call last):
  File "nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "pymdptoolbox/src/tests/test_PolicyIteration.py", line 129, in test_PolicyIteration_gridworld
    pi.run()
  File "pymdptoolbox/src/mdptoolbox/mdp.py", line 810, in run
    self._evalPolicyMatrix()
  File "pymdptoolbox/src/mdptoolbox/mdp.py", line 799, in _evalPolicyMatrix
    (_sp.eye(self.S, self.S) - self.discount * Ppolicy), Rpolicy)
  File "numpy/linalg/linalg.py", line 381, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
  File "numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
nose.proxy.LinAlgError: Singular matrix
-------------------- >> begin captured stdout << ---------------------
WARNING: check conditions of convergence. With no discount, convergence can not be assumed.

--------------------- >> end captured stdout << ----------------------
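The PolicyIteration error comes from the solve shown in the traceback. `_evalPolicyMatrix` solves (I - discount * Ppolicy) V = Rpolicy. With discount = 1, every absorbing state contributes a row e_s to Ppolicy, which is a zero row in I - Ppolicy. This assumes the gridworld's terminal states are absorbing. The sketch below reproduces the failure on a toy 3-state chain with made-up numbers. It then shows one possible workaround, a hypothetical `evaluate_undiscounted` helper that is not pymdptoolbox code. The helper fixes terminal values at 0 and solves only over the transient states. That is well-posed only for a proper policy, one that reaches a terminal with probability 1. A policy that keeps walking into a wall has value -infinity and still gives a singular system, so the initial policy would matter too.

```python
import numpy as np

# Toy 3-state chain under a fixed policy: 0 -> 1 -> 2, with state 2 absorbing.
P_policy = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])
R_policy = np.array([-1.0, -1.0, 0.0])

# With discount = 1 the absorbing row of (I - P) is all zeros, so the system
# is singular: the same failure reported by _evalPolicyMatrix.
M = np.eye(3) - 1.0 * P_policy
singular = np.linalg.matrix_rank(M) < 3
try:
    np.linalg.solve(M, R_policy)
    raised = False
except np.linalg.LinAlgError:
    raised = True


def evaluate_undiscounted(P_policy, R_policy, terminals):
    """Hypothetical helper: exact policy evaluation with discount = 1.

    Terminal states are pinned at V = 0 and the linear system is solved only
    over the remaining (transient) states. Only valid for a proper policy;
    otherwise (I - Q) is still singular and solve() raises LinAlgError.
    """
    S = P_policy.shape[0]
    transient = np.setdiff1d(np.arange(S), terminals)
    Q = P_policy[np.ix_(transient, transient)]
    V = np.zeros(S)
    V[transient] = np.linalg.solve(np.eye(len(transient)) - Q, R_policy[transient])
    return V


V = evaluate_undiscounted(P_policy, R_policy, terminals=[2])
print(V)  # state 0 is two steps from the terminal, state 1 is one step away
```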

======================================================================
FAIL: tests.test_QLearning.test_QLearning_gridworld
----------------------------------------------------------------------
Traceback (most recent call last):
  File "nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "pymdptoolbox/src/tests/test_QLearning.py", line 63, in test_QLearning_gridworld
    assert qlearning.policy == policy_gridworld
AssertionError

----------------------------------------------------------------------
Ran 128 tests in 5.723s

FAILED (errors=1, failures=1)
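The QLearning failure is an exact comparison of two policy tuples. The output does not show whether Q-learning failed to converge within its iteration budget or converged to a different but equally optimal policy. If the gridworld follows the usual layout, several states have more than one shortest-path action, and any action is acceptable in a terminal state, so an exact match can fail even for a correct solver. A tie-tolerant assertion could check each chosen action against the optimal action values instead of against fixed indices. The sketch below uses a hypothetical helper `is_optimal_policy` (not part of pymdptoolbox or the existing tests) on a made-up 5-state corridor with terminals at both ends. There, two different policies are both optimal.

```python
import numpy as np


def is_optimal_policy(P, R, V_opt, policy, terminals=(), discount=1.0, tol=1e-8):
    """Hypothetical test helper: accept any policy that is greedy w.r.t. V_opt.

    P has shape (A, S, S) and R has shape (S, A). Actions in terminal states
    are ignored and ties between equally good actions are allowed.
    """
    Q = R + discount * np.einsum("ast,t->sa", P, V_opt)
    best = Q.max(axis=1)
    chosen = Q[np.arange(len(policy)), np.asarray(policy)]
    ok = np.abs(chosen - best) <= tol
    ok[np.asarray(terminals, dtype=int)] = True
    return bool(ok.all())


# Made-up corridor: states 0..4, actions 0 = left, 1 = right,
# states 0 and 4 are absorbing terminals, each move costs -1.
S, A = 5, 2
P = np.zeros((A, S, S))
R = np.zeros((S, A))
for s in range(S):
    for a, step in enumerate((-1, 1)):
        if s in (0, S - 1):
            P[a, s, s] = 1.0
        else:
            P[a, s, s + step] = 1.0
            R[s, a] = -1.0
V_opt = np.array([0.0, -1.0, -2.0, -1.0, 0.0])

# State 2 is a tie: going left or right both take two steps.
print(is_optimal_policy(P, R, V_opt, (0, 0, 0, 1, 0), terminals=(0, 4)))
print(is_optimal_policy(P, R, V_opt, (0, 0, 1, 1, 1), terminals=(0, 4)))
```

Both calls accept their policy even though the tuples differ, which an `==` assertion would reject.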

sawcordwell/pymdptoolbox, pull request #15