sawcordwell / pymdptoolbox

Markov Decision Process (MDP) Toolbox for Python
BSD 3-Clause "New" or "Revised" License

Errors with sparse transition and/or reward matrices #7

Closed: yasserglez closed this issue 9 years ago

yasserglez commented 9 years ago

The following code runs fine with dense matrices, but fails in different ways when sparse matrices are used (i.e. when the lines P = list(map(csr_matrix, P)) and/or R = list(map(csr_matrix, R)) are uncommented):

import mdptoolbox
import numpy as np
from scipy.sparse import csr_matrix

# Transition probability matrices: one 5x5 matrix per action
P = [None] * 2
P[0] = np.array([
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0]
])
P[1] = np.array([
    [0.0, 0.4, 0.0, 0.0, 0.6],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0]
])

# Reward matrices: one 5x5 matrix per action
R = [None] * 2
R[0] = np.zeros((5, 5))
R[1] = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
])

# Uncomment either or both of these lines to reproduce the errors below
# P = list(map(csr_matrix, P))
# R = list(map(csr_matrix, R))

vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9)
vi.run()
assert vi.policy[0] == 1

pi = mdptoolbox.mdp.PolicyIteration(P, R, 0.9)
pi.run()
assert pi.policy[0] == 1

With sparse P and dense R:

Traceback (most recent call last):
  File "example.py", line 34, in <module>
    vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9)
  File "mdptoolbox/mdp.py", line 1300, in __init__
    self._boundIter(epsilon)
  File "mdptoolbox/mdp.py", line 1345, in _boundIter
    null, value = self._bellmanOperator()
  File "mdptoolbox/mdp.py", line 232, in _bellmanOperator
    Q[aa] = self.R[aa] + self.discount * self.P[aa].dot(V)
ValueError: setting an array element with a sequence.
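
For what it's worth, csr_matrix.dot with a plain 1-D value vector does return a 1-D array, so the Bellman update itself copes with a sparse P; the ValueError hints that self.R[aa], as built by _computePR from these inputs, is not a flat length-S vector. A minimal sketch (not toolbox code, all names illustrative) of the shapes that row assignment expects:

import numpy as np
from scipy.sparse import csr_matrix

S, A, discount = 5, 2, 0.9
P0 = csr_matrix(np.eye(S))        # stand-in for one sparse transition matrix
r0 = np.zeros(S)                  # per-state expected reward for action 0
V = np.zeros(S)                   # current value function

Q = np.empty((A, S))
Q[0] = r0 + discount * P0.dot(V)  # csr.dot(1-D array) is 1-D, so this row assignment works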

With dense P and sparse R:

Traceback (most recent call last):
  File "mdptoolbox/mdp.py", line 279, in _computePR
    if R.ndim == 1:
AttributeError: 'list' object has no attribute 'ndim'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "example.py", line 34, in <module>
    vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9)
  File "mdptoolbox/mdp.py", line 1288, in __init__
    MDP.__init__(self, transitions, reward, discount, epsilon, max_iter)
  File "mdptoolbox/mdp.py", line 187, in __init__
    self._computePR(transitions, reward)
  File "mdptoolbox/mdp.py", line 291, in _computePR
    for aa in range(self.A))
  File "mdptoolbox/mdp.py", line 291, in <genexpr>
    for aa in range(self.A))
AttributeError: 'NotImplementedType' object has no attribute 'sum'
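
The NotImplementedType error points at the element-wise multiply inside _computePR: with a 2-D reward matrix the per-state expected reward is r(s,a) = sum over s' of P(s'|s,a) * R(s,s',a), and mixing a dense ndarray with a csr_matrix in that product appears to be what goes wrong. A hedged sketch of one way to compute it that accepts either representation (illustrative only, not the toolbox's _computePR):

import numpy as np
import scipy.sparse as sp

def expected_reward(P_a, R_a):
    # r(s) = sum over s' of P(s'|s) * R(s, s') for a single action;
    # P_a and R_a may each be a dense ndarray or a scipy sparse matrix.
    if sp.issparse(P_a):
        prod = P_a.multiply(R_a)    # sparse-aware element-wise product
    elif sp.issparse(R_a):
        prod = R_a.multiply(P_a)
    else:
        prod = P_a * R_a            # plain ndarray element-wise product
    return np.asarray(prod.sum(axis=1)).flatten()   # always a flat length-S vector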

With sparse P and sparse R:

mdptoolbox/mdp.py:1348: RuntimeWarning: divide by zero encountered in double_scalars
  getSpan(value - Vprev) ) / _math.log(self.discount * k))
Traceback (most recent call last):
  File "example.py", line 34, in <module>
    vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9)
  File "mdptoolbox/mdp.py", line 1300, in __init__
    self._boundIter(epsilon)
  File "mdptoolbox/mdp.py", line 1351, in _boundIter
    self.max_iter = int(_math.ceil(max_iter))
OverflowError: cannot convert float infinity to integer
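
The OverflowError looks like a downstream symptom of the same reward problem: if the first Bellman update leaves the value vector unchanged, the span max(V - Vprev) - min(V - Vprev) is zero, the iteration bound in _boundIter divides by that zero and becomes infinite, and math.ceil of the result can no longer be converted to an int. A simplified illustration of that failure mode (the real bound also involves logarithms; this is not the toolbox's exact code):

import math
import numpy as np

def span(w):
    # span(w) = max(w) - min(w)
    return w.max() - w.min()

diff = np.zeros(5)                    # V - Vprev when the update changes nothing
bound = np.float64(0.1) / span(diff)  # divide by zero -> inf (with a RuntimeWarning)
max_iter = math.ceil(bound)           # OverflowError: cannot convert float infinity to integer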

I'm using Python 3.4.0, scipy 0.14.1, numpy 1.9.1 and mdptoolbox 5af7d2a50c1820aeb590b9bb7ced967b2f4e13c7.

sawcordwell commented 9 years ago

Thanks @yasserglez, sparse rewards are something that I remember not working very well, and I'm just now looking at a branch where I tried to fix them in the past but didn't finish. That branch's code probably doesn't work how it should anyway. At the moment I am converting these examples into test cases. I have created a branch sc/fix7 with PR #8 to test the fixes. If you happen to do any work on it, you can create a pull request against that branch.

yasserglez commented 9 years ago

Thanks @sawcordwell, I'll take a look and see if I can find the problem. I've updated the issue with a smaller example that also happens to fail with the same errors and might be easier to track down.

sawcordwell commented 9 years ago

I expanded the tests to cover all the algorithms; the new file is pymdptoolbox/src/tests/test_issue7.py.
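
A test along those lines essentially has to check that sparse and dense inputs give the same answers. A rough sketch of that kind of check (illustrative only, not the actual contents of test_issue7.py):

import mdptoolbox.mdp
from scipy.sparse import csr_matrix

def assert_sparse_matches_dense(P_dense, R_dense, discount=0.9):
    # Run ValueIteration on the dense inputs and on csr_matrix copies,
    # then check that both give the same policy.
    vi_dense = mdptoolbox.mdp.ValueIteration(P_dense, R_dense, discount)
    vi_dense.run()
    P_sparse = list(map(csr_matrix, P_dense))
    R_sparse = list(map(csr_matrix, R_dense))
    vi_sparse = mdptoolbox.mdp.ValueIteration(P_sparse, R_sparse, discount)
    vi_sparse.run()
    assert vi_dense.policy == vi_sparse.policy

Called with the P and R from the example above, a check like this should pass once the fix is in.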

sawcordwell commented 9 years ago

I can't see an easy way to fix it, and the code that sets up the transitions and rewards is too messy and hard to understand, so I've decided to rewrite it. I will probably rebase sc/fix7 a few times.

sawcordwell commented 9 years ago

I think I have fixed it. Could you check the sc/fix7 branch to see if you are getting results that you expect?

This exercise has confirmed that the linear programming class is not working properly.

yasserglez commented 9 years ago

Just tested it and it seems to be working perfectly. Thank you very much!

sawcordwell commented 9 years ago

Great, I'll merge it in.