sawcordwell / pymdptoolbox

Markov Decision Process (MDP) Toolbox for Python
BSD 3-Clause "New" or "Revised" License

Errors with sparse transition and/or reward matrices #7

Closed: yasserglez closed this issue 9 years ago

yasserglez commented 9 years ago

The following code runs fine with dense matrices, but fails in different ways when sparse matrices are used (i.e. when the lines P = list(map(csr_matrix, P)) and/or R = list(map(csr_matrix, R)) are uncommented):

import mdptoolbox
import numpy as np
from scipy.sparse import csr_matrix

# Transition probability matrices: one 5x5 matrix per action
P = [None] * 2
P[0] = np.array([
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0]
])
P[1] = np.array([
    [0.0, 0.4, 0.0, 0.0, 0.6],
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0, 1.0]
])

# Reward matrices: one 5x5 matrix per action
R = [None] * 2
R[0] = np.zeros((5, 5))
R[1] = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0]
])

# Uncomment either or both of these lines to reproduce the errors below
# P = list(map(csr_matrix, P))
# R = list(map(csr_matrix, R))

vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9)
vi.run()
assert vi.policy[0] == 1

pi = mdptoolbox.mdp.PolicyIteration(P, R, 0.9)
pi.run()
assert pi.policy[0] == 1

With sparse P and dense R:

Traceback (most recent call last):
  File "example.py", line 34, in <module>
    vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9)
  File "mdptoolbox/mdp.py", line 1300, in __init__
    self._boundIter(epsilon)
  File "mdptoolbox/mdp.py", line 1345, in _boundIter
    null, value = self._bellmanOperator()
  File "mdptoolbox/mdp.py", line 232, in _bellmanOperator
    Q[aa] = self.R[aa] + self.discount * self.P[aa].dot(V)
ValueError: setting an array element with a sequence.
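
For what it's worth, csr_matrix.dot with a plain 1-D value vector does return a 1-D array, so the Bellman update itself copes with a sparse P; the ValueError hints that self.R[aa], as built by _computePR from these inputs, is not a flat length-S vector. A minimal sketch (not toolbox code, all names illustrative) of the shapes that row assignment expects:

import numpy as np
from scipy.sparse import csr_matrix

S, A, discount = 5, 2, 0.9
P0 = csr_matrix(np.eye(S))        # stand-in for one sparse transition matrix
r0 = np.zeros(S)                  # per-state expected reward for action 0
V = np.zeros(S)                   # current value function

Q = np.empty((A, S))
Q[0] = r0 + discount * P0.dot(V)  # csr.dot(1-D array) is 1-D, so this row assignment works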

With dense P and sparse R:

Traceback (most recent call last):
  File "mdptoolbox/mdp.py", line 279, in _computePR
    if R.ndim == 1:
AttributeError: 'list' object has no attribute 'ndim'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "example.py", line 34, in <module>
    vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9)
  File "mdptoolbox/mdp.py", line 1288, in __init__
    MDP.__init__(self, transitions, reward, discount, epsilon, max_iter)
  File "mdptoolbox/mdp.py", line 187, in __init__
    self._computePR(transitions, reward)
  File "mdptoolbox/mdp.py", line 291, in _computePR
    for aa in range(self.A))
  File "mdptoolbox/mdp.py", line 291, in <genexpr>
    for aa in range(self.A))
AttributeError: 'NotImplementedType' object has no attribute 'sum'
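
The NotImplementedType error points at the element-wise multiply inside _computePR: with a 2-D reward matrix the per-state expected reward is r(s,a) = sum over s' of P(s'|s,a) * R(s,s',a), and mixing a dense ndarray with a csr_matrix in that product appears to be what goes wrong. A hedged sketch of one way to compute it that accepts either representation (illustrative only, not the toolbox's _computePR):

import numpy as np
import scipy.sparse as sp

def expected_reward(P_a, R_a):
    # r(s) = sum over s' of P(s'|s) * R(s, s') for a single action;
    # P_a and R_a may each be a dense ndarray or a scipy sparse matrix.
    if sp.issparse(P_a):
        prod = P_a.multiply(R_a)    # sparse-aware element-wise product
    elif sp.issparse(R_a):
        prod = R_a.multiply(P_a)
    else:
        prod = P_a * R_a            # plain ndarray element-wise product
    return np.asarray(prod.sum(axis=1)).flatten()   # always a flat length-S vector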

With sparse P and sparse R:

mdptoolbox/mdp.py:1348: RuntimeWarning: divide by zero encountered in double_scalars
  getSpan(value - Vprev) ) / _math.log(self.discount * k))
Traceback (most recent call last):
  File "example.py", line 34, in <module>
    vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9)
  File "mdptoolbox/mdp.py", line 1300, in __init__
    self._boundIter(epsilon)
  File "mdptoolbox/mdp.py", line 1351, in _boundIter
    self.max_iter = int(_math.ceil(max_iter))
OverflowError: cannot convert float infinity to integer
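
The OverflowError looks like a downstream symptom of the same reward problem: if the first Bellman update leaves the value vector unchanged, the span max(V - Vprev) - min(V - Vprev) is zero, the iteration bound in _boundIter divides by that zero and becomes infinite, and math.ceil of the result can no longer be converted to an int. A simplified illustration of that failure mode (the real bound also involves logarithms; this is not the toolbox's exact code):

import math
import numpy as np

def span(w):
    # span(w) = max(w) - min(w)
    return w.max() - w.min()

diff = np.zeros(5)                    # V - Vprev when the update changes nothing
bound = np.float64(0.1) / span(diff)  # divide by zero -> inf (with a RuntimeWarning)
max_iter = math.ceil(bound)           # OverflowError: cannot convert float infinity to integer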

I'm using Python 3.4.0, scipy 0.14.1, numpy 1.9.1 and mdptoolbox 5af7d2a50c1820aeb590b9bb7ced967b2f4e13c7.

sawcordwell commented 9 years ago

Thanks @yasserglez, sparse rewards are something that I remember not working very well, and I'm just now looking at a branch where I tried to fix them in the past but didn't finish. That branch's code probably doesn't work how it should anyway. At the moment I am converting these examples into test cases. I have created a branch sc/fix7 with PR #8 to test the fixes. If you happen to do any work on it, you can create a pull request against that branch.

yasserglez commented 9 years ago

Thanks @sawcordwell, I'll take a look and see if I can find the problem. I've updated the issue with a smaller example that also happens to fail with the same errors and might be easier to track down.

sawcordwell commented 9 years ago

I expanded the tests to cover all the algorithms; the new file is pymdptoolbox/src/tests/test_issue7.py.
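
A test along those lines essentially has to check that sparse and dense inputs give the same answers. A rough sketch of that kind of check (illustrative only, not the actual contents of test_issue7.py):

import mdptoolbox.mdp
from scipy.sparse import csr_matrix

def assert_sparse_matches_dense(P_dense, R_dense, discount=0.9):
    # Run ValueIteration on the dense inputs and on csr_matrix copies,
    # then check that both give the same policy.
    vi_dense = mdptoolbox.mdp.ValueIteration(P_dense, R_dense, discount)
    vi_dense.run()
    P_sparse = list(map(csr_matrix, P_dense))
    R_sparse = list(map(csr_matrix, R_dense))
    vi_sparse = mdptoolbox.mdp.ValueIteration(P_sparse, R_sparse, discount)
    vi_sparse.run()
    assert vi_dense.policy == vi_sparse.policy

Called with the P and R from the example above, a check like this should pass once the fix is in.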

sawcordwell commented 9 years ago

I can't see an easy way to fix it, and the code that sets up the transitions and rewards is too messy and hard to understand, so I've decided to rewrite it. I will probably rebase sc/fix7 a few times.

sawcordwell commented 9 years ago

I think I have fixed it. Could you check the sc/fix7 branch to see if you are getting results that you expect?

This exercise has confirmed that the linear programming class is not working properly.

yasserglez commented 9 years ago

Just tested it and it seems to be working perfectly. Thank you very much!

sawcordwell commented 9 years ago

Great, I'll merge it in.