Closed tryabin closed 7 years ago
Thanks for the report. I think there's a few things going on here:
csrsv_solve
deals with solution to triangular matrices) but treated as-is on the CPU side.matdescr
http://pyculib.readthedocs.io/en/latest/cusparse.html for example descr = handle.matdescr(0, 'N', 'U', 'G')
, the 'U' overrides the default 'L' (lower) storage. In the code presented only the lower triangle is considered by CUDA whilst doing the solve.This demonstrates what's going on:
import numpy as np
import scipy.sparse.linalg
import pyculib
handle = pyculib.sparse.Sparse()
dtype = np.float32
m = n = 3
trans = 'N'
# Initialize the CSR matrix on the host and GPU.
row = np.array([0, 0, 0, 1, 1, 2])
col = np.array([0, 1, 2, 0, 1, 2])
data = np.array([0.431663, 0.955176, 0.925239, 0.0283651, 0.569277, 0.48015], dtype=dtype)
csrMatrixCpu = scipy.sparse.csr_matrix((data, (row, col)), shape=(m, n))
csrMatrixGpu = pyculib.sparse.csr_matrix((data, (row, col)), shape=(m, n))
print(csrMatrixCpu)
print(csrMatrixCpu.todense())
# Perform the analysis step on the GPU.
nnz = csrMatrixGpu.nnz
csrVal = csrMatrixGpu.data
csrRowPtr = csrMatrixGpu.indptr
csrColInd = csrMatrixGpu.indices
descr = handle.matdescr()
info = handle.csrsv_analysis(trans, m, nnz, descr, csrVal, csrRowPtr, csrColInd)
# Initialize the right-hand side of the system.
alpha = 1.0
rightHandSide = np.array([0.48200423, 0.39379725, 0.75963706], dtype=dtype)
gpuResult = np.zeros(m, dtype=dtype)
# Solve the system on the GPU and on the CPU.
handle.csrsv_solve(trans, m, alpha, descr, csrVal, csrRowPtr, csrColInd, info, rightHandSide, gpuResult)
# The system solved on the GPU is not that on the CPU...
# it was declared as lower triangular, so mutate the CPU version to also be lower triangular
csrMatrixCpu = scipy.sparse.csr_matrix(np.tril(csrMatrixCpu.todense()))
print(csrMatrixCpu.todense())
cpuResult = scipy.sparse.linalg.dsolve.spsolve(csrMatrixCpu, rightHandSide, use_umfpack=False)
print('gpu result = ' + str(gpuResult))
print('cpu result = ' + str(cpuResult))
which gives:
$ python issue12b.py
(0, 0) 0.431663
(0, 1) 0.955176
(0, 2) 0.925239
(1, 0) 0.0283651
(1, 1) 0.569277
(2, 2) 0.48015
[[ 0.43166301 0.955176 0.92523903]
[ 0.0283651 0.56927699 0. ]
[ 0. 0. 0.48015001]]
[[ 0.43166301 0. 0. ]
[ 0.0283651 0.56927699 0. ]
[ 0. 0. 0.48015001]]
gpu result = [ 1.11662161 0.63611245 1.58208275]
cpu result = [ 1.11662161 0.63611245 1.58208275]
Further, this shows how to handle an upper triangular system a bit like the one above, using the matdescr
constructor.
import numpy as np
import scipy.sparse.linalg
import pyculib
handle = pyculib.sparse.Sparse()
dtype = np.float32
m = n = 3
trans = 'N'
# Initialize the CSR matrix on the host and GPU.
row = np.array([0, 0, 0, 1, 2])
col = np.array([0, 1, 2, 1, 2])
data = np.array([0.431663, 0.955176, 0.925239, 0.569277, 0.48015], dtype=dtype)
csrMatrixCpu = scipy.sparse.csr_matrix((data, (row, col)), shape=(m, n))
csrMatrixGpu = pyculib.sparse.csr_matrix((data, (row, col)), shape=(m, n))
print(csrMatrixCpu)
print(csrMatrixCpu.todense())
# Perform the analysis step on the GPU.
nnz = csrMatrixGpu.nnz
csrVal = csrMatrixGpu.data
csrRowPtr = csrMatrixGpu.indptr
csrColInd = csrMatrixGpu.indices
descr = handle.matdescr(0, 'N', 'U', 'G')
info = handle.csrsv_analysis(trans, m, nnz, descr, csrVal, csrRowPtr, csrColInd)
# Initialize the right-hand side of the system.
alpha = 1.0
rightHandSide = np.array([0.48200423, 0.39379725, 0.75963706], dtype=dtype)
gpuResult = np.zeros(m, dtype=dtype)
# Solve the system on the GPU and on the CPU.
handle.csrsv_solve(trans, m, alpha, descr, csrVal, csrRowPtr, csrColInd, info, rightHandSide, gpuResult)
cpuResult = scipy.sparse.linalg.dsolve.spsolve(csrMatrixCpu, rightHandSide, use_umfpack=False)
print('gpu result = ' + str(gpuResult))
print('cpu result = ' + str(cpuResult))
which gives:
$ python issue12b.py
(0, 0) 0.431663
(0, 1) 0.955176
(0, 2) 0.925239
(1, 1) 0.569277
(2, 2) 0.48015
[[ 0.43166301 0.955176 0.92523903]
[ 0. 0.56927699 0. ]
[ 0. 0. 0.48015001]]
gpu result = [-3.80515194 0.69174981 1.58208275]
cpu result = [-3.80515194 0.69174981 1.58208275]
Hope this helps.
The matrix was not intended to be triangular. I overlooked the fact that csrsv_solve
only works on triangular matrices.
Is it possible to solve a sparse non-triangular system using Pyculib? I see there is a function for performing Cholesky factorization, but it requires a call to csrsv_analysis
first, which in turn only works for triangular matrices.
It looks like this could be improved with some of the other functions in cuSPARSE, which is what pyculib is delegating to.
IIRC There's no non-triangular sparse system solver supported at present (exception being tridiagonal systems) as pyculib
binds solely to the Sparse BLAS
which at L2/3 only has triangular solvers (and similar) as it is fundamentally a BLAS
.
There's a few things to do for solver support:
cuSOLVER
library.cuSolverDN
.cuSolverSP
.The cudatoolkit
packages in the Numba channel have the library present. Pull requests/contributions are welcome, https://github.com/numba/pyculib/issues/3 is tracking/for discussion.
Thanks, that's what I suspected. However in my use case of real-time simulation a workaround I found was to first factor the matrix on the CPU just once at the start, and then solve the system for different right-hand sides on the GPU by using the pre-computed lower and upper triangular matrices.
I'm trying to solve a simple sparse system using pyculib and am getting the wrong result. Here is the program I have, which also solves the system on the CPU to get the correct result.
The expected result vector is:
[-4.27667809 0.9048419 1.58208275]
but solving the system on the GPU gives the result vector:
[ 1.11662161 0.63611245 1.58208275]