Closed: guyer closed this 9 months ago
Mat.setValuesCSR() got astronomically slower: total time in addAt went from 0.15 s to 82 s, roughly a factor of 530. Compare petsc4py 3.18.4:
Total time: 0.154681 s
File: /Users/guyer/Documents/research/FiPy/fipy/fipy/matrices/petscMatrix.py
Function: addAt at line 284
Line # Hits Time Per Hit % Time Line Contents
==============================================================
284 @profile
285 def addAt(self, vector, id1, id2):
286 """
287 Add elements of `vector` to the positions in the matrix corresponding to (`id1`,`id2`)
288
289 >>> L = _PETScMatrixFromShape(rows=3, cols=3, bandwidth=3)
290 >>> L.put([3.,10.,numerix.pi,2.5], [0,0,1,2], [2,1,1,0])
291 >>> L.addAt([1.73,2.2,8.4,3.9,1.23], [1,2,0,0,1], [2,2,0,0,2])
292 >>> print(L)
293 12.300000 10.000000 3.000000
294 --- 3.141593 2.960000
295 2.500000 --- 2.200000
296 """
297 33 348.0 10.5 0.2 self.matrix.assemble(self.matrix.AssemblyType.FLUSH)
298 66 154330.0 2338.3 99.8 self.matrix.setValuesCSR(*self._ijv2csr(id2, id1, vector),
299 33 3.0 0.1 0.0 addv=True)
to petsc4py 3.20.1:
Total time: 82.0462 s
File: /Users/guyer/Documents/research/FiPy/fipy/fipy/matrices/petscMatrix.py
Function: addAt at line 284
Line # Hits Time Per Hit % Time Line Contents
==============================================================
284 @profile
285 def addAt(self, vector, id1, id2):
286 """
287 Add elements of `vector` to the positions in the matrix corresponding to (`id1`,`id2`)
288
289 >>> L = _PETScMatrixFromShape(rows=3, cols=3, bandwidth=3)
290 >>> L.put([3.,10.,numerix.pi,2.5], [0,0,1,2], [2,1,1,0])
291 >>> L.addAt([1.73,2.2,8.4,3.9,1.23], [1,2,0,0,1], [2,2,0,0,2])
292 >>> print(L)
293 12.300000 10.000000 3.000000
294 --- 3.141593 2.960000
295 2.500000 --- 2.200000
296 """
297 33 3853.0 116.8 0.0 self.matrix.assemble(self.matrix.AssemblyType.FLUSH)
298 66 82042326.0 1e+06 100.0 self.matrix.setValuesCSR(*self._ijv2csr(id2, id1, vector),
299 33 3.0 0.1 0.0 addv=True)
(self._ijv2csr() did not change)
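For readers unfamiliar with the conversion feeding setValuesCSR(), here is a minimal pure-Python sketch of turning (row, column, value) triplets into CSR arrays. The function name `ijv_to_csr` and its argument order are hypothetical; this is not FiPy's `_ijv2csr()`, only an illustration of the kind of work it presumably does. Duplicate positions are kept as separate entries, on the assumption that the caller adds values rather than inserting them.

```python
def ijv_to_csr(rows, cols, vals, n_rows):
    """Convert coordinate triplets to CSR arrays (indptr, indices, data).

    Hypothetical stand-in for illustration only, not FiPy's _ijv2csr().
    Duplicate (row, col) pairs are kept, not summed.
    """
    # Order the entries by (row, col) so each row's entries are contiguous.
    order = sorted(range(len(vals)), key=lambda k: (rows[k], cols[k]))

    # Count the entries in each row, then prefix-sum into row pointers.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for r in range(n_rows):
        indptr[r + 1] += indptr[r]

    indices = [cols[k] for k in order]
    data = [vals[k] for k in order]
    return indptr, indices, data


# The addAt() docstring example: id1 holds rows, id2 holds columns.
indptr, indices, data = ijv_to_csr(
    rows=[1, 2, 0, 0, 1],
    cols=[2, 2, 0, 0, 2],
    vals=[1.73, 2.2, 8.4, 3.9, 1.23],
    n_rows=3,
)
print(indptr)   # [0, 2, 4, 5]
print(indices)  # [0, 0, 2, 2, 2]
```

Note that row 0 carries two entries for column 0 (8.4 and 3.9), which is consistent with the 12.3 shown in the docstring once the values are added.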
PETSc matrix preallocation has been commented out since the suite was added to FiPy
Uncommenting matrix.setPreallocationNNZ(bandwidth) (marked "FIXME: ??? None, bandwidth") raises exceptions:
Traceback (most recent call last):
:
File "/Users/guyer/Documents/research/FiPy/fipy/fipy/matrices/petscMatrix.py", line 291, in addAt
self.matrix.setValuesCSR(*self._ijv2csr(id2, id1, vector),
File "PETSc/Mat.pyx", line 1019, in petsc4py.PETSc.Mat.setValuesCSR
File "PETSc/petscmat.pxi", line 1008, in petsc4py.PETSc.matsetvalues_csr
File "PETSc/petscmat.pxi", line 1001, in petsc4py.PETSc.matsetvalues_ijv
petsc4py.PETSc.Error: error code 63
[0] MatSetValues() at /Users/runner/miniforge3/conda-bld/petsc_1675165805075/work/src/mat/interface/matrix.c:1474
[0] MatSetValues_SeqAIJ() at /Users/runner/miniforge3/conda-bld/petsc_1675165805075/work/src/mat/impls/aij/seq/aij.c:450
[0] Argument out of range
[0] New nonzero at (0,0) caused a malloc
Use MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE) to turn off this check
Setting matrix.setOption(matrix.Option.NEW_NONZERO_ALLOCATION_ERR, False) leads to very slow tests.
An earlier guess was that setPreallocationNNZ() didn't work because it takes the number of non-zeros in the whole matrix rather than per row (like Trilinos). That guess was wrong: setPreallocationNNZ() sets the number of non-zeros per row, but setting it to zero is "bad".
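Since the preallocation is per row, one way to see what a matrix actually needs is to count the distinct columns touched in each row. The sketch below does that in plain Python for the entries in the addAt() docstring; `nnz_per_row` is a hypothetical helper, not part of FiPy or petsc4py. My understanding is that petsc4py's setPreallocationNNZ() accepts either a single integer or a per-row sequence like this one, but that should be checked against the petsc4py documentation for the version in use.

```python
def nnz_per_row(rows, cols, n_rows):
    """Count the distinct column indices touched in each row.

    Hypothetical helper for illustration; not part of FiPy or petsc4py.
    """
    seen = [set() for _ in range(n_rows)]
    for r, c in zip(rows, cols):
        seen[r].add(c)
    return [len(s) for s in seen]


# Entries from the put() and addAt() calls in the docstring example.
rows = [0, 0, 1, 2] + [1, 2, 0, 0, 1]
cols = [2, 1, 1, 0] + [2, 2, 0, 0, 2]

counts = nnz_per_row(rows, cols, n_rows=3)
print(counts)  # [3, 2, 2]
```

For this 3x3 example every row needs more than one slot, so a uniform value of 0 or 1 would under-allocate all three rows.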
Making a simplistic attempt at preallocation
@@ -482,7 +482,7 @@ class _PETScMatrix(_SparseMatrix):
class _PETScMatrixFromShape(_PETScMatrix):
- def __init__(self, rows, cols, bandwidth=0, sizeHint=None, matrix=None, comm=PETSc.COMM_SELF):
+ def __init__(self, rows, cols, bandwidth=1, sizeHint=None, matrix=None, comm=PETSc.COMM_SELF):
"""Instantiates and wraps a PETSc `Mat` matrix
Parameters
@@ -510,13 +510,14 @@ class _PETScMatrixFromShape(_PETScMatrix):
matrix.setSizes([[rows, None], [cols, None]])
matrix.setType('aij') # sparse
matrix.setUp()
-# matrix.setPreallocationNNZ(bandwidth) # FIXME: ??? None, bandwidth
-# matrix.setOption(matrix.Option.NEW_NONZERO_ALLOCATION_ERR, False)
+ if bandwidth > 0:
+ matrix.setPreallocationNNZ(bandwidth)
+ matrix.setOption(matrix.Option.NEW_NONZERO_ALLOCATION_ERR, False)
combined with setting bandwidth=1 in all subclasses leads to tests running:
master
[Figure: Comparison of the runtimes for FiPy's test suite vs. petsc4py version, including the case of naïve preallocation (only the 40 slowest tests when using petsc4py 3.20.1 are shown)]
[Figure: Ratio of runtimes for FiPy's test suite between petsc4py versions, when using naïve preallocation]
Overall, runtimes are about the same, and the worst cases are within a factor of 4.
Turning on MAT_NEW_NONZERO_ALLOCATION_ERR produces 34 failures in the test suite. Better preallocation might recover more of the slowdown, but this calls for a deeper look at bandwidth= and sizeHint=.
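As a starting point for that deeper look, a rough audit could list the rows whose distinct non-zero count exceeds a uniform bandwidth. Going by the error message above, those are the rows where a new non-zero would force a malloc. The helper `rows_over_budget` below is hypothetical and uses the docstring example as data; it is a sketch of the check, not a description of how FiPy or PETSc tracks this.

```python
def rows_over_budget(rows, cols, n_rows, bandwidth):
    """Map row -> distinct non-zero count for rows exceeding `bandwidth`.

    Hypothetical audit helper; not part of FiPy or petsc4py.
    """
    seen = [set() for _ in range(n_rows)]
    for r, c in zip(rows, cols):
        seen[r].add(c)
    return {r: len(s) for r, s in enumerate(seen) if len(s) > bandwidth}


# Entries from the put() and addAt() calls in the docstring example.
rows = [0, 0, 1, 2] + [1, 2, 0, 0, 1]
cols = [2, 1, 1, 0] + [2, 2, 0, 0, 2]

over = rows_over_budget(rows, cols, n_rows=3, bandwidth=1)
print(over)  # {0: 3, 1: 2, 2: 2}

# With bandwidth=3, as in the docstring's constructor call, nothing overflows.
print(rows_over_budget(rows, cols, n_rows=3, bandwidth=3))  # {}
```

With bandwidth=1 every row in this small example overflows, which may help explain why the naïve preallocation only works with the allocation error switched off.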
Breakages with PETSc observed in #946 correspond to PETSc 3.20.0 (released 2023-09-28; conda-forge petsc4py feedstock updated 2023-10-04). The previous version, PETSc 3.18.4, worked fine.
[x] Really slow
[Figure: Comparison of the runtimes for FiPy's test suite vs. petsc4py version (only the 40 slowest tests when using petsc4py 3.20.1 are shown)]
[Figure: Ratio of runtimes for FiPy's test suite between petsc4py versions]