usnistgov / fipy

FiPy is a Finite Volume PDE solver written in Python
http://pages.nist.gov/fipy/en/latest

PETSc 3.20.0 broke the world #963

Closed guyer closed 9 months ago

guyer commented 11 months ago

The breakages with PETSc observed in #946 correspond to PETSc 3.20.0 (released 2023-09-28; conda-forge petsc4py feedstock updated 2023-10-04). The previous PETSc 3.18.4 worked fine.

guyer commented 9 months ago

Mat.setValuesCSR() got astronomically slower. Compare petsc4py 3.18.4:

Total time: 0.154681 s
File: /Users/guyer/Documents/research/FiPy/fipy/fipy/matrices/petscMatrix.py
Function: addAt at line 284

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   284                                               @profile
   285                                               def addAt(self, vector, id1, id2):
   286                                                   """
   287                                                   Add elements of `vector` to the positions in the matrix corresponding to (`id1`,`id2`)
   288                                           
   289                                                       >>> L = _PETScMatrixFromShape(rows=3, cols=3, bandwidth=3)
   290                                                       >>> L.put([3.,10.,numerix.pi,2.5], [0,0,1,2], [2,1,1,0])
   291                                                       >>> L.addAt([1.73,2.2,8.4,3.9,1.23], [1,2,0,0,1], [2,2,0,0,2])
   292                                                       >>> print(L)
   293                                                       12.300000  10.000000   3.000000  
   294                                                           ---     3.141593   2.960000  
   295                                                        2.500000      ---     2.200000  
   296                                                   """
   297        33        348.0     10.5      0.2          self.matrix.assemble(self.matrix.AssemblyType.FLUSH)
   298        66     154330.0   2338.3     99.8          self.matrix.setValuesCSR(*self._ijv2csr(id2, id1, vector),
   299        33          3.0      0.1      0.0                                   addv=True)

to petsc4py 3.20.1:

Total time: 82.0462 s
File: /Users/guyer/Documents/research/FiPy/fipy/fipy/matrices/petscMatrix.py
Function: addAt at line 284

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   284                                               @profile
   285                                               def addAt(self, vector, id1, id2):
   286                                                   """
   287                                                   Add elements of `vector` to the positions in the matrix corresponding to (`id1`,`id2`)
   288                                           
   289                                                       >>> L = _PETScMatrixFromShape(rows=3, cols=3, bandwidth=3)
   290                                                       >>> L.put([3.,10.,numerix.pi,2.5], [0,0,1,2], [2,1,1,0])
   291                                                       >>> L.addAt([1.73,2.2,8.4,3.9,1.23], [1,2,0,0,1], [2,2,0,0,2])
   292                                                       >>> print(L)
   293                                                       12.300000  10.000000   3.000000  
   294                                                           ---     3.141593   2.960000  
   295                                                        2.500000      ---     2.200000  
   296                                                   """
   297        33       3853.0    116.8      0.0          self.matrix.assemble(self.matrix.AssemblyType.FLUSH)
   298        66   82042326.0    1e+06    100.0          self.matrix.setValuesCSR(*self._ijv2csr(id2, id1, vector),
   299        33          3.0      0.1      0.0                                   addv=True)
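
To put a number on "astronomically", the two reports can be compared directly. The sketch below only restates figures copied from the profiles; the Time column is taken to be in microseconds, which is consistent with the per-line times summing to the reported totals.

```python
# Figures copied from the two line_profiler reports for addAt().
total_s = {"3.18.4": 0.154681, "3.20.1": 82.0462}          # "Total time" in seconds
set_values_us = {"3.18.4": 154330.0, "3.20.1": 82042326.0}  # Time column, line 298
hits = 66                                                   # Hits column, line 298

# Overall slowdown of addAt() between the two petsc4py versions.
slowdown = total_s["3.20.1"] / total_s["3.18.4"]

# Average cost of one hit on the setValuesCSR() line, in milliseconds.
per_hit_ms = {version: t / hits / 1e3 for version, t in set_values_us.items()}

print(f"addAt slowdown: {slowdown:.0f}x")
for version, ms in per_hit_ms.items():
    print(f"petsc4py {version}: {ms:.1f} ms per hit on setValuesCSR()")
```

That is a slowdown of roughly 530 times, essentially all of it on the setValuesCSR() line.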

(self._ijv2csr() did not change)
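
For context, setValuesCSR() takes the matrix entries in compressed sparse row form: a row-pointer array, a column-index array, and a value array. The sketch below is not FiPy's _ijv2csr(), whose body is not shown in this thread; ijv_to_csr is a hypothetical pure-Python helper, and summing duplicate (row, column) pairs is an assumption made for illustration. It is applied to the addAt() call from the docstring.

```python
from collections import defaultdict


def ijv_to_csr(rows, cols, values, n_rows):
    """Convert (row, col, value) triplets to CSR arrays, summing duplicates.

    Hypothetical helper for illustration only; not FiPy's _ijv2csr().
    """
    # Accumulate values that land on the same (row, col) position.
    summed = defaultdict(float)
    for i, j, v in zip(rows, cols, values):
        summed[(i, j)] += v

    indptr = [0] * (n_rows + 1)
    indices, data = [], []
    # Sorting the keys orders entries by row, then by column within a row.
    for i, j in sorted(summed):
        indptr[i + 1] += 1          # count entries per row
        indices.append(j)
        data.append(summed[(i, j)])

    # Turn per-row counts into cumulative row pointers.
    for r in range(n_rows):
        indptr[r + 1] += indptr[r]
    return indptr, indices, data


# The addAt() call from the docstring: vector, id1 (rows), id2 (columns).
indptr, indices, data = ijv_to_csr(
    rows=[1, 2, 0, 0, 1],
    cols=[2, 2, 0, 0, 2],
    values=[1.73, 2.2, 8.4, 3.9, 1.23],
    n_rows=3,
)
print(indptr, indices, data)
```

The two contributions to position (0, 0) sum to 12.3 and the two to (1, 2) sum to 2.96, matching the increments visible in the docstring's printed matrix.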

guyer commented 9 months ago

PETSc 3.19: "MatSetValues() and friends will now provide reasonable performance when no preallocation information is provided" is probably the culprit.

guyer commented 9 months ago

PETSc matrix preallocation has been commented out ever since the PETSc solver suite was added to FiPy.

guyer commented 9 months ago

At first it looked like setPreallocationNNZ() didn't work because its argument is the number of non-zeros in the whole matrix, not per row (like Trilinos). That was wrong: setPreallocationNNZ() does set the number of non-zeros per row, but setting it to zero is "bad".

Making a simplistic attempt at preallocation:

@@ -482,7 +482,7 @@ class _PETScMatrix(_SparseMatrix):

 class _PETScMatrixFromShape(_PETScMatrix):

-    def __init__(self, rows, cols, bandwidth=0, sizeHint=None, matrix=None, comm=PETSc.COMM_SELF):
+    def __init__(self, rows, cols, bandwidth=1, sizeHint=None, matrix=None, comm=PETSc.COMM_SELF):
         """Instantiates and wraps a PETSc `Mat` matrix

         Parameters
@@ -510,13 +510,14 @@ class _PETScMatrixFromShape(_PETScMatrix):
             matrix.setSizes([[rows, None], [cols, None]])
             matrix.setType('aij') # sparse
             matrix.setUp()
-#             matrix.setPreallocationNNZ(bandwidth) # FIXME: ??? None, bandwidth
-#             matrix.setOption(matrix.Option.NEW_NONZERO_ALLOCATION_ERR, False)
+            if bandwidth > 0:
+                matrix.setPreallocationNNZ(bandwidth)
+                matrix.setOption(matrix.Option.NEW_NONZERO_ALLOCATION_ERR, False)

Combined with setting bandwidth=1 in all subclasses, this gives the test runtimes compared in the next comment.

guyer commented 9 months ago

Comparison of the runtimes for FiPy's test suite vs petsc4py version, including the case of naïve preallocation (these are only the 40 slowest tests when using petsc4py 3.20.1):

[Figure: Comparison of runtimes vs petsc4py version for 40 slowest test runtimes with petsc4py 3.20.1]

[Figure: Ratio of runtimes for FiPy's test suite between petsc4py versions, when using naïve preallocation]

Overall, with naïve preallocation the runtimes are about the same across petsc4py versions, and the worst cases are within a factor of 4.

guyer commented 9 months ago

Turning on MAT_NEW_NONZERO_ALLOCATION_ERR produces 34 failures in the test suite. Better preallocation might recover more of the slowdown, but this calls for a deeper look at bandwidth= and sizeHint=.
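
One possible direction for "better preallocation" is to count the distinct columns used in each row instead of passing a single bandwidth for every row. The sketch below is pure Python and independent of PETSc; nnz_per_row is a hypothetical helper, not part of FiPy. It assumes the sparsity pattern is known before the matrix is created, which may not hold in FiPy, and that the resulting list could be passed to setPreallocationNNZ() as per-row counts (my understanding of petsc4py, not checked here).

```python
from collections import defaultdict


def nnz_per_row(rows, cols, n_rows):
    """Count distinct column indices per row of a coordinate-format pattern.

    Hypothetical helper for illustration only; not part of FiPy.
    """
    seen = defaultdict(set)
    for i, j in zip(rows, cols):
        seen[i].add(j)  # duplicates of (i, j) are counted once
    return [len(seen.get(i, ())) for i in range(n_rows)]


# Pattern from the addAt() docstring: the put() entries plus the addAt() entries.
put_rows, put_cols = [0, 0, 1, 2], [2, 1, 1, 0]
add_rows, add_cols = [1, 2, 0, 0, 1], [2, 2, 0, 0, 2]

counts = nnz_per_row(put_rows + add_rows, put_cols + add_cols, n_rows=3)
print(counts)
```

For the docstring matrix this gives three non-zeros in the first row and two in each of the others, which matches the printed result; a uniform bandwidth=1 would under-allocate every one of those rows.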