"ParMETIS is our current default partitioner in parallel runs"

Are you calling Parmetis via a PetscExternalPartitioner? If so, then I'd not call that a "default"; it's something that has to be manually specified in the input file, right? Parmetis is the default when a DistributedMesh is partitioned, but in that case it's called via the libMesh interface, and the results there are perhaps suboptimal (they seem to depend a bit too much on the initial setup, which makes me think we should be defaulting to a space-filling curve rather than element numbering for the initial iterate...), but they're not the utter garbage we're seeing in that 8-subdomain PetscExternalPartitioner test case.
The trouble with the PetscExternalPartitioner case is ... I honestly don't see what we should or could be doing differently in MOOSE. PETSc provides us with this nice partitioner-independent API, and if we pass "ptscotch" as a string to that API then we get an excellent partitioning, whereas if we pass "parmetis" then we get garbage. Maybe there's some setup that we should be doing that Parmetis depends on but that party/chaco/ptscotch all ignore?
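For concreteness, the explicit opt-in being discussed looks roughly like this in an input file (a minimal sketch; I am assuming the selector parameter is still named part_package and that a GeneratedMeshGenerator is a fair stand-in for the documented 10-by-10 grid):

[Mesh]
  [gmg]
    type = GeneratedMeshGenerator
    dim = 2
    nx = 10
    ny = 10
  []
  [Partitioner]
    # Explicit opt-in to the PETSc wrapper: 'ptscotch' gives a clean
    # partitioning on this grid, while 'parmetis' gives the broken one.
    type = PetscExternalPartitioner
    part_package = ptscotch   # or parmetis / chaco / party
  []
[]

Without a Partitioner block like this, a distributed-mesh run goes through the libMesh Parmetis interface instead, which is the suboptimal-but-not-garbage case described above.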
@YaqiWang, so we fixed this in the framework by adding a switch to ptscotch in cases where the mesh is too small for parmetis to perform well. I have trouble with a Griffin test, griffin/radiation_transport/test/tests/iqs/twigl_multischeme/twigl_step.i, and it seems to actually default to the libMesh Parmetis partitioner, not the PETSc one. Do you mind if I change the mesh block in the test to the following, to avoid any mix-up with the PETSc one?
[Mesh]
  [cmg]
    type = CartesianMeshGenerator
    dim = 2
    dx = '30 40 30'
    dy = '30 40 30'
    ix = '1 1 1'
    iy = '1 1 1'
    subdomain_id = '1 2 1
                    2 3 1
                    1 1 1'
  []
  [Partitioner]
    type = LibmeshPartitioner
    partitioner = parmetis
  []
[]
Reason
As shown at https://mooseframework.inl.gov/source/partitioner/PetscExternalPartitioner.html, when a 10-by-10 regular grid is partitioned into 8 subdomains, ParMETIS produces discontinuous subdomains. Jan Vermaak and @jthano report that this is not typically seen with their codes. Such a bad partitioning can hurt performance, and users often do not notice it. ParMETIS is our current default partitioner in parallel runs, so we need to look into this.
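Because this kind of bad partitioning is easy to miss, one way for users to check it (a sketch using the stock ProcessorIDAux object; block and variable names here are illustrative) is to write the owning rank into an elemental aux variable and inspect it in the Exodus output:

[AuxVariables]
  [pid]
    family = MONOMIAL
    order = CONSTANT
  []
[]

[AuxKernels]
  [pid_aux]
    type = ProcessorIDAux
    variable = pid
    execute_on = 'initial'
  []
[]

[Outputs]
  exodus = true
[]

Run on 8 ranks, a healthy partitioning shows 8 connected patches of constant pid; the documented ParMETIS result on the 10-by-10 grid does not.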
Design
@roystgnr mentioned that calling ParMETIS directly through libMesh for that 10-by-10 regular grid did not produce this broken partitioning, which indicates that the problem is likely caused by how we call ParMETIS in our MOOSE workflow. A separate point: if ParMETIS is a superset of METIS, we might want to replace METIS even in serial runs. Tag @lindsayad
Impact
Should not affect converged solutions, but fixing it can improve performance for calculations that currently get bad partitionings.