trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/
Other
1.2k stars 564 forks source link

Zoltan2: RCM ordering might result in ineffective incomplete factorization preconditioner #6675

Closed jhux2 closed 4 years ago

jhux2 commented 4 years ago

@vbrunini has reported at issue when solving a small (268 DOF) steady incompressible Q2/Q1 fluid problem with an additional energy advection diffusion equation. The linear system can be solved with GMRES preconditioned by ILU(0) in AztecOO, but will fail when using the same solver in Belos/Ifpack2.

I've traced the difference in behavior to differences in how AztecOO and Zoltan2 generate the RCM ordering:

1) AztecOO chooses a root node based on the algorithm in George and Liu. Zoltan2's default option is to do two BFS passes. 2) AztecOO uses a stable sort to order "unmarked" neighbors by increasing degree. Zoltan2 uses std::sort, which is not guaranteed to be stable.

I've found that, at least for the Aria test problem, using a stable sort is necessary for ILU(0) to be effective. The choice of root node has much less effect.

@trilinos/zoltan2

Related issue: #6735.

mhoemmen commented 4 years ago

Users might actually prefer exactly the same results as with AztecOO, but using a stable sort would be a good start. @alanw0 tells me that the replacement of Aztec with Epetra and AztecOO led to grief, due to failure to solve some problems that could be solved before. This suggests that we should try to use the same algorithms as before, unless we have better replacements or we can't avoid some differences (e.g., due to thread parallelization).

srajama1 commented 4 years ago

This is an interesting problem ! I have never seen the root make that much of a difference (successfully solving and not solving) in practice. I would like to understand this problem better and keep it as an example. This goes against what I have said many times and have heard others says many times that two BFS does a decent job in finding a pseudo-peripheral node. I would like to know what is the difference in this case. Can you please send it to me ? We could write the stable sort and call it.

egboman commented 4 years ago

I am surprised a stable sort is required as I never heard this issue mentioned anywhere. Nice job tracking it down. What do you mean by Belos/Ifpack2 (with Zoltan2) fails? The number of GMRES iterations is too high?

jhux2 commented 4 years ago

I am surprised a stable sort is required as I never heard this issue mentioned anywhere.

I think the fact that maintaining the original ordering of the children with equal degrees results in an effective factorization is a total crap shoot.

What do you mean by Belos/Ifpack2 (with Zoltan2) fails? The number of GMRES iterations is too high?

Yes, GMRES stagnates.

jhux2 commented 4 years ago

This is an interesting problem ! I have never seen the root make that much of a difference (successfully solving and not solving) in practice. I would like to understand this problem better and keep it as an example. This goes against what I have said many times and have heard others says many times that two BFS does a decent job in finding a pseudo-peripheral node. I would like to know what is the difference in this case. Can you please send it to me ? We could write the stable sort and call it.

@srajama1 The root node choice has some effect. But using a stable sort makes GMRES converge, vs stagnating.

@vbrunini Is it ok to distribute the matrix?

jhux2 commented 4 years ago

@vbrunini reports: "The good news is this fixes 3 of our other regression tests, bad news is there are still others that exhibit the same symptom of passing with Aztec but not belos/ifpack2, and if I disable reordering for Aztec they fail."

I'll have a look at the new matrices he's provided, it may be that I need to look at the root node algorithm.

srajama1 commented 4 years ago

In some weird way, I feel happy that ordering is this important for preconditioning :) !

mhoemmen commented 4 years ago

Received wisdom from @alanw0 is that old apps had trouble switching from Aztec to AztecOO + Epetra, because it changed the ordering for solvers. He didn't say whether AztecOO or Epetra developers fixed anything as a result.

jhux2 commented 4 years ago

In some weird way, I feel happy that ordering is this important for preconditioning :) !

Ok, that is weird :).

srajama1 commented 4 years ago

@jhux2 : Can you get these new matrices please ?

jhux2 commented 4 years ago

Closing as fixed.