idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1

PETSc schur field split breakage #22359

Open GiudGiud opened 1 year ago

GiudGiud commented 1 year ago

Bug Description

The new PETSc update broke support for the PETSc Schur field split; we need to patch PETSc to fix it.

Steps to Reproduce

In tests for field splits, switch to Schur
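For context, a minimal sketch of what "switch to Schur" looks like in a MOOSE field-split test input, using the FSP preconditioner's Split syntax. The variable names `u`/`v` are placeholders; the existing field-split tests define their own splits and sub-solver options.

```
[Preconditioning]
  [FSP]
    type = FSP
    topsplit = 'uv'
    [uv]
      splitting = 'u v'
      # switch the split decomposition to a Schur complement
      splitting_type = schur
    []
    [u]
      vars = 'u'
    []
    [v]
      vars = 'v'
    []
  []
[]
```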

Impact

- Potential speedup of Navier-Stokes (NS) solves
- Potential speedup of many two-physics-group multiphysics problems

lindsayad commented 1 year ago

I'm wondering if we can change this from a bug report to a feature enhancement focused on getting good splits? Until someone runs across the purported breakage? I'm wondering if perhaps you guys were working with PETSc main instead of our PETSc submodule hash?

GiudGiud commented 1 year ago

It was most definitely not working on the two machines we were working on; I'm sure we can reproduce it. If you want to try, run the SFR porous assembly case in Sebastian's neams repo with the field split.

lindsayad commented 1 year ago

remind me where that repo is?

GiudGiud commented 1 year ago

https://gitlab.software.inl.gov/schuseba/neams-th-2022

lindsayad commented 1 year ago

Sigh. I have tried two SFR inputs in that repo, model_1.i and ABTR/core/core.i and have run into failed CellCenteredMapFunctor access. I've created #22557 to hopefully give better errors at some point. @snschune do you remember hitting these errors a lot? I know you have reported them in the past

lindsayad commented 1 year ago

In the Block matrices section, it says:

> Note that for interlaced storage the number of rows/columns of each block must be the same size.

We do use interlaced storage when all variables are of the same finite element type, and we violate that requirement in general when objects are restricted to mesh subdomains.
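A small sketch (with hypothetical dof counts, not from any actual test) of why subdomain restriction breaks the equal-block-size assumption for interlaced storage:

```python
# Hypothetical local partition: variable u lives on every node, while
# variable v is restricted to a mesh subdomain covering fewer nodes.
n_nodes = 100          # nodes in the local mesh partition
n_restricted = 60      # nodes of the subdomain where v is defined

n_dofs_u = n_nodes       # u contributes a dof at every node
n_dofs_v = n_restricted  # v only contributes dofs on its subdomain

# Interlaced (point-block) storage assumes each field contributes the
# same number of rows/columns per block, which no longer holds here.
print(n_dofs_u == n_dofs_v)  # False: the blocks are nonconforming
```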

lindsayad commented 1 year ago

Error message, which is the same as reported by @js-jixu on #22468:

[5]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[5]PETSC ERROR: Nonconforming object sizes
[5]PETSC ERROR: Local columns of A10 15905 do not equal local rows of A00 13607
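The size check behind that message follows from the Schur complement itself, S = A11 - A10 * inv(A00) * A01: the product is only defined when the columns of A10 match the rows of A00. A toy illustration with hypothetical dense blocks (the real blocks are distributed sparse matrices):

```python
import numpy as np

# Conforming block sizes: A00 is n0 x n0, A10 is n1 x n0, etc.
n0, n1 = 3, 2
A00 = np.eye(n0)
A01 = np.ones((n0, n1))
A10 = np.ones((n1, n0))
A11 = 2.0 * np.eye(n1)

# Schur complement; if A10 had a column count != n0, this product
# would be undefined, which is exactly what PETSc reports above.
S = A11 - A10 @ np.linalg.inv(A00) @ A01
print(S.shape)  # (2, 2)
```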
lindsayad commented 1 year ago

However, @GiudGiud, if you and @fdkong created a patch that removes the restriction stated in the manual, then that is great. In the meantime, we can deal with this at the libMesh/MOOSE level, I believe...

GiudGiud commented 1 year ago

no that was not it.

lindsayad commented 1 year ago

Hmmm, I tried setting identify_variable_groups = false and sorted the variable ordering, but to no avail

lindsayad commented 1 year ago

> no that was not it.

What do you mean? The error basically exactly matches the doc

lindsayad commented 1 year ago

are you saying this a different error than what you witnessed before?

GiudGiud commented 1 year ago

yes. But I agree this is a significant error

lindsayad commented 1 year ago

I feel like your error is more elusive than a wolverine haha

js-jixu commented 1 year ago

I can provide input files and different mesh files if you guys need them. I used Schur in the input file. It works fine with some mesh files, but not with others.

lindsayad commented 1 year ago

@js-jixu yes if you could share an input and mesh file (and if it's a .msh file, also the .geo file) that has the non-matching sizes error, that would be great. I want the .geo file to see whether I can make the problem as coarse as possible (and still reproduce the error)

js-jixu commented 1 year ago

Okay, I will sort it out and send it to you ASAP.

There is an input file, a .geo file, two .msh files, and a figure. When using 3layers_3d_4parts_coarse it runs successfully, but when changing to 3layers_3d_4parts_fine an error occurs. You can modify lines 84-87 & 93 in the .geo file to change the mesh density and number of cells, but please change the settings in Tools/Options/Mesh in gmsh to those shown in the figure.

I have other geometry and mesh files with this problem as well. I can provide them if you want.

alex.zip

js-jixu commented 1 year ago

Is there any progress on this issue?👀

lindsayad commented 1 year ago

Currently having a conversation about it on the PETSc users mailing list

lindsayad commented 1 year ago

@GiudGiud is this the error you remember getting?

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Petsc has generated inconsistent data
[0]PETSC ERROR: Invalid stack: push from MatMult_MFFD /home/lindad/projects/moose/petsc/src/mat/impls/mffd/mffd.c:357. Pop from libmesh_petsc_snes_mffd_interface ../src/solvers/petsc_nonlinear_solver.C:412.

> I'm wondering if perhaps you guys were working with PETSc main instead of our PETSc submodule hash?

I've checked, and this indeed is a regression that occurred somewhere between our submodule hash and PETSc 3.18.1. Our field split no longer works with PJFNK (but still works with NEWTON).
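The PJFNK/NEWTON distinction corresponds to the executioner's solve type in the MOOSE input; a sketch of the toggle (assuming a steady executioner, with the rest of the block unchanged):

```
[Executioner]
  type = Steady
  # solve_type = PJFNK  # matrix-free: now fails with the MatMult_MFFD stack error
  solve_type = NEWTON   # still works with the field split
[]
```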

lindsayad commented 1 year ago

@js-jixu I've figured out that we are not getting all the dofs in our index sets for our splits, e.g. the local dof indices we determine for split 0 and split 1 do not sum to the local size of our matrix ... so we are missing some dofs in our iteration through the local mesh elements in our field decomposition routine. Now that I know what the problem is, I'm hopeful that I can get this resolved pretty soon.
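The consistency condition that failed here is that the splits' index sets must partition the matrix's local dof range. A sketch of that check with hypothetical dof numbers (the real index sets come from iterating the local mesh elements):

```python
# Hypothetical local dof range and two splits; dof 9 was missed during
# the mesh-element iteration in the field decomposition routine.
local_matrix_size = 10
split0 = {0, 1, 2, 3, 4}
split1 = {5, 6, 7, 8}

covered = split0 | split1
missing = set(range(local_matrix_size)) - covered
print(sorted(missing))  # [9]
```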

lindsayad commented 1 year ago

@GiudGiud can you please figure out what your previous error message was? I have two pull requests up for different issues. I'd like to get this closed if we can