trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/
Other
1.2k stars 563 forks source link

MueLu: Boundary Conditions Performance Loss #10459

Closed Adrian-Diaz closed 2 years ago

Adrian-Diaz commented 2 years ago

Question

@trilinos/\<MueLu>

Greeetings,

Lately I was trying to consolidate memory on an FEA system solve by implementing the boundary conditions as diagonal rows on my global matrix, rather than having 2 matrices for the original and reduced system. However, in doing so for some reason the MueLu preconditioner seems to give Belos a much harder time to converge, about 100 times more iterations to get the same answer as the previous method. Any idea what could be going on there? I attached the matrix in matrixmarket format and my MueLu parameters. Thanks for any help.

Adrian Diaz elasticity3D_share.txt A_matrix2.zip

cgcgcg commented 2 years ago

@Adrian-Diaz Could you also post the output of the preconditioner setup before/after the change? The eigenvalue estimates could be affected by having BCs in the system.

Adrian-Diaz commented 2 years ago

out_BC_removed.txt out_BC_included.txt

Seems some part of the algorithm also identifies them as Dirichlet conditions.

cgcgcg commented 2 years ago

Can you bump up the number of eigenvalue iterations for both problems? I'd expect the resulting eigenvalue estimates (on the finest level) to be identical, but they seem to differ.

cgcgcg commented 2 years ago

(Just double checking: did you check that the solution of the system with BCs eliminated solves the one with BCs included?)

Adrian-Diaz commented 2 years ago

Yea the solutions are almost identical, even the strain energy is virtually the same as printed at the end. This is the result for both with 500 max iterations for the eigenvalue estimate out_BC_included_500_eig.txt out_BC_removed_500_eig.txt .

cgcgcg commented 2 years ago

What I'm seeing is:

out_BC_included_500_eig.txt:"Ifpack2::Chebyshev": {Initialized: true, Computed: true, "Ifpack2::Details::Chebyshev":{degree: 2, lambdaMax: 2.86361, alpha: 7, lambdaMin: 0.409088, boost factor: 1.1}, Global matrix dimensions: [36663, 36663], Global nnz: 2603349}
out_BC_removed_500_eig.txt: "Ifpack2::Chebyshev": {Initialized: true, Computed: true, "Ifpack2::Details::Chebyshev":{degree: 2, lambdaMax: 3.08573, alpha: 7, lambdaMin: 0.440819, boost factor: 1.1}, Global matrix dimensions: [36300, 36300], Global nnz: 2577402}

So these matrices don't seem to have the same spectrum.

cgcgcg commented 2 years ago

Huh. I computed the spectral radius of D^-1 * A for the matrix you attached using scipy, and that gives me 3.08593796.

cgcgcg commented 2 years ago

What Trilinos SHA are you on?

Adrian-Diaz commented 2 years ago

Whats the SHA? The release version stated in the release notes this ran with is 12.2.

EDIT: However github suggests that file hasn't updated in 4 years.

cgcgcg commented 2 years ago

What does git rev-parse HEAD in your Trilinos source say?

Adrian-Diaz commented 2 years ago

I get 5a67b70fb3816e1cea26554c650b4dc7ca26747d

cgcgcg commented 2 years ago

You could try to pass "chebyshev: max eigenvalue" = 3.08593796 in the "smoother: params". If that fixes things, we at least know that it's a problem with the spectral computations.

Adrian-Diaz commented 2 years ago

Doesn't seem to change the convergence speed.

cgcgcg commented 2 years ago

I ran your matrix on my end, with 500 iterations, and I get

chebyshev: max eigenvalue (calculated by Ifpack2) = 3.08579
"Ifpack2::Chebyshev": {Initialized: true, Computed: true, "Ifpack2::Details::Chebyshev":{degree: 2, lambdaMax: 3.08579, alpha: 7, lambdaMin: 0.440827, boost factor: 1.1}, Global matrix dimensions: [36663, 36663], Global nnz: 2603349}

This was with a Trilinos develop from ~ 3 weeks ago. So I guess this has been fixed in the meantime.

Adrian-Diaz commented 2 years ago

I'll try updating to see.

Adrian-Diaz commented 2 years ago

Is that from the Trilinos "develop" branch? I'm still getting the old value.

cgcgcg commented 2 years ago

Hm. Let's try something else. Can your run ./MueLu_Driver.exe --matrix=A_matrix2.txt in packages/muelu/test/scaling/? You'll have to modify the input deck scaling.xml and set repartition: enable = false. (The iterations counts are pretty bad, presumably because we're not passing the correct nullspace. But the spectral computations should be correct.)

Adrian-Diaz commented 2 years ago

Its telling me there's a crash in the preconditioner setup if I run with those args. Its not printing the eigenvalue even with extreme verbosity.

EDIT: nvm the param file wasn't saving with repartition disabled. I'm getting 2.72 for the eigenvalue.

EDIT2: Forgot to set 500 iterations. Its giving 3.08578.

cgcgcg commented 2 years ago

Ok. So at least we're seeing the same results in that case. Question is, what is different between running it from your driver and in our driver. You can also set "debug = True" in the "smoother: params". That will tell you in a lot of detail what the power iteration is doing. That way, you might be able to pin down how the two runs differ.

Adrian-Diaz commented 2 years ago

I couldn't see any difference in the DriverCore routines for PreconditionerSetup; other than the nullspace setting but that was being triggered anyway. Whatever is happening seems to be parametric if the eigenvalue estimate takes place there. Is it being done by this line? A->SetMaxEigenvalueEstimate(-Teuchos::ScalarTraits::one());

cgcgcg commented 2 years ago

A matrix can cache an eigenvalue estimate. By setting that cached value to -1, we assure that a computation will be triggered once we get to the Chebyshev smoother construction.

Adrian-Diaz commented 2 years ago

Is the nullspace relevant to the eigenvalue estimate?

cgcgcg commented 2 years ago

No, it is not.

Adrian-Diaz commented 2 years ago

Copying over the scaling.xml params into the original elasticity3D.xml params didn't change it, also in that case Belos has growing error rather than shrinking 0_0.

cgcgcg commented 2 years ago

Could you try setting "debug = true" in the smoother list? That way we should be able to track down where the two eigenvalue estimates differ.

Adrian-Diaz commented 2 years ago

Eig_Estimate_BC_included.txt Eig_Estimate_BC_removed.txt

Not sure how its finding a negative value on the diagonal for the included case.

cgcgcg commented 2 years ago

When I run with the MueLu_Driver it does not print that. Could there be a difference between the matrix in the file and what you're running?

Adrian-Diaz commented 2 years ago

The Matrix output is right before preconditioner setup. Its the Tpetra storage though, I'm not sure if the Xpetra wrapper is somehow corrupted?

cgcgcg commented 2 years ago

If you're using the Xpetra::IO routines, you can try writeLocal. That one doesn't collect the matrix on a single rank and then prints, but outright writes the rank local matrices. That's one processing step less. Also, could you share how you enforce BCs? EDIT: The reason I'm asking about the local matrices is that I have seen matrices with repeated entries (and that's bad). When the matrix gets collected on a single rank, these entries are summed up and the issue disappears.

jhux2 commented 2 years ago

@trilinos/muelu

Adrian-Diaz commented 2 years ago

How do I invoke writeLocal, I couldn't find it in xpetra, will A->describe() be enough? The BCs are being added as trivial diagonal entries set equal to the condition on the DOF; I rescaled them by an arbitrary diagonal entry of the stiffness matrix to see if improving the condition number helped, which is why there aren't simply 1s there.

cgcgcg commented 2 years ago

https://github.com/trilinos/Trilinos/blob/7bc4eb697c4ccd9fab1742661da3260a1069d41a/packages/xpetra/sup/Utils/Xpetra_IO.hpp#L323

Adrian-Diaz commented 2 years ago

A_matrixlocal.zip

I got the following 4 files.

cgcgcg commented 2 years ago

Hm, they look reasonable to me. I think you might have to hack something in here https://github.com/trilinos/Trilinos/blob/97353f2ce580dcde02f4b04a78cccb5e1c942f43/packages/ifpack2/src/Ifpack2_Details_Chebyshev_def.hpp#L1131-L1133 to see where it thinks there are non-positive entries.

cgcgcg commented 2 years ago

One additional idea: these local matrices look like the rows aren't sorted. Are you telling Tpetra that they aren't sorted when you create the matrix? Maybe Tpetra thinks that the matrix is sorted, and takes some shortcuts in the diagonal extraction that aren't valid.

Adrian-Diaz commented 2 years ago

Not sure if its setting that, I built the matrix with a kokkos view constructor and a column map function from Tpetra Details:

Tpetra::Details::makeColMap<LO,GO,node_type>(colmap,dommap,DOF_Graph_Matrix.get_kokkos_view(), nullptr);

  size_t nnz = DOF_Graph_Matrix.size();

  //local indices in the graph using the constructed column map
  CArrayKokkos<LO, array_layout, device_type, memory_traits> stiffness_local_indices(nnz, "stiffness_local_indices");

  //row offsets with compatible template arguments
    row_pointers row_offsets = DOF_Graph_Matrix.start_index_;
    row_pointers row_offsets_pass("row_offsets", nlocal_nodes*num_dim+1);
    for(int ipass = 0; ipass < nlocal_nodes*num_dim + 1; ipass++){
      row_offsets_pass(ipass) = row_offsets(ipass);
    }

  size_t entrycount = 0;
  for(int irow = 0; irow < nlocal_nodes*num_dim; irow++){
    for(int istride = 0; istride < Stiffness_Matrix_Strides(irow); istride++){
      stiffness_local_indices(entrycount) = colmap->getLocalElement(DOF_Graph_Matrix(irow,istride));
      entrycount++;
    }
  }

  Global_Stiffness_Matrix = Teuchos::rcp(new MAT(local_dof_map, colmap, row_offsets_pass, stiffness_local_indices.get_kokkos_view(), Stiffness_Matrix.get_kokkos_view()));
  Global_Stiffness_Matrix->fillComplete();
cgcgcg commented 2 years ago

I guess that's this constructor: https://github.com/trilinos/Trilinos/blob/7bc4eb697c4ccd9fab1742661da3260a1069d41a/packages/tpetra/core/src/Tpetra_CrsMatrix_decl.hpp#L776-L807 So maybe try setting "sorted = false".

Adrian-Diaz commented 2 years ago

Hmm, that didn't seem to change the result. Not sure if the paramlist was passed correctly though:

Teuchos::RCP crs_matrix_params = Teuchos::rcp(new Teuchos::ParameterList("crsmatrix")); crs_matrix_params->set("sorted", false); Global_Stiffness_Matrix = Teuchos::rcp(new MAT(local_dof_map, colmap, row_offsets_pass, stiffness_local_indices.get_kokkos_view(), Stiffness_Matrix.get_kokkos_view(), crs_matrix_params));

cgcgcg commented 2 years ago

:-( Any luck with hacking something into the power method directly to check why it complains?

Adrian-Diaz commented 2 years ago

Well I ran on 1 rank and the result is the same. In that case its under the impression the diagonal entry (400,400) is 0 according to the error block. However I see 1.88034188034188046e+04 in the file output.

cgcgcg commented 2 years ago

Ok, next step: dump not only the matrix, but the diagonal vector. Maybe something is wrong with how that diagonal is computed? (A wrong diagonal could be caused by a bad column map.)

Adrian-Diaz commented 2 years ago

What function should I use to print it? Also, the column map is the same as the row map for the 1 rank run, the default ordered set.

Adrian-Diaz commented 2 years ago

This is the diagonal made with the matrix getLocalDiagCopy routine Diagonal.txt .

cgcgcg commented 2 years ago

This getting more confusing. There are non-positive numbers in it, yet it claims there are? Which diag did you print, D_rowmap or D_rangemap? (It most likely doesn't matter, but you never know.) https://github.com/trilinos/Trilinos/blob/97353f2ce580dcde02f4b04a78cccb5e1c942f43/packages/ifpack2/src/Ifpack2_Details_Chebyshev_def.hpp#L1113

Adrian-Diaz commented 2 years ago

It says it prints with the row map. I don't see an option for the range map.

cgcgcg commented 2 years ago

I'd hack the print statement directly into the spot I pointed out above. Then you can print both D_rowmap and D_rangemap.

Adrian-Diaz commented 2 years ago

The describe function for the matrix says the range map is the same as the row map; the domain map is the same as the column map. I tried printing the vectors ifpack2 is using with DrowMap->describe(*out,Teuchos::VERB_EXTREME); DrangeMap->describe(*out,Teuchos::VERB_EXTREME); but that causes a segfault.

cgcgcg commented 2 years ago

So is the issue that D_rowMap is bad? How did that happen?

Adrian-Diaz commented 2 years ago

Yea trying to print it is segfaulting. Valgrind didn't give me any hints without the prints. Maybe something to do with the diagonal offsets call?

cgcgcg commented 2 years ago

Maybe try printing the offsets to see if they are bad? (You might have to write the print loop yourself.)