Trilinos Multilevel Preconditioner solver for FEM

Resacd commented 2 years ago

I am trying to solve the stokes equations using finite element method. I distributed the mesh across MPI processors using METIS. I assembled all the finite element matrices locally and stored them in CSR format in each processor. I also have some variables referring to local and global indexing. I've already got Trilinos working and I am able to run all the examples here https://docs.trilinos.org/dev/packages/ml/doc/html/index.html#ml_examples. For a few days I am working around to choose a package which is a block box solver and suitable for my problem. Unfortunately, I couldn't find any example employing finite element and compatible with my domain decomposition. As far as I understood, Trilinos mostly uses examples in Galery package in the link above and all those examples use finite difference method. I thinks for solving a system of linear equation using Trilinos the steps should be summarized in the following step, 1- Loading distributed coefficient matrices from each processor 2- Loading distributed right hand side vector from each processor 3- Calling Ax=b solver Now my problem is I don't know how to go through these steps since there is no example for this purpose available. I would appreciate any hint in advance.

cgcgcg commented 2 years ago

We have MueLu for algebraic multigrid and FROSch for domain decomposition methods. Both work for FEM. (Setting up a simple example in Galeri is easier for finite difference, that's why we use that in lots of tests). Were you able to check that the system that you are setting up matches what you expect? (E.g. by writing it out to a matrix market file?) Next, what's your approach for solving Stokes? @trilinos/muelu @trilinos/shylu

Resacd commented 2 years ago

@cgcgcg Thanks for your answer. I am using Galerkin method with a linear triangle element. Sorry I didn't understand this part, Were you able to check that the system that you are setting up matches what you expect? (E.g. by writing it out to a matrix market file?)

cgcgcg commented 2 years ago

What I meant is: were you able to verify that the Tpetra matrix that you are setting up is indeed correct. Distributing a FEM matrix by rows usually requires some work, unless the elements along the processor boundaries are duplicated.

Resacd commented 2 years ago

I am not going to distribute the matrix by row since I have the domain decomposed by METIS.

cgcgcg commented 2 years ago

How about you follow something like https://github.com/trilinos/Trilinos/blob/master/packages/ml/examples/BasicExamples/ml_preconditioner.cpp ? Please be aware though that we will stop support for the Epetra solver stack in the future.

Resacd commented 2 years ago

Based on what I have described about my problem which solver do you suggest as the best option?

cgcgcg commented 2 years ago

After re-reading the description of your approach, I am concerned that your results will be wrong. You are distributing the mesh. So each element will live on one rank. Then you assemble on the local meshes. You are missing a step to get that information into the required matrix distribution by row. There is no point in talking about solvers until you check that the linear system is correct.

cgcgcg commented 2 years ago

Here is the issue: degrees of freedom can be associated with more than one element, and the sub-assembly over several elements contribute to entries in the matrix. Think of an entry (i, i) in the matrix where dof i sits on an interface between two processors. Elements on more than one rank need to contribute to that value. In a row-wise distribution of the matrix, the final value of that entry exists only on one rank.

Resacd commented 2 years ago

I think this issue can be solved by a division by the number of repetition?

cgcgcg commented 2 years ago

Yes, you can duplicate elements and solve the issue that way.

Resacd commented 2 years ago

The link you shared it's about finite difference I guess.

cgcgcg commented 2 years ago

We also have some old examples for FEM & Epetra: https://github.com/trilinos/Trilinos/blob/master/packages/trilinoscouplings/examples/scaling/IntrepidPoisson_Pamgen_Epetra_main.cpp

Resacd commented 2 years ago

Thanks for your help. Let me go through this example and I will let you know if I have got any success.

Resacd commented 2 years ago

@cgcgcg I have following piece of code already, I just don't know how to fill the coefficient matrix and rhs.

Epetra_LinearProblem problem(&A, &x, &b);
    AztecOO solver(problem);
    ParameterList MLList;
    ML_Epetra::SetDefaults("DD", MLList);
    MLList.set("max levels", 2);
    MLList.set("increasing or decreasing", "increasing");
    MLList.set("aggregation: type", "METIS");
    MLList.set("aggregation: nodes per aggregate", 16);
    MLList.set("smoother: pre or post", "both");
    MLList.set("coarse: type", "Amesos_KLU");
    MLList.set("smoother: type", "aztec");
    int options[AZ_OPTIONS_SIZE];
    double params[AZ_PARAMS_SIZE];
    AZ_defaults(options, params);
    options[AZ_precond] = AZ_dom_decomp;
    options[AZ_subdomain_solve] = AZ_icc;
    MLList.set("smoother: aztec options", options);
    MLList.set("smoother: aztec params", params);
    ML_Epetra::MultiLevelPreconditioner* MLPrec =
        new ML_Epetra::MultiLevelPreconditioner(A, MLList, true);
    MLPrec->PrintUnused();
    solver.SetPrecOperator(MLPrec);
    solver.SetAztecOption(AZ_solver, AZ_cg_condnum);
    int Niters = 500;
    solver.SetAztecOption(AZ_kspace, 160);
    solver.Iterate(Niters, 1e-12);
    cout << MLPrec->GetOutputList();

cgcgcg commented 2 years ago

@Resacd You need to decide which processor owns which row. Currently, both proc 0 and 1 have entries for rows 1 and 2. What can be helpful for FEM assembly are helper classes called "FECrsMatrix". Both Epetra and Tpetra should have them. In that way, you can do the local assembly on uniquely distributed elements, and then let Trilinos redistribute the data.

cgcgcg commented 2 years ago

You have two options: 1) Uniquely distribute elements (e.g. using METIS), assemble, then communicate to obtain a row-wise matrix distribution. (See FECrsMatrix for that.) 2) Duplicate elements near the processor boundaries, then assemble only the matrix entries that are owned by the processor.

Resacd commented 2 years ago

I will use metis. So as I understand from what you said I should give chuck of element to each processor. Then ask FECrsMatrix to do the assembly? and what will be the next? I am just so sorry for taking your time and being naive.

cgcgcg commented 2 years ago

After using FECrsMatrix, you should end up with a correctly distributed CrsMatrix for your solve. For the RHS, there is a similar procedure, using FEMultiVector.

Resacd commented 2 years ago

Thanks for your help. do you know where I can get an example of FECrsMatrix to assemble the local stiffness matrix in each processor? what will happen If I just use my own routine to assemble and then ask trilinos to communicate to obtain a row-wise matrix distribution? The problem of shared nodes can be solved just by dividing their value by the number of repetition unless something is happening inside trilinos which I have no idea on that which does not allow having shared nodes.
I want to stick with the first option that pointed out as my possible option. I am trying to avoid as much as possible duplicating data as I am concern with memory efficiency.

cgcgcg commented 2 years ago

For Epetra's FECrsMatrix, have a look at this test: https://github.com/trilinos/Trilinos/tree/master/packages/epetra/test/FECrsMatrix

You cannot simply divide or multiply shared nodes, that doesn't work for non-uniform elements.

mayrmt commented 2 years ago

@Resacd The keyword here is "overlapping domain decomposition", where the subdomain/processor boundary cuts through an element. Then, each node is uniquely assigned to one processor (and some elements need to be "duplicated" for the FEM evaluation and assembly).

Resacd commented 2 years ago

Thanks for your comment.

Resacd commented 2 years ago

@cgcgcg Sorry again. If I don't use FECsrMatrix for assembling local matrices but instead I manage my own assembly function in a way that nodes not to be shared between processors then will trilinos work fine? And one another thing, if trilinos just work for row wise distributed matrices what is the point of having METIS in the package? I was thinking when I divide the mesh using metis then the solvers in trilinos packages will be able to do all matrix-vector multiplications and other stuff based on how metis work.

cgcgcg commented 2 years ago

You don't have to use the FE* helper classes. You just need a row-wise distribution of the matrix. For FEM, you need to work with elements and degrees of freedom. and you need to decide on ways to distribute both of them. METIS (and other algos in Zoltan/Zoltan2) is a perfectly reasonable choice to distribute elements. This almost induces a distribution of the degrees of freedom, but you need to decide which rank owns interface unknowns.

Resacd commented 2 years ago

I almost got the idea. Thanks for the help. I just have one question regarding the case Epetra_CrsMatrix A(Copy, map, 3). what if each row has different number of entry. Should I put this inside a loop? In a tridiagonal matrix it makes sense to have the entry number of each row as 3 but when each row has different number of entry then I think I should put this inside a loop but this makes a copy of A for each iteration.

cgcgcg commented 2 years ago

I don't think I understand the use of a loop. We should have some matrix constructors that take a vector of number of entries per row, that's probably what you want.

Resacd commented 2 years ago

I wrote the following piece of code to solve the distributed system of linear equation using ML::EpetraPreconditioner. Unfortunately in some nodes of my domain I'm not getting the correct answer. Any help would be appreciated. I want to know if I'm missing something during solver call?

 void Solver::Trisolver(int nNodes_global, SparseMatrix& SpM, VecInt_t& Local2Global, VecDbl_t& rhs) {

#ifdef HAVE_MPI
    Epetra_MpiComm Comm(MPI_COMM_WORLD);
#else
    Epetra_SerialComm Comm;
#endif
    Epetra_Time Time(Comm);
    int myRank = Comm.MyPID();
    int numProcs = Comm.NumProc();
    if (myRank == 0) {
        // Print out the Epetra software version.
        cout << Epetra_Version() << endl << endl
            << "Total number of processes: " << numProcs << endl;
    }
#ifdef EPETRA_NO_32BIT_GLOBAL_INDICES
    typedef long long global_ordinal_type;
#else
    typedef int global_ordinal_type;
#endif
    int NumGlobalElements = nNodes_global;
    Epetra_IntSerialDenseVector MyGlobalElements;
    int NumMyElements = Local2Global.size();
    MyGlobalElements.Size(NumMyElements);

    for (int i = 0; i < NumMyElements; i++) {
        MyGlobalElements[i] = Local2Global[i];
    }

    Epetra_Map Map(-1, MyGlobalElements.Length(), MyGlobalElements.Values(), 0, Comm);
    Epetra_Vector x(Map);
    for (int i = 0; i < NumMyElements; ++i)  x[i] = rhs[i];
    std::cout << x;
    std::cout << std::endl;

    //========================================================================

    int* NumNz = new int[NumMyElements];
    VecInt_t* row_ptr = SpM.row_ptr_;
    VecInt_t* col_ind = SpM.col_ind_;

    for (int i = 0; i < NumMyElements; i++) {

        NumNz[i] = (*row_ptr)[MyGlobalElements[i] + 1] - (*row_ptr)[MyGlobalElements[i]];
    }

    Epetra_CrsMatrix A(Copy, Map, NumNz);
    int NumEntries = 0;
    for (int i = 0; i < NumMyElements; i++) {

        double* Values = new double[NumNz[i]];
        int* Indices = new int[NumNz[i]];
        NumEntries = NumNz[i];
        int col_begin = (*row_ptr)[MyGlobalElements[i]];
        int col_end = (*row_ptr)[MyGlobalElements[i] + 1];
        int k = 0;
        for (int j = col_begin; j < col_end; j++) {
            Values[k] = SpM.val_[j];
            Indices[k] = (*col_ind)[j];
            k++;
            if (k == NumNz[i])break;
        }
        A.InsertGlobalValues(MyGlobalElements[i], NumEntries, Values, Indices);
        delete[] Values;
        delete[] Indices;
    }

    A.FillComplete();
    std::cout << A;

    //==============ML Epetra::MultiLevelPreconditioner===========
    Epetra_Vector sol(Map);
    ML* ml_handle;
    int N_levels = 10;
    ML_Set_PrintLevel(3);
    ML_Create(&ml_handle, N_levels);
    EpetraMatrix2MLMatrix(ml_handle, 0, &A);
    ML_Aggregate* agg_object;
    ML_Aggregate_Create(&agg_object);
    N_levels = ML_Gen_MGHierarchy_UsingAggregation(ml_handle,0,ML_INCREASING,agg_object);
    ML_Gen_Smoother_SymGaussSeidel(ml_handle, ML_ALL_LEVELS,ML_BOTH, 1, ML_DEFAULT);
    ML_Gen_Solver(ml_handle, ML_MGV, 0, N_levels - 1);
    ML_Epetra::MultiLevelOperator MLop(ml_handle, Comm, Map, Map);

    Epetra_LinearProblem Problem(&A,&sol, &x);
    AztecOO solver(Problem);
    // =========================== begin of ML part ===========================
    ParameterList MLList;
    int options[AZ_OPTIONS_SIZE];
    double params[AZ_PARAMS_SIZE];
    AZ_defaults(options, params);
    options[AZ_precond] = AZ_dom_decomp;
    options[AZ_subdomain_solve] = AZ_ilu;
    options[AZ_graph_fill] = 0;
    options[AZ_overlap] = 0;
    ML_Epetra::SetDefaults("DD", MLList, options, params);
    MLList.set("aggregation: type", "METIS");
    MLList.set("smoother: type", "Aztec");
    MLList.set("aggregation: nodes per aggregate", 64);
    MLList.set("smoother: pre or post", "pre");
    MLList.set("coarse: type", "Amesos-KLU");
    ML_Epetra::MultiLevelPreconditioner* MLPrec =
        new ML_Epetra::MultiLevelPreconditioner(A, MLList, true);
    solver.SetPrecOperator(MLPrec);
    // =========================== end of ML part =============================
    solver.SetAztecOption(AZ_solver, AZ_gmres_condnum);
    solver.SetAztecOption(AZ_output, 32);
    solver.Iterate(500, 1e-12);
    delete MLPrec;
    double residual;
    sol.Norm2(&residual);
    if (Comm.MyPID() == 0)
    {
        cout << "||x_exact - x||_2 = " << residual << endl;
        cout << "Total Time = " << Time.ElapsedTime() << endl;
        //std::cout << "The solution is as follows" << std::endl;
        //cout << sol;

    }
}

cgcgcg commented 2 years ago

Can you compute the residual norm $||Ax-b||_2$ and check whether the specified system is solved correctly?

Resacd commented 2 years ago

First of all let me thank you for your help. I appreciate that. Let's me compute that and I'll post the answer here. Just one thing about metis, I am Already using metis to do a element division and as you said it causes one entry sits in more that one rank. What if I ask metis to do a node base division. In that case I think boundary element will be automatically copied among those processors which where sharing the same node node in element based division.

cgcgcg commented 2 years ago

You can use METIS to distribute the nodes. For this to work, you then need to duplicate elements and only assemble the entries in rows owned by the rank.

cgcgcg commented 2 years ago

Please reopen if this is not resolved.

trilinos / Trilinos

Trilinos Multilevel Preconditioner solver for FEM #10539