An error when using AMGX solver.

Hi Everybody! I am trying to apply AMGX solver in my problem. When I perform FEM using AMGX solver (combine with cuda), the vector displacement I receive is not right. Then I comeback the example 1 (https://github.com/mfem/mfem/blob/master/examples/amgx/ex1.cpp) and run the example 1 with cuda. I try printing the vector displacement like:

for(int i=0;i<X.Size();i++){
     cout<<X(i)<<"   ";
   }

but I receive the result not right, too. It returns all garbage values. I think it is an error or I do not understand it. After FEM process, I would like to use the vector displacement to implement my problem. May you help me? Thank you!

Hi @nghiakvnvsd, I can try to help. Can you give an example how you are running?

I tried adding the print:

for(int i=0;i<X.Size();i++){
  cout<<X(i)<<"   ";
}

right before step 12. on ex1.cpp (around line 230). If you run using the cpu (./ex1 -d cpu) the numbers look okay to me, but if when I run using cuda (./ex1 -d cuda) I do see garbage numbers. I was able to fix that by calling X.HostRead() right before the print statement

   X.HostRead();                                                                                                                                                            
   for(int i=0;i<X.Size();i++){                                                                                                                                             
     cout<<X(i)<<"   ";                                                                                                                                                     
   }

Could you verify that this solves the issue in example 1? It could be you are missing memory transfers in your application. If you build MFEM in debug mode it should throw errors when accessing invalid data, additionally AmgX is compatible with MFEM's cpu device mode.

Hi @artv3 , thank you for your help. I have fix it in example 1 but my problem is not good. My problem refer from the example 2. I try to modified the example 2 using AMGX. You can see the code. The problem is not converge when using AMGX with ./ex2 -d cuda or ./ex2 --amgx-file precon.json --amgx-preconditioner -d cuda

//                                MFEM Example 2
//                              AmgX Modification
//
// Compile with: make ex2
//
// AmgX sample runs:
//               ex2
//               ex2 -d cuda
//               ex2 --amgx-file multi_gs.json --amgx-solver
//               ex2 --amgx-file precon.json --amgx-preconditioner
//               ex2 --amgx-file multi_gs.json --amgx-solver -d cuda
//               ex2 --amgx-file precon.json --amgx-preconditioner -d cuda
//
// Description:  This example code solves a simple linear elasticity problem
//               describing a multi-material cantilever beam.
//
//               Specifically, we approximate the weak form of -div(sigma(u))=0
//               where sigma(u)=lambda*div(u)*I+mu*(grad*u+u*grad) is the stress
//               tensor corresponding to displacement field u, and lambda and mu
//               are the material Lame constants. The boundary conditions are
//               u=0 on the fixed part of the boundary with attribute 1, and
//               sigma(u).n=f on the remainder with f being a constant pull down
//               vector on boundary elements with attribute 2, and zero
//               otherwise. The geometry of the domain is assumed to be as
//               follows:
//
//                                 +----------+----------+
//                    boundary --->| material | material |<--- boundary
//                    attribute 1  |    1     |    2     |     attribute 2
//                    (fixed)      +----------+----------+     (pull down)
//
//               The example demonstrates the use of high-order and NURBS vector
//               finite element spaces with the linear elasticity bilinear form,
//               meshes with curved elements, and the definition of piece-wise
//               constant and vector coefficient objects. Static condensation is
//               also illustrated.
//
//               We recommend viewing Example 1 before viewing this example.

#include "mfem.hpp"
#include <fstream>
#include <iostream>

#ifndef MFEM_USE_AMGX
#error This example requires that MFEM is built with MFEM_USE_AMGX=YES
#endif

using namespace std;
using namespace mfem;

int main(int argc, char *argv[])
{
   // 1. Parse command-line options.
   const char *mesh_file = "../../data/beam-tri.mesh";
   int order = 1;
   bool static_cond = false;
   const char *device_config = "cpu";
   bool visualization = true;
   bool amgx_lib = true;
   bool amgx_solver = true;
   const char* amgx_json_file = ""; // JSON file for AmgX

   OptionsParser args(argc, argv);
   args.AddOption(&mesh_file, "-m", "--mesh",
                  "Mesh file to use.");
   args.AddOption(&order, "-o", "--order",
                  "Finite element order (polynomial degree) or -1 for"
                  " isoparametric space.");
   args.AddOption(&static_cond, "-sc", "--static-condensation", "-no-sc",
                  "--no-static-condensation", "Enable static condensation.");
   args.AddOption(&amgx_lib, "-amgx", "--amgx-lib", "-no-amgx",
                  "--no-amgx-lib", "Use AmgX in example.");
   args.AddOption(&amgx_json_file, "--amgx-file", "--amgx-file",
                  "AMGX solver config file (overrides --amgx-solver, --amgx-verbose)");
   args.AddOption(&amgx_solver, "--amgx-solver", "--amgx-solver",
                  "--amgx-preconditioner", "--amgx-preconditioner",
                  "Configure AMGX as solver or preconditioner.");
   args.AddOption(&device_config, "-d", "--device",
                  "Device configuration string, see Device::Configure().");
   args.AddOption(&visualization, "-vis", "--visualization", "-no-vis",
                  "--no-visualization",
                  "Enable or disable GLVis visualization.");
   args.Parse();
   if (!args.Good())
   {
      args.PrintUsage(cout);
      return 1;
   }
   args.PrintOptions(cout);

   // 2. Read the mesh from the given mesh file. We can handle triangular,
   //    quadrilateral, tetrahedral or hexahedral elements with the same code.
   Mesh mesh(mesh_file, 1, 1);
   int dim = mesh.Dimension();

   if (mesh.attributes.Max() < 2 || mesh.bdr_attributes.Max() < 2)
   {
      cerr << "\nInput mesh should have at least two materials and "
           << "two boundary attributes! (See schematic in ex2.cpp)\n"
           << endl;
      return 3;
   }

   // 3. Select the order of the finite element discretization space. For NURBS
   //    meshes, we increase the order by degree elevation.
   if (mesh.NURBSext)
   {
      mesh.DegreeElevate(order, order);
   }

   // 4. Refine the mesh to increase the resolution. In this example we do
   //    'ref_levels' of uniform refinement. We choose 'ref_levels' to be the
   //    largest number that gives a final mesh with no more than 5,000
   //    elements.
   {
      int ref_levels =
         (int)floor(log(5000./mesh.GetNE())/log(2.)/dim);
      for (int l = 0; l < ref_levels; l++)
      {
         mesh.UniformRefinement();
      }
   }

   // 5. Define a finite element space on the mesh. Here we use vector finite
   //    elements, i.e. dim copies of a scalar finite element space. The vector
   //    dimension is specified by the last argument of the FiniteElementSpace
   //    constructor. For NURBS meshes, we use the (degree elevated) NURBS space
   //    associated with the mesh nodes.
   FiniteElementCollection *fec;
   FiniteElementSpace *fespace;
   if (mesh.NURBSext)
   {
      fec = NULL;
      fespace = mesh.GetNodes()->FESpace();
   }
   else
   {
      fec = new H1_FECollection(order, dim);
      fespace = new FiniteElementSpace(&mesh, fec, dim);
   }
   cout << "Number of finite element unknowns: " << fespace->GetTrueVSize()
        << endl << "Assembling: " << flush;

   // 6. Determine the list of true (i.e. conforming) essential boundary dofs.
   //    In this example, the boundary conditions are defined by marking only
   //    boundary attribute 1 from the mesh as essential and converting it to a
   //    list of true dofs.
   Array<int> ess_tdof_list, ess_bdr(mesh.bdr_attributes.Max());
   ess_bdr = 0;
   ess_bdr[0] = 1;
   fespace->GetEssentialTrueDofs(ess_bdr, ess_tdof_list);

   // 7. Set up the linear form b(.) which corresponds to the right-hand side of
   //    the FEM linear system. In this case, b_i equals the boundary integral
   //    of f*phi_i where f represents a "pull down" force on the Neumann part
   //    of the boundary and phi_i are the basis functions in the finite element
   //    fespace. The force is defined by the VectorArrayCoefficient object f,
   //    which is a vector of Coefficient objects. The fact that f is non-zero
   //    on boundary attribute 2 is indicated by the use of piece-wise constants
   //    coefficient for its last component.
   VectorArrayCoefficient f(dim);
   for (int i = 0; i < dim-1; i++)
   {
      f.Set(i, new ConstantCoefficient(0.0));
   }
   {
      Vector pull_force(mesh.bdr_attributes.Max());
      pull_force = 0.0;
      pull_force(1) = -1.0e-2;
      f.Set(dim-1, new PWConstCoefficient(pull_force));
   }

   LinearForm b(fespace);
   b.AddBoundaryIntegrator(new VectorBoundaryLFIntegrator(f));
   cout << "r.h.s. ... " << flush;
   b.Assemble();

   // 8. Define the solution vector x as a finite element grid function
   //    corresponding to fespace. Initialize x with initial guess of zero,
   //    which satisfies the boundary conditions.
   GridFunction x(fespace);
   x = 0.0;

   // 9. Set up the bilinear form a(.,.) on the finite element space
   //    corresponding to the linear elasticity integrator with piece-wise
   //    constants coefficient lambda and mu.
   Vector lambda(mesh.attributes.Max());
   lambda = 1.0;
   lambda(0) = lambda(1)*50;
   PWConstCoefficient lambda_func(lambda);
   Vector mu(mesh.attributes.Max());
   mu = 1.0;
   mu(0) = mu(1)*50;
   PWConstCoefficient mu_func(mu);

   BilinearForm a(fespace);
   a.AddDomainIntegrator(new ElasticityIntegrator(lambda_func,mu_func));

   // 10. Assemble the bilinear form and the corresponding linear system,
   //     applying any necessary transformations such as: eliminating boundary
   //     conditions, applying conforming constraints for non-conforming AMR,
   //     static condensation, etc.
   cout << "matrix ... " << flush;
   if (static_cond) { a.EnableStaticCondensation(); }
   a.Assemble();

   OperatorPtr A;
   Vector B, X;
   a.FormLinearSystem(ess_tdof_list, x, b, A, X, B);
   cout << "done." << endl;

   cout << "Size of linear system: " << A->Height() << endl;

   // 11. Solve the linear system A X = B.
   if (amgx_lib && strcmp(amgx_json_file,"") == 0)
   {
      bool amgx_verbose = false;
      AmgXSolver amgx(AmgXSolver::PRECONDITIONER, amgx_verbose);
      amgx.SetOperator(*A.As<SparseMatrix>());
      PCG(*A, amgx, B, X, 1, 10000, 1e-12, 0.0);
   }
   else if (amgx_lib && strcmp(amgx_json_file,"") != 0)
   {
      AmgXSolver amgx;
      amgx.ReadParameters(amgx_json_file, AmgXSolver::EXTERNAL);
      amgx.InitSerial();
      amgx.SetOperator(*A.As<SparseMatrix>());
      if (amgx_solver)
      {
         amgx.Mult(B,X);
      }
      else
      {
         PCG(*A.As<SparseMatrix>(), amgx, B, X, 1, 10000, 1e-12, 0.0);
      }
   }
   else
   {
#ifndef MFEM_USE_SUITESPARSE
      // Use a simple symmetric Gauss-Seidel preconditioner with PCG.
      GSSmoother M((SparseMatrix&)(*A));
      PCG(*A, M, B, X, 1, 10000, 1e-12, 0.0);
#else
      // If MFEM was compiled with SuiteSparse, use UMFPACK to solve the system.
      UMFPackSolver umf_solver;
      umf_solver.Control[UMFPACK_ORDERING] = UMFPACK_ORDERING_METIS;
      umf_solver.SetOperator(*A);
      umf_solver.Mult(B, X);
#endif
   }

   // 12. Recover the solution as a finite element grid function.
   a.RecoverFEMSolution(X, b, x);

   // 13. For non-NURBS meshes, make the mesh curved based on the finite element
   //     space. This means that we define the mesh elements through a fespace
   //     based transformation of the reference element. This allows us to save
   //     the displaced mesh as a curved mesh when using high-order finite
   //     element displacement field. We assume that the initial mesh (read from
   //     the file) is not higher order curved mesh compared to the chosen FE
   //     space.
   if (!mesh.NURBSext)
   {
      mesh.SetNodalFESpace(fespace);
   }

   // 14. Save the displaced mesh and the inverted solution (which gives the
   //     backward displacements to the original grid). This output can be
   //     viewed later using GLVis: "glvis -m displaced.mesh -g sol.gf".
   {
      GridFunction *nodes = mesh.GetNodes();
      *nodes += x;
      x *= -1;
      ofstream mesh_ofs("displaced.mesh");
      mesh_ofs.precision(8);
      mesh.Print(mesh_ofs);
      ofstream sol_ofs("sol.gf");
      sol_ofs.precision(8);
      x.Save(sol_ofs);
   }

   // 15. Send the above data by socket to a GLVis server. Use the "n" and "b"
   //     keys in GLVis to visualize the displacements.
   if (visualization)
   {
      char vishost[] = "localhost";
      int  visport   = 19916;
      socketstream sol_sock(vishost, visport);
      sol_sock.precision(8);
      sol_sock << "solution\n" << mesh << x << flush;
   }

   // 16. Free the used memory.
   if (fec)
   {
      delete fespace;
      delete fec;
   }

   return 0;
}

Hi @nghiakvnvsd , thank you for sharing your code. I verified that the default preconditioner needs some adjustments to solve example 2.

However, I suggest increasing the number of max_iters to 4 in precon.json to achieve convergence: usage: ./ex2 --amgx-file precon.json --amgx-preconditioner

{                                                                                                                                                                 
    "config_version": 2,                                                                                                                                          
    "solver": {                                                                                                                                                   
        "max_uncolored_percentage": 0.15,                                                                                                                         
        "algorithm": "AGGREGATION",                                                                                                                               
        "solver": "AMG",                                                                                                                                          
        "smoother": "MULTICOLOR_GS",                                                                                                                              
        "presweeps": 1,                                                                                                                                           
        "symmetric_GS" : 1,                                                                                                                                       
        "selector": "SIZE_2",                                                                                                                                     
        "coarsest_sweeps": 10,                                                                                                                                    
        "max_iters": 4,                                                                                                                                           
        "postsweeps": 1,                                                                                                                                          
        "scope": "main",                                                                                                                                          
        "max_levels": 1000,                                                                                                                                       
        "matrix_coloring_scheme" : "MIN_MAX",                                                                                                                     
        "tolerance": 0.0,                                                                                                                                         
        "norm": "L2",                                                                                                                                             
        "cycle": "V"                                                                                                                                              
    }                                                                                                                                                             
}

This should converge the example ~160 iterations.

Hi @nghiakvnvsd , thank you for sharing your code. I verified that the default preconditioner needs some adjustments to solve example 2.

However, I suggest increasing the number of max_iters to 4 in precon.json to achieve convergence: usage: ./ex2 --amgx-file precon.json --amgx-preconditioner

{                                                                                                                                                                 
    "config_version": 2,                                                                                                                                          
    "solver": {                                                                                                                                                   
        "max_uncolored_percentage": 0.15,                                                                                                                         
        "algorithm": "AGGREGATION",                                                                                                                               
        "solver": "AMG",                                                                                                                                          
        "smoother": "MULTICOLOR_GS",                                                                                                                              
        "presweeps": 1,                                                                                                                                           
        "symmetric_GS" : 1,                                                                                                                                       
        "selector": "SIZE_2",                                                                                                                                     
        "coarsest_sweeps": 10,                                                                                                                                    
        "max_iters": 4,                                                                                                                                           
        "postsweeps": 1,                                                                                                                                          
        "scope": "main",                                                                                                                                          
        "max_levels": 1000,                                                                                                                                       
        "matrix_coloring_scheme" : "MIN_MAX",                                                                                                                     
        "tolerance": 0.0,                                                                                                                                         
        "norm": "L2",                                                                                                                                             
        "cycle": "V"                                                                                                                                              
    }                                                                                                                                                             
}

This should converge the example ~160 iterations.

Thank you!!! it converges. I have a question, I want to apply into ex2p using AMGX. What do I need to fix in the files?

I try accessing the website of AMGX and see the folder (I think it has the file preconditioner: https://github.com/NVIDIA/AMGX/tree/main/core/configs). But when I copy some of them to the example 2, it does not work. May you tell me how to create file preconditioner like "precon.json"? Thank you very much!

Hi @nghiakvnvsd,

For the parallel case we have some options, we can run with what I call MPI teams where teams of MPI ranks consolidate their data into a GPU or where we run with 1 MPI rank per GPU. To use AMGX as a preconditioner as in the serial example please use the following branch https://github.com/mfem/mfem/pull/1904 (I noticed we were missing an option to configure AMGX as preconditioner when reading external json files) and see the code listing below.

By default the code will try to use map 1 MPI rank to a GPU: mpirun -n 4 ./ex2p --amgx-file precon.json To enable teams just add the following flag: mpirun -n 4 ./ex2p --amgx-file precon.json --amgx-mpi-teams When using the teams option users will also have to specify how many GPUs are available per node.

//                       MFEM Example 2 - Parallel Version
//
// Compile with: make ex2p
//
// Sample runs:  mpirun -np 4 ex2p -m ../data/beam-tri.mesh
//               mpirun -np 4 ex2p -m ../data/beam-quad.mesh
//               mpirun -np 4 ex2p -m ../data/beam-tet.mesh
//               mpirun -np 4 ex2p -m ../data/beam-hex.mesh
//               mpirun -np 4 ex2p -m ../data/beam-wedge.mesh
//               mpirun -np 4 ex2p -m ../data/beam-tri.mesh -o 2 -sys
//               mpirun -np 4 ex2p -m ../data/beam-quad.mesh -o 3 -elast
//               mpirun -np 4 ex2p -m ../data/beam-quad.mesh -o 3 -sc
//               mpirun -np 4 ex2p -m ../data/beam-quad-nurbs.mesh
//               mpirun -np 4 ex2p -m ../data/beam-hex-nurbs.mesh
//
// Description:  This example code solves a simple linear elasticity problem
//               describing a multi-material cantilever beam.
//
//               Specifically, we approximate the weak form of -div(sigma(u))=0
//               where sigma(u)=lambda*div(u)*I+mu*(grad*u+u*grad) is the stress
//               tensor corresponding to displacement field u, and lambda and mu
//               are the material Lame constants. The boundary conditions are
//               u=0 on the fixed part of the boundary with attribute 1, and
//               sigma(u).n=f on the remainder with f being a constant pull down
//               vector on boundary elements with attribute 2, and zero
//               otherwise. The geometry of the domain is assumed to be as
//               follows:
//
//                                 +----------+----------+
//                    boundary --->| material | material |<--- boundary
//                    attribute 1  |    1     |    2     |     attribute 2
//                    (fixed)      +----------+----------+     (pull down)
//
//               The example demonstrates the use of high-order and NURBS vector
//               finite element spaces with the linear elasticity bilinear form,
//               meshes with curved elements, and the definition of piece-wise
//               constant and vector coefficient objects. Static condensation is
//               also illustrated.
//
//               We recommend viewing Example 1 before viewing this example.

#include "mfem.hpp"
#include <fstream>
#include <iostream>

using namespace std;
using namespace mfem;

int main(int argc, char *argv[])
{
   // 1. Initialize MPI.
   int num_procs, myid;
   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);

   // 2. Parse command-line options.
   const char *mesh_file = "../../data/beam-tri.mesh";
   int order = 1;
   bool static_cond = false;
   bool visualization = 1;
   bool amg_elast = 0;
   bool reorder_space = false;
   bool amgx_lib = true;
   bool amgx_mpi_teams = false;
   const char* amgx_json_file = ""; // JSON file for AmgX
   int ndevices = 1;

   OptionsParser args(argc, argv);
   args.AddOption(&mesh_file, "-m", "--mesh",
                  "Mesh file to use.");
   args.AddOption(&order, "-o", "--order",
                  "Finite element order (polynomial degree).");
   args.AddOption(&amg_elast, "-elast", "--amg-for-elasticity", "-sys",
                  "--amg-for-systems",
                  "Use the special AMG elasticity solver (GM/LN approaches), "
                  "or standard AMG for systems (unknown approach).");
   args.AddOption(&static_cond, "-sc", "--static-condensation", "-no-sc",
                  "--no-static-condensation", "Enable static condensation.");
   args.AddOption(&visualization, "-vis", "--visualization", "-no-vis",
                  "--no-visualization",
                  "Enable or disable GLVis visualization.");
   args.AddOption(&reorder_space, "-nodes", "--by-nodes", "-vdim", "--by-vdim",
                  "Use byNODES ordering of vector space instead of byVDIM");

   args.AddOption(&amgx_lib, "-amgx", "--amgx-lib", "-no-amgx",
                  "--no-amgx-lib", "Use AmgX in example.");
   args.AddOption(&amgx_json_file, "--amgx-file", "--amgx-file",
                  "AMGX solver config file (overrides --amgx-solver, --amgx-verbose)");
   args.AddOption(&amgx_mpi_teams, "--amgx-mpi-teams", "--amgx-mpi-teams",
                  "--amgx-mpi-gpu-exclusive", "--amgx-mpi-gpu-exclusive",
                  "Create MPI teams when using AmgX to load balance between ranks and GPUs.");

   args.Parse();
   if (!args.Good())
   {
      if (myid == 0)
      {
         args.PrintUsage(cout);
      }
      MPI_Finalize();
      return 1;
   }
   if (myid == 0)
   {
      args.PrintOptions(cout);
   }

   // 3. Read the (serial) mesh from the given mesh file on all processors.  We
   //    can handle triangular, quadrilateral, tetrahedral, hexahedral, surface
   //    and volume meshes with the same code.
   Mesh *mesh = new Mesh(mesh_file, 1, 1);
   int dim = mesh->Dimension();

   if (mesh->attributes.Max() < 2 || mesh->bdr_attributes.Max() < 2)
   {
      if (myid == 0)
         cerr << "\nInput mesh should have at least two materials and "
              << "two boundary attributes! (See schematic in ex2.cpp)\n"
              << endl;
      MPI_Finalize();
      return 3;
   }

   // 4. Select the order of the finite element discretization space. For NURBS
   //    meshes, we increase the order by degree elevation.
   if (mesh->NURBSext)
   {
      mesh->DegreeElevate(order, order);
   }

   // 5. Refine the serial mesh on all processors to increase the resolution. In
   //    this example we do 'ref_levels' of uniform refinement. We choose
   //    'ref_levels' to be the largest number that gives a final mesh with no
   //    more than 1,000 elements.
   {
      int ref_levels =
         (int)floor(log(1000./mesh->GetNE())/log(2.)/dim);
      for (int l = 0; l < ref_levels; l++)
      {
         mesh->UniformRefinement();
      }
   }

   // 6. Define a parallel mesh by a partitioning of the serial mesh. Refine
   //    this mesh further in parallel to increase the resolution. Once the
   //    parallel mesh is defined, the serial mesh can be deleted.
   ParMesh *pmesh = new ParMesh(MPI_COMM_WORLD, *mesh);
   delete mesh;
   {
      int par_ref_levels = 1;
      for (int l = 0; l < par_ref_levels; l++)
      {
         pmesh->UniformRefinement();
      }
   }

   // 7. Define a parallel finite element space on the parallel mesh. Here we
   //    use vector finite elements, i.e. dim copies of a scalar finite element
   //    space. We use the ordering by vector dimension (the last argument of
   //    the FiniteElementSpace constructor) which is expected in the systems
   //    version of BoomerAMG preconditioner. For NURBS meshes, we use the
   //    (degree elevated) NURBS space associated with the mesh nodes.
   FiniteElementCollection *fec;
   ParFiniteElementSpace *fespace;
   const bool use_nodal_fespace = pmesh->NURBSext && !amg_elast;
   if (use_nodal_fespace)
   {
      fec = NULL;
      fespace = (ParFiniteElementSpace *)pmesh->GetNodes()->FESpace();
   }
   else
   {
      fec = new H1_FECollection(order, dim);
      if (reorder_space)
      {
         fespace = new ParFiniteElementSpace(pmesh, fec, dim, Ordering::byNODES);
      }
      else
      {
         fespace = new ParFiniteElementSpace(pmesh, fec, dim, Ordering::byVDIM);
      }
   }
   HYPRE_Int size = fespace->GlobalTrueVSize();
   if (myid == 0)
   {
      cout << "Number of finite element unknowns: " << size << endl
           << "Assembling: " << flush;
   }

   // 8. Determine the list of true (i.e. parallel conforming) essential
   //    boundary dofs. In this example, the boundary conditions are defined by
   //    marking only boundary attribute 1 from the mesh as essential and
   //    converting it to a list of true dofs.
   Array<int> ess_tdof_list, ess_bdr(pmesh->bdr_attributes.Max());
   ess_bdr = 0;
   ess_bdr[0] = 1;
   fespace->GetEssentialTrueDofs(ess_bdr, ess_tdof_list);

   // 9. Set up the parallel linear form b(.) which corresponds to the
   //    right-hand side of the FEM linear system. In this case, b_i equals the
   //    boundary integral of f*phi_i where f represents a "pull down" force on
   //    the Neumann part of the boundary and phi_i are the basis functions in
   //    the finite element fespace. The force is defined by the object f, which
   //    is a vector of Coefficient objects. The fact that f is non-zero on
   //    boundary attribute 2 is indicated by the use of piece-wise constants
   //    coefficient for its last component.
   VectorArrayCoefficient f(dim);
   for (int i = 0; i < dim-1; i++)
   {
      f.Set(i, new ConstantCoefficient(0.0));
   }
   {
      Vector pull_force(pmesh->bdr_attributes.Max());
      pull_force = 0.0;
      pull_force(1) = -1.0e-2;
      f.Set(dim-1, new PWConstCoefficient(pull_force));
   }

   ParLinearForm *b = new ParLinearForm(fespace);
   b->AddBoundaryIntegrator(new VectorBoundaryLFIntegrator(f));
   if (myid == 0)
   {
      cout << "r.h.s. ... " << flush;
   }
   b->Assemble();

   // 10. Define the solution vector x as a parallel finite element grid
   //     function corresponding to fespace. Initialize x with initial guess of
   //     zero, which satisfies the boundary conditions.
   ParGridFunction x(fespace);
   x = 0.0;

   // 11. Set up the parallel bilinear form a(.,.) on the finite element space
   //     corresponding to the linear elasticity integrator with piece-wise
   //     constants coefficient lambda and mu.
   Vector lambda(pmesh->attributes.Max());
   lambda = 1.0;
   lambda(0) = lambda(1)*50;
   PWConstCoefficient lambda_func(lambda);
   Vector mu(pmesh->attributes.Max());
   mu = 1.0;
   mu(0) = mu(1)*50;
   PWConstCoefficient mu_func(mu);

   ParBilinearForm *a = new ParBilinearForm(fespace);
   a->AddDomainIntegrator(new ElasticityIntegrator(lambda_func, mu_func));

   // 12. Assemble the parallel bilinear form and the corresponding linear
   //     system, applying any necessary transformations such as: parallel
   //     assembly, eliminating boundary conditions, applying conforming
   //     constraints for non-conforming AMR, static condensation, etc.
   if (myid == 0) { cout << "matrix ... " << flush; }
   if (static_cond) { a->EnableStaticCondensation(); }
   a->Assemble();

   HypreParMatrix A;
   Vector B, X;
   a->FormLinearSystem(ess_tdof_list, x, *b, A, X, B);
   if (myid == 0)
   {
      cout << "done." << endl;
      cout << "Size of linear system: " << A.GetGlobalNumRows() << endl;
   }

   // 13. Define and apply a parallel PCG solver for A X = B with the BoomerAMG
   //     preconditioner from hypre.
   if (amgx_lib && strcmp(amgx_json_file,"") != 0)
   {
      AmgXSolver amgx;
      amgx.ReadParameters(amgx_json_file, AmgXSolver::EXTERNAL);

      if (amgx_mpi_teams)
      {
         // Forms MPI teams to load balance between MPI ranks and GPUs
         amgx.InitMPITeams(MPI_COMM_WORLD, ndevices);
      }
      else
      {
         // Assumes each MPI rank is paired with a GPU
         amgx.InitExclusiveGPU(MPI_COMM_WORLD);
      }

      amgx.SetOperator(A);
      amgx.ConfigureAs(AmgXSolver::PRECONDITIONER);

      CGSolver cg(MPI_COMM_WORLD);
      cg.SetRelTol(1e-12);
      cg.SetMaxIter(2000);
      cg.SetPrintLevel(1);
      cg.SetPreconditioner(amgx);
      cg.SetOperator(A);
      cg.Mult(B, X);

      // Release MPI communicators and resources created by AmgX
      amgx.Finalize();
    }else {
     HypreBoomerAMG *amg = new HypreBoomerAMG(A);
     if (amg_elast && !a->StaticCondensationIsEnabled())
       {
         amg->SetElasticityOptions(fespace);
       }
     else
       {
         amg->SetSystemsOptions(dim, reorder_space);
       }
     HyprePCG *pcg = new HyprePCG(A);
     pcg->SetTol(1e-8);
     pcg->SetMaxIter(500);
     pcg->SetPrintLevel(2);
     pcg->SetPreconditioner(*amg);
     pcg->Mult(B, X);

     delete pcg;
     delete amg;
   }

   // 14. Recover the parallel grid function corresponding to X. This is the
   //     local finite element solution on each processor.
   a->RecoverFEMSolution(X, *b, x);

   // 15. For non-NURBS meshes, make the mesh curved based on the finite element
   //     space. This means that we define the mesh elements through a fespace
   //     based transformation of the reference element.  This allows us to save
   //     the displaced mesh as a curved mesh when using high-order finite
   //     element displacement field. We assume that the initial mesh (read from
   //     the file) is not higher order curved mesh compared to the chosen FE
   //     space.
   if (!use_nodal_fespace)
   {
      pmesh->SetNodalFESpace(fespace);
   }

   // 16. Save in parallel the displaced mesh and the inverted solution (which
   //     gives the backward displacements to the original grid). This output
   //     can be viewed later using GLVis: "glvis -np <np> -m mesh -g sol".
   {
      GridFunction *nodes = pmesh->GetNodes();
      *nodes += x;
      x *= -1;

      ostringstream mesh_name, sol_name;
      mesh_name << "mesh." << setfill('0') << setw(6) << myid;
      sol_name << "sol." << setfill('0') << setw(6) << myid;

      ofstream mesh_ofs(mesh_name.str().c_str());
      mesh_ofs.precision(8);
      pmesh->Print(mesh_ofs);

      ofstream sol_ofs(sol_name.str().c_str());
      sol_ofs.precision(8);
      x.Save(sol_ofs);
   }

   // 17. Send the above data by socket to a GLVis server.  Use the "n" and "b"
   //     keys in GLVis to visualize the displacements.
   if (visualization)
   {
      char vishost[] = "localhost";
      int  visport   = 19916;
      socketstream sol_sock(vishost, visport);
      sol_sock << "parallel " << num_procs << " " << myid << "\n";
      sol_sock.precision(8);
      sol_sock << "solution\n" << *pmesh << x << flush;
   }

   // 18. Free the used memory.
   delete a;
   delete b;
   if (fec)
   {
      delete fespace;
      delete fec;
   }
   delete pmesh;

   MPI_Finalize();

   return 0;
}

@nghiakvnvsd regarding your second question, the recipe to create a preconditioner using AMGX is to choose a smoother and apply a few iterations. When you say it didn't work do you mean it didn't converge?

Here is another example you can try:

{                                                                                                                                                                                      
    "config_version": 2,                                                                                                                                                               
    "solver": {                                                                                                                                                                        
        "algorithm": "AGGREGATION",                                                                                                                                                    
        "solver": "AMG",                                                                                                                                                               
        "smoother": "JACOBI_L1",                                                                                                                                                       
        "presweeps": 1,                                                                                                                                                                
        "selector": "SIZE_2",                                                                                                                                                          
        "coarsest_sweeps": 2,                                                                                                                                                          
        "max_iters": 1,                                                                                                                                                                
        "scope": "main",                                                                                                                                                               
        "max_levels": 1000,                                                                                                                                                            
        "postsweeps": 1,                                                                                                                                                               
        "tolerance": 0.0,                                                                                                                                                              
        "norm": "L1",                                                                                                                                                                  
        "cycle": "V"                                                                                                                                                                   
    }                                                                                                                                                                                  
}

Another option could be to use: MULTICOLOR ILU, or POLYNOMIAL smoothing

@artv3 thank you for your code of 2p and your advice. This is useful for me!!! Last question, I would like to ask you. I used to read the paper comparing the conditioning Hypre and AMGX. When I run the example 2p, the BoomerAMG preconditioner is very good for the problem. May you recommend me some preconditioners of AMGX good or better than BoomerAMG? It is very good if you have the file json. Sorry to bother you. I really appreciate your help. Thank you very much!

@artv3 thank you for your code of 2p and your advice. This is useful for me!!! Last question, I would like to ask you. I used to read the paper comparing the conditioning Hypre and AMGX. When I run the example 2p, the BoomerAMG preconditioner is very good for the problem. May you recommend me some preconditioners of AMGX good or better than BoomerAMG? It is very good if you have the file json. Sorry to bother you. I really appreciate your help. Thank you very much!

Great glad I can help. The integration of AmgX into MFEM is still fairly new and I'm not sure if we have tried it for elasticity problems yet (aside from here). I'm not 100% sure on what boomer is doing and if it can be done with AmgX yet, but as we learn more we can definitely add an example 2p with AmgX. Still early days.

Hi @nghiakvnvsd,

For the parallel case we have some options, we can run with what I call MPI teams where teams of MPI ranks consolidate their data into a GPU or where we run with 1 MPI rank per GPU. To use AMGX as a preconditioner as in the serial example please use the following branch #1904 (I noticed we were missing an option to configure AMGX as preconditioner when reading external json files) and see the code listing below.

//                       MFEM Example 2 - Parallel Version
//
// Compile with: make ex2p
//
// Sample runs:  mpirun -np 4 ex2p -m ../data/beam-tri.mesh
//               mpirun -np 4 ex2p -m ../data/beam-quad.mesh
//               mpirun -np 4 ex2p -m ../data/beam-tet.mesh
//               mpirun -np 4 ex2p -m ../data/beam-hex.mesh
//               mpirun -np 4 ex2p -m ../data/beam-wedge.mesh
//               mpirun -np 4 ex2p -m ../data/beam-tri.mesh -o 2 -sys
//               mpirun -np 4 ex2p -m ../data/beam-quad.mesh -o 3 -elast
//               mpirun -np 4 ex2p -m ../data/beam-quad.mesh -o 3 -sc
//               mpirun -np 4 ex2p -m ../data/beam-quad-nurbs.mesh
//               mpirun -np 4 ex2p -m ../data/beam-hex-nurbs.mesh
//
// Description:  This example code solves a simple linear elasticity problem
//               describing a multi-material cantilever beam.
//
//               Specifically, we approximate the weak form of -div(sigma(u))=0
//               where sigma(u)=lambda*div(u)*I+mu*(grad*u+u*grad) is the stress
//               tensor corresponding to displacement field u, and lambda and mu
//               are the material Lame constants. The boundary conditions are
//               u=0 on the fixed part of the boundary with attribute 1, and
//               sigma(u).n=f on the remainder with f being a constant pull down
//               vector on boundary elements with attribute 2, and zero
//               otherwise. The geometry of the domain is assumed to be as
//               follows:
//
//                                 +----------+----------+
//                    boundary --->| material | material |<--- boundary
//                    attribute 1  |    1     |    2     |     attribute 2
//                    (fixed)      +----------+----------+     (pull down)
//
//               The example demonstrates the use of high-order and NURBS vector
//               finite element spaces with the linear elasticity bilinear form,
//               meshes with curved elements, and the definition of piece-wise
//               constant and vector coefficient objects. Static condensation is
//               also illustrated.
//
//               We recommend viewing Example 1 before viewing this example.

#include "mfem.hpp"
#include <fstream>
#include <iostream>

using namespace std;
using namespace mfem;

int main(int argc, char *argv[])
{
   // 1. Initialize MPI.
   int num_procs, myid;
   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);

   // 2. Parse command-line options.
   const char *mesh_file = "../../data/beam-tri.mesh";
   int order = 1;
   bool static_cond = false;
   bool visualization = 1;
   bool amg_elast = 0;
   bool reorder_space = false;
   bool amgx_lib = true;
   bool amgx_mpi_teams = false;
   const char* amgx_json_file = ""; // JSON file for AmgX
   int ndevices = 1;

   OptionsParser args(argc, argv);
   args.AddOption(&mesh_file, "-m", "--mesh",
                  "Mesh file to use.");
   args.AddOption(&order, "-o", "--order",
                  "Finite element order (polynomial degree).");
   args.AddOption(&amg_elast, "-elast", "--amg-for-elasticity", "-sys",
                  "--amg-for-systems",
                  "Use the special AMG elasticity solver (GM/LN approaches), "
                  "or standard AMG for systems (unknown approach).");
   args.AddOption(&static_cond, "-sc", "--static-condensation", "-no-sc",
                  "--no-static-condensation", "Enable static condensation.");
   args.AddOption(&visualization, "-vis", "--visualization", "-no-vis",
                  "--no-visualization",
                  "Enable or disable GLVis visualization.");
   args.AddOption(&reorder_space, "-nodes", "--by-nodes", "-vdim", "--by-vdim",
                  "Use byNODES ordering of vector space instead of byVDIM");

   args.AddOption(&amgx_lib, "-amgx", "--amgx-lib", "-no-amgx",
                  "--no-amgx-lib", "Use AmgX in example.");
   args.AddOption(&amgx_json_file, "--amgx-file", "--amgx-file",
                  "AMGX solver config file (overrides --amgx-solver, --amgx-verbose)");
   args.AddOption(&amgx_mpi_teams, "--amgx-mpi-teams", "--amgx-mpi-teams",
                  "--amgx-mpi-gpu-exclusive", "--amgx-mpi-gpu-exclusive",
                  "Create MPI teams when using AmgX to load balance between ranks and GPUs.");

   args.Parse();
   if (!args.Good())
   {
      if (myid == 0)
      {
         args.PrintUsage(cout);
      }
      MPI_Finalize();
      return 1;
   }
   if (myid == 0)
   {
      args.PrintOptions(cout);
   }

   // 3. Read the (serial) mesh from the given mesh file on all processors.  We
   //    can handle triangular, quadrilateral, tetrahedral, hexahedral, surface
   //    and volume meshes with the same code.
   Mesh *mesh = new Mesh(mesh_file, 1, 1);
   int dim = mesh->Dimension();

   if (mesh->attributes.Max() < 2 || mesh->bdr_attributes.Max() < 2)
   {
      if (myid == 0)
         cerr << "\nInput mesh should have at least two materials and "
              << "two boundary attributes! (See schematic in ex2.cpp)\n"
              << endl;
      MPI_Finalize();
      return 3;
   }

   // 4. Select the order of the finite element discretization space. For NURBS
   //    meshes, we increase the order by degree elevation.
   if (mesh->NURBSext)
   {
      mesh->DegreeElevate(order, order);
   }

   // 5. Refine the serial mesh on all processors to increase the resolution. In
   //    this example we do 'ref_levels' of uniform refinement. We choose
   //    'ref_levels' to be the largest number that gives a final mesh with no
   //    more than 1,000 elements.
   {
      int ref_levels =
         (int)floor(log(1000./mesh->GetNE())/log(2.)/dim);
      for (int l = 0; l < ref_levels; l++)
      {
         mesh->UniformRefinement();
      }
   }

   // 6. Define a parallel mesh by a partitioning of the serial mesh. Refine
   //    this mesh further in parallel to increase the resolution. Once the
   //    parallel mesh is defined, the serial mesh can be deleted.
   ParMesh *pmesh = new ParMesh(MPI_COMM_WORLD, *mesh);
   delete mesh;
   {
      int par_ref_levels = 1;
      for (int l = 0; l < par_ref_levels; l++)
      {
         pmesh->UniformRefinement();
      }
   }

   // 7. Define a parallel finite element space on the parallel mesh. Here we
   //    use vector finite elements, i.e. dim copies of a scalar finite element
   //    space. We use the ordering by vector dimension (the last argument of
   //    the FiniteElementSpace constructor) which is expected in the systems
   //    version of BoomerAMG preconditioner. For NURBS meshes, we use the
   //    (degree elevated) NURBS space associated with the mesh nodes.
   FiniteElementCollection *fec;
   ParFiniteElementSpace *fespace;
   const bool use_nodal_fespace = pmesh->NURBSext && !amg_elast;
   if (use_nodal_fespace)
   {
      fec = NULL;
      fespace = (ParFiniteElementSpace *)pmesh->GetNodes()->FESpace();
   }
   else
   {
      fec = new H1_FECollection(order, dim);
      if (reorder_space)
      {
         fespace = new ParFiniteElementSpace(pmesh, fec, dim, Ordering::byNODES);
      }
      else
      {
         fespace = new ParFiniteElementSpace(pmesh, fec, dim, Ordering::byVDIM);
      }
   }
   HYPRE_Int size = fespace->GlobalTrueVSize();
   if (myid == 0)
   {
      cout << "Number of finite element unknowns: " << size << endl
           << "Assembling: " << flush;
   }

   // 8. Determine the list of true (i.e. parallel conforming) essential
   //    boundary dofs. In this example, the boundary conditions are defined by
   //    marking only boundary attribute 1 from the mesh as essential and
   //    converting it to a list of true dofs.
   Array<int> ess_tdof_list, ess_bdr(pmesh->bdr_attributes.Max());
   ess_bdr = 0;
   ess_bdr[0] = 1;
   fespace->GetEssentialTrueDofs(ess_bdr, ess_tdof_list);

   // 9. Set up the parallel linear form b(.) which corresponds to the
   //    right-hand side of the FEM linear system. In this case, b_i equals the
   //    boundary integral of f*phi_i where f represents a "pull down" force on
   //    the Neumann part of the boundary and phi_i are the basis functions in
   //    the finite element fespace. The force is defined by the object f, which
   //    is a vector of Coefficient objects. The fact that f is non-zero on
   //    boundary attribute 2 is indicated by the use of piece-wise constants
   //    coefficient for its last component.
   VectorArrayCoefficient f(dim);
   for (int i = 0; i < dim-1; i++)
   {
      f.Set(i, new ConstantCoefficient(0.0));
   }
   {
      Vector pull_force(pmesh->bdr_attributes.Max());
      pull_force = 0.0;
      pull_force(1) = -1.0e-2;
      f.Set(dim-1, new PWConstCoefficient(pull_force));
   }

   ParLinearForm *b = new ParLinearForm(fespace);
   b->AddBoundaryIntegrator(new VectorBoundaryLFIntegrator(f));
   if (myid == 0)
   {
      cout << "r.h.s. ... " << flush;
   }
   b->Assemble();

   // 10. Define the solution vector x as a parallel finite element grid
   //     function corresponding to fespace. Initialize x with initial guess of
   //     zero, which satisfies the boundary conditions.
   ParGridFunction x(fespace);
   x = 0.0;

   // 11. Set up the parallel bilinear form a(.,.) on the finite element space
   //     corresponding to the linear elasticity integrator with piece-wise
   //     constants coefficient lambda and mu.
   Vector lambda(pmesh->attributes.Max());
   lambda = 1.0;
   lambda(0) = lambda(1)*50;
   PWConstCoefficient lambda_func(lambda);
   Vector mu(pmesh->attributes.Max());
   mu = 1.0;
   mu(0) = mu(1)*50;
   PWConstCoefficient mu_func(mu);

   ParBilinearForm *a = new ParBilinearForm(fespace);
   a->AddDomainIntegrator(new ElasticityIntegrator(lambda_func, mu_func));

   // 12. Assemble the parallel bilinear form and the corresponding linear
   //     system, applying any necessary transformations such as: parallel
   //     assembly, eliminating boundary conditions, applying conforming
   //     constraints for non-conforming AMR, static condensation, etc.
   if (myid == 0) { cout << "matrix ... " << flush; }
   if (static_cond) { a->EnableStaticCondensation(); }
   a->Assemble();

   HypreParMatrix A;
   Vector B, X;
   a->FormLinearSystem(ess_tdof_list, x, *b, A, X, B);
   if (myid == 0)
   {
      cout << "done." << endl;
      cout << "Size of linear system: " << A.GetGlobalNumRows() << endl;
   }

   // 13. Define and apply a parallel PCG solver for A X = B with the BoomerAMG
   //     preconditioner from hypre.
   if (amgx_lib && strcmp(amgx_json_file,"") != 0)
   {
      AmgXSolver amgx;
      amgx.ReadParameters(amgx_json_file, AmgXSolver::EXTERNAL);

      if (amgx_mpi_teams)
      {
         // Forms MPI teams to load balance between MPI ranks and GPUs
         amgx.InitMPITeams(MPI_COMM_WORLD, ndevices);
      }
      else
      {
         // Assumes each MPI rank is paired with a GPU
         amgx.InitExclusiveGPU(MPI_COMM_WORLD);
      }

      amgx.SetOperator(A);
      amgx.ConfigureAs(AmgXSolver::PRECONDITIONER);

      CGSolver cg(MPI_COMM_WORLD);
      cg.SetRelTol(1e-12);
      cg.SetMaxIter(2000);
      cg.SetPrintLevel(1);
      cg.SetPreconditioner(amgx);
      cg.SetOperator(A);
      cg.Mult(B, X);

      // Release MPI communicators and resources created by AmgX
      amgx.Finalize();
    }else {
     HypreBoomerAMG *amg = new HypreBoomerAMG(A);
     if (amg_elast && !a->StaticCondensationIsEnabled())
       {
         amg->SetElasticityOptions(fespace);
       }
     else
       {
         amg->SetSystemsOptions(dim, reorder_space);
       }
     HyprePCG *pcg = new HyprePCG(A);
     pcg->SetTol(1e-8);
     pcg->SetMaxIter(500);
     pcg->SetPrintLevel(2);
     pcg->SetPreconditioner(*amg);
     pcg->Mult(B, X);

     delete pcg;
     delete amg;
   }

   // 14. Recover the parallel grid function corresponding to X. This is the
   //     local finite element solution on each processor.
   a->RecoverFEMSolution(X, *b, x);

   // 15. For non-NURBS meshes, make the mesh curved based on the finite element
   //     space. This means that we define the mesh elements through a fespace
   //     based transformation of the reference element.  This allows us to save
   //     the displaced mesh as a curved mesh when using high-order finite
   //     element displacement field. We assume that the initial mesh (read from
   //     the file) is not higher order curved mesh compared to the chosen FE
   //     space.
   if (!use_nodal_fespace)
   {
      pmesh->SetNodalFESpace(fespace);
   }

   // 16. Save in parallel the displaced mesh and the inverted solution (which
   //     gives the backward displacements to the original grid). This output
   //     can be viewed later using GLVis: "glvis -np <np> -m mesh -g sol".
   {
      GridFunction *nodes = pmesh->GetNodes();
      *nodes += x;
      x *= -1;

      ostringstream mesh_name, sol_name;
      mesh_name << "mesh." << setfill('0') << setw(6) << myid;
      sol_name << "sol." << setfill('0') << setw(6) << myid;

      ofstream mesh_ofs(mesh_name.str().c_str());
      mesh_ofs.precision(8);
      pmesh->Print(mesh_ofs);

      ofstream sol_ofs(sol_name.str().c_str());
      sol_ofs.precision(8);
      x.Save(sol_ofs);
   }

   // 17. Send the above data by socket to a GLVis server.  Use the "n" and "b"
   //     keys in GLVis to visualize the displacements.
   if (visualization)
   {
      char vishost[] = "localhost";
      int  visport   = 19916;
      socketstream sol_sock(vishost, visport);
      sol_sock << "parallel " << num_procs << " " << myid << "\n";
      sol_sock.precision(8);
      sol_sock << "solution\n" << *pmesh << x << flush;
   }

   // 18. Free the used memory.
   delete a;
   delete b;
   if (fec)
   {
      delete fespace;
      delete fec;
   }
   delete pmesh;

   MPI_Finalize();

   return 0;
}

I have just tried running the code, I run: mpirun -n 4 ./ex2p --amgx-file precon.json --amgx-mpi-teams -o 3. Something happens and I do not know what it is. I set the "cg.SetRelTol(1e-12)" but the result:

I believe the inner product being reported is the square of the tolerance. I recommend using the JACOBI_L1 smoother and increase the number of iterations in the json file to speed up convergence.

I believe the inner product being reported is the square of the tolerance. I recommend using the JACOBI_L1 smoother and increase the number of iterations in the json file to speed up convergence.

It seems convergence slower than No MPI. I do not know why :-( I am trying to find a multigrid solution for elasticity equation on multiple GPUs. Maybe, I will be looking forward to new advances in the future. Thank you very much for help!

Depending on how hard the elasticity problem is, AMG may or may not work well for it (regardless of the GPU aspect). For example nearly-incompressible elasticity is notoriously hard got multigrid.

If the Poisson ratio is well-behaved, say less than 0.4 and the deformation is relatively simple, than both classical AMG (from BoomerAMG or AmgX) and smoothed aggregation (from AmgX) should work decently.

One thing to check is the ordering of the degrees of freedom. Different solvers may expect different ordering (byVDIM or byNODES) -- see the logic in Example 2p.

@artv3 I recently tried using several tetrahedral mfem meshes without partial assembly ( see ballmesh) with the examples/amgx/ex1p.cpp example and if you select order (-o 2), the preconditioner seems to fail. First order tets seem fine. Hypre also works for all the combinations tested. Do you know if I need a special configuration for the second-order case? Thanks.

@jonwong12 thanks for reaching out. Was it the default settings that you tried? The integration is fairly new and we are still assessing settings/performance with AmgX. One thing to try is a different smoother in AmgX, using branch (https://github.com/mfem/mfem/pull/1904), example 1p can be modified to take in an external JSON configuration:

AmgXSolver amgx;                                                                                                                                               
amgx.ReadParameters(amgx_json_file, AmgXSolver::EXTERNAL);  
amgx.InitExclusiveGPU(MPI_COMM_WORLD);  
amgx.SetOperator(*A.As<HypreParMatrix>());                                                                                                                     
amgx.ConfigureAs(AmgXSolver::PRECONDITIONER);                                                                                                                  

CGSolver cg(MPI_COMM_WORLD);                                                                                                                                   
cg.SetRelTol(1e-12);                                                                                                                                           
cg.SetMaxIter(2000);                                                                                                                                           
cg.SetPrintLevel(1);                                                                                                                                           
cg.SetPreconditioner(amgx);                                                                                                                                    
cg.SetOperator(*A.As<HypreParMatrix>());                                                                                                                       
cg.Mult(B, X);

Using the following JACOBI_L1 smoother I was able to go up to order 4 using your mesh:

{                                                                                                                                                                    
    "config_version": 2,                                                                                                                                             
    "solver": {                                                                                                                                                      
        "solver": "AMG",                                                                                                                                             
        "smoother": "JACOBI_L1",                                                                                                                                     
        "presweeps": 1,                                                                                                                                              
        "interpolator": "D2",                                                                                                                                        
        "max_iters": 2,                                                                                                                                             
        "scope": "main",                                                                                                                                             
        "max_row_sum": 0.9,                                                                                                                                          
        "strength_threshold": 0.25,                                                                                                                                  
        "max_levels": 1000,                                                                                                                                          
        "postsweeps": 1,                                                                                                                                             
        "tolerance": 0.0,                                                                                                                                            
        "norm": "L1",                                                                                                                                                
        "cycle": "V"                                                                                                                                                 
    }                                                                                                                                                                
}

 lrun -n 8 ./ex1p -o 4 --amgx-file JACOBI_L1.json -d cuda 
Options used:
   --mesh ballmesh.mesh
   --order 4
   --no-static-condensation
   --no-partial-assembly
   --amgx-lib
   --amgx-file JACOBI_L1.json
   --amgx-mpi-gpu-exclusive
   --device cuda
   --visualization
   --gpus-per-node-in-teams-mode 1
Device configuration: cuda,cpu
Memory configuration: host-std,cuda
Number of finite element unknowns: 2829313
AMGX version 2.1.0.131-opensource
Built on Oct 27 2020, 15:02:10
Compiled with CUDA Runtime 10.1, using CUDA driver 10.2
Cannot read file as JSON object, trying as AMGX config
Converting config string to current config version
Parsing configuration string: exception_handling=1 ; 
Using Normal MPI (Hostbuffer) communicator...
Using Normal MPI (Hostbuffer) communicator...
   Iteration :   0  (B r, r) = 0.12083
....
   Iteration :  42  (B r, r) = 2.79164e-25
   Iteration :  43  (B r, r) = 8.28688e-26
Average reduction factor = 0.523631

@artv3 Thanks for the quick reply.

Yes. I was using default settings, but I will give your settings a shot.

Edit: Yes. That seems to have done the trick. Thanks.

Thanks for the timely response Arturo!

Hi @artv3 and @tzanio . I totally try to use all smoother to modify the ex2p by AMGX but it does not well than ex2 by AMGX (time computation). Actually, only single GPU by AMGX is enough good for my problem but the assembly time is very large. I would like to ask a question. Can I combine the assembly using multiple CPUs (like ex2p) and the solver AMGX (like ex2)? if yes, How can I? Thank you very much!

mfem / mfem

An error when using AMGX solver. #1900