idefix-code / idefix

A fast finite volume code designed to run on many architectures, such as GPU, CPU and manycores, using Kokkos.
https://idefix.readthedocs.io/

BUG: Illegal memory access error when using coarsening in X2 direction on GPU #226

Closed Anto6453 closed 7 months ago

Anto6453 commented 9 months ago

Describe the issue:

When using coarsening in the X2 direction (corresponding to $\theta$ in spherical coordinates), I get an illegal memory access error when running Idefix on GPU (seen on A100 and RTX 2080 SUPER). The same configuration works on CPU.

In my case, I use a 2D axisymmetric spherical grid, and I split my radial grid into a uniform grid in the inner domain and a logarithmic grid in the outer domain. To prevent the cells of the uniform $r$ grid from becoming too thin in the $\theta$ direction close to the center of the domain, I need to coarsen the grid in the $\theta$ direction.
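As a rough illustration of why the cells get thin (a sketch, not part of the original setup), the cell aspect ratio $\Delta r / (r\,\Delta\theta)$ can be estimated from the grid values printed in the idefix.ini dump of the log: 98 uniform cells between 2e-4 and 1e-2 in X1, and 1024 cells over $[0, \pi/2]$ in X2. The helper name `aspectRatio` and the constants below are made up for this example.

```cpp
#include <cassert>

// Hypothetical helper (not part of Idefix): ratio of the radial cell width dr
// to the angular cell width r*dtheta in spherical coordinates.
double aspectRatio(double r, double dr, double dtheta) {
  assert(r > 0.0 && dtheta > 0.0);
  return dr / (r * dtheta);
}

// Grid values taken from the [Grid] block of the idefix.ini dump in the log.
const double kPi = 3.14159265358979323846;
const double drUniform = (1e-2 - 2e-4) / 98.0;  // uniform inner patch, = 1e-4
const double dTheta = (kPi / 2.0) / 1024.0;     // uniform X2 spacing
```

With these numbers, `aspectRatio(2e-4, drUniform, dTheta)` is roughly 326 at the inner edge and `aspectRatio(1e-2, drUniform, dTheta)` is roughly 6.5 at the outer edge of the uniform patch, which is the imbalance the coarsening is meant to reduce.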

My coarsening function:

void CoarsenFunction(DataBlock &data) {
  IdefixArray2D<int> coarseningLevel = data.coarseningLevel[JDIR];
  IdefixArray1D<real> r = data.x[IDIR];
  real radiusCoars = radiusCoarsGlob;
  int coarsIncrement = coarsIncrementGlob;

  idefix_for("set_coarsening", 0, data.np_tot[KDIR], 0, data.np_tot[IDIR],
         KOKKOS_LAMBDA(int k,int i) {
           if (r(i) <= radiusCoars) {
             coarseningLevel(k,i) = 1 + log2(radiusCoars/r(i)) + coarsIncrement;
           }
           else {coarseningLevel(k,i) = 1;}
         });
  data.coarseningLevel[JDIR] = coarseningLevel;
}

where radiusCoars is the radius that delimits the uniform and the log grids.
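To see which levels this formula produces, here is a minimal host-side sketch of the same expression in plain C++, without Kokkos. The function names `coarseningLevelAt` and `mergedCells` are hypothetical. The statement that level n merges 2^(n-1) fine cells is an assumption about Idefix's convention and should be checked against the Idefix documentation.

```cpp
#include <cassert>
#include <cmath>

// Host-side copy of the level formula used in CoarsenFunction.
// The double-to-int conversion is written out explicitly here; in the kernel
// it happens implicitly on assignment and truncates toward zero.
int coarseningLevelAt(double r, double radiusCoars, int coarsIncrement) {
  assert(r > 0.0 && radiusCoars > 0.0);
  if (r <= radiusCoars) {
    return static_cast<int>(1 + std::log2(radiusCoars / r) + coarsIncrement);
  }
  return 1;  // no coarsening outside the uniform patch
}

// ASSUMPTION (not confirmed in this thread): a cell at level n spans
// 2^(n-1) cells of the underlying fine grid.
int mergedCells(int level) {
  assert(level >= 1);
  return 1 << (level - 1);
}
```

With the values from the input file (radiusCoars = 1e-2, coars_increment = 1), this gives level 7 at the inner edge r = 2e-4, level 2 at r = 1e-2, and level 1 beyond. Under the assumption above, level 7 would merge 64 of the 1024 cells in X2.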

The error seems to occur when the number of points becomes too large. For example:

It also seems to be triggered more easily when I increase the maximum coarsening level (via coarsIncrement in the coarsening function).

Looking at the log in debug mode, the error seems to appear when Idefix tries to coarsen the magnetic field while keeping divB equal to zero (in the second FLUID_CoarsenFlow_BXsn idefix_for loop in the src/fluid/coarsenFlow.hpp file).

Thanks in advance for your help.

Error message:

----> Profiler::Init...
----> ...returned
                                  .:HMMMMHn:.  ..:n..
                                .H*'``     `'%HM'''''!x.
         :x                    x*`           .(MH:    `#h.
        x.`M                   M>        :nMMMMMMMh.     `n.
         *kXk..                XL  nnx:.XMMMMMMMMMMML   .. 4X.
          )MMMMMx              'M   `^?M*MMMMMMMMMMMM:HMMMHHMM.
          MMMMMMMX              ?k    'X ..'*MMMMMMM.#MMMMMMMMMx
         XMMMMMMMX               4:    M:MhHxxHHHx`MMx`MMMMMMMMM>
         XM!`   ?M                `x   4MM'`''``HHhMMX  'MMMMMMMM
         4M      M                 `:   *>     `` .('MX   '*MMMM'
          MX     `X.nnx..                        ..XMx`     'M*X
           ?h.    ''^'*!Hx.     :Mf     xHMh  M**MMM      4L`
            `*Mx           `'*n.x. 4M>   :M` `` 'M    `       %
             '%                ``*MHMX   X>      !
            :!                    `#MM>  X>      `   :x
           :M                        ?M  `X     .  ..'M
           XX                       .!*X  `x   XM( MMx`h
          'M>::                        `M: `+  MMX XMM `:
          'M> M                         'X    'MMX ?MMk.Xx..
          'M> ?L                     ...:!     MMX.H**'MMMM*h
           M>  #L                  :!'`MM.    . X*`.xHMMMMMnMk.
           `!   #h.      :L           XM'*hxHMM*MhHMMMMMMMMMM'#h
           +     XMh:    4!      x   :f   MM'   `*MMMMMMMMMM%  `X
           M     Mf``tHhxHM      M>  4k xxX'      `#MMMMMMMf    `M .>
          :f     M   `MMMMM:     M>   M!MMM:         '*MMf'     'MH*
          !     Xf   'MMMMMX     `X   X>'h.`          :P*Mx.   .d*~..
        :M      X     4MMMMM>     !   X~ `Mh.      .nHL..M#'%nnMhH!'`
       XM      d>     'X`'**h     'h  M   ^'MMHH+*'`  ''''   `'**'
    %nxM>      *x+x.:. XL.. `k     `::X
:nMMHMMM:.  X>  Mn`*MMMMMHM: `:     ?MMn.
    `'**MML M>  'MMhMMMMMMMM  #      `M:^*x
         ^*MMttnnMMMMMMMMMMMH>.        M:.4X
                        `MMMM>X   (   .MMM:MM!   .
                          `'''4x.dX  +^ `''MMMMHM?L..
                                ``'           `'`'`'`

              Idefix version 2.0.04-00e3db93
              Built against Kokkos 30500

Main: Initialization stage.
----> Grid::Grid(Input)...
----> ...returned
----> GridHost::GridHost(Grid)...
----> ...returned
----> GridHost::MakeGrid...
----> ...returned
----> GridHost::SyncToDevice...
----> ...returned
----> DataBlock::DataBlock...
--------> GridHost::GridHost(Grid)...
--------> ...returned
--------> GridHost::SyncFromDevice...
--------> ...returned
--------> DataBlock::ExtractSubdomain...
------------> idefix_for(coordinates)...
------------> ...returned
------------> idefix_for(coordinates)...
------------> ...returned
------------> idefix_for(coordinates)...
------------> ...returned
--------> ...returned
--------> DataBlock::MakeGeometry()...
------------> idefix_for(init_coarsening)...
------------> ...returned
------------> idefix_for(Volumes)...
------------> ...returned
------------> idefix_for(GeometricalCentersX1)...
------------> ...returned
------------> idefix_for(GeometricalCentersX2)...
------------> ...returned
------------> idefix_for(GeometricalCentersX3)...
------------> ...returned
------------> idefix_for(AreaX1)...
------------> ...returned
------------> idefix_for(AreaX2)...
------------> ...returned
------------> idefix_for(AreaX3)...
------------> ...returned
--------> ...returned
--------> Dump::Init...
--------> ...returned
--------> GridHost::GridHost(Grid)...
--------> ...returned
--------> GridHost::SyncFromDevice...
--------> ...returned
--------> Fluid::Fluid...
------------> StateContainer::PushArray...
------------> ...returned
------------> StateContainer::PushArray...
------------> ...returned
------------> idefix_for(ComputePLMweights)...
------------> ...returned
------------> idefix_for(ComputePLMweights)...
------------> ...returned
------------> ConstrainedTransport::Init...
------------> ...returned
------------> Boundary::Boundary...
Phys MHD
------------> ...returned
--------> ...returned
--------> Gravity::Gravity...
------------> SelfGravity::Init...
----------------> Laplacian::Laplacian...
--------------------> Laplacian::InitInternalGrid...
------------------------> GridHost::GridHost(Grid)...
------------------------> ...returned
------------------------> GridHost::SyncFromDevice...
------------------------> ...returned
------------------------> idefix_for(InternalGridCopy)...
------------------------> ...returned
------------------------> idefix_for(InternalGridFill)...
------------------------> ...returned
------------------------> idefix_for(Volumes)...
------------------------> ...returned
------------------------> idefix_for(AreaX1)...
------------------------> ...returned
------------------------> idefix_for(AreaX2)...
------------------------> ...returned
------------------------> idefix_for(AreaX3)...
------------------------> ...returned
--------------------> ...returned
--------------------> Laplacian::InitPreconditioner...
------------------------> idefix_for(ResetPrecond)...
------------------------> ...returned
------------------------> idefix_for(InitPrecond)...
------------------------> ...returned
------------------------> idefix_for(InitPrecond)...
------------------------> ...returned
--------------------> ...returned
--------------------> Laplacian::PreComputeLaplacian...
------------------------> idefix_for(L_Factor)...
------------------------> ...returned
--------------------> ...returned
----------------> ...returned
----------------> idefix_for(InitDensity)...
----------------> ...returned
------------> ...returned
--------> ...returned
----> ...returned
----> TimeIntegrator::TimeIntegrator(Input...)...
--------> StateContainer::AllocateAs...
--------> ...returned
----> ...returned
----> Output::Output...
----> ...returned
Setup:: Coarsening is on.
----> Output::EnrollUserDefVariable...
----> ...returned
Main: initialisation finished.
-----------------------------------------------------------------------------
Input Parameters using input file idefix.ini:
-----------------------------------------------------------------------------
[Boundary]
        X1-beg          userdef
        X1-end          outflow
        X2-beg          axis
        X2-end          reflective
        X3-beg          periodic
        X3-end          periodic
[Gravity]
        gravCst         39.476926408897626
        potential               selfgravity
        skip            1
[Grid]
        ;               X1-grid 2       2e-4    196     u       1e-2    2640    l       5.e3    X2-grid 1       0.0     512       u       1.570796326794896
        X1-grid         2       2e-4    98      u       1e-2      1320        l     5.e3
        X2-grid         1       0.0     1024    u       1.570796326794896
        X3-grid         1       0.0     1       u       6.283185307179586
        coars_increment         1
        coarsening              static  X2
        radiusCoars             1e-2
[Hydro]
        csiso           userdef
        gamma           1.666667
        solver          hll
[Output]
        dmp             0.001
        dmp_dir         ./Try5_resartTry3_newBoundary/
        log             100
        uservar         PhiP    Cs      InvDt   dV      T       AX3
        vtk             0.001
        vtk_dir         ./Try5_resartTry3_newBoundary/
[SelfGravity]
        boundary-X1-beg         origin
        boundary-X1-end         nullpot
        boundary-X2-beg         axis
        boundary-X2-end         nullgrad
        boundary-X3-beg         periodic
        boundary-X3-end         periodic
        maxIter         10000
        skip            1
        solver          PBICGSTAB
        targetError             1e-4
[Setup]
        Mc              1.
        T0              10.
        alpha           0.25
        beta            0.0
        mu              4.
[TimeIntegrator]
        CFL             0.8
        CFL_max_var             1.1
        check_nan               100
        first_dt                1.e-7
        max_runtime             96
        maxdivB         1e-2
        nstages         2
        tstop           24000.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
Input: Compiled with DOUBLE PRECISION arithmetic.
Input: DIMENSIONS=2.
Input: COMPONENTS=3.
Input: Kokkos CUDA target ENABLED.
Grid: full grid size is 
         Direction X1: userdef  0.01....4096....5000    outflow
         Direction X2: axis     0....1024....1.5708     reflective
Grid: static grid coarsening enabled in direction(s) X2 
Hydro: solving MHD equations.
Hydro: Using EXPERIMENTAL vector potential formulation for MHD.
Hydro: Reconstruction: 2nd order (PLM Van Leer)
EquationOfState: isothermal with user-defined cs function.
RiemannSolver: hll (MHD).
ConstrainedTransport: Using UCT_CONTACT averaging scheme.
Axis: Axis regularisation ENABLED.
Axis: Full 2pi regularisation around the axis.
Gravity: ENABLED.
Gravity: G=39.4769.
Gravity: self-gravity ENABLED.
SelfGravity: Using preconditionned BICGSTAB solver.
SelfGravity: using origin boundary with 280 additional radial points.
----> Bicgstab::ShowConfig...
Bicgstab: TargetError: 0.0001
Bicgstab: Maximum iterations: 10000
----> ...returned
TimeIntegrator: using 2nd Order (RK2) integrator.
TimeIntegrator: Using adaptive dt with CFL=0.8 .
TimeIntegrator: will stop after 96 hours.
Main: Creating initial conditions.
----> Setup::Initflow...
--------> DataBlockHost::DataBlockHost(DataBlock)...
--------> ...returned
InnitFlow:: Sarting a new collapse.
--------> DataBlockHost::SyncToDevice()...
--------> ...returned
----> ...returned
----> ConstrainedTransport::ComputeMagFieldfromA...
--------> idefix_for(ComputeMagFieldFromA)...
--------> ...returned
--------> Fluid::CoarsenMagField...
------------> idefix_for(FLUID_CoarsenFlow_BXsn)...
------------> ...returned
------------> idefix_for(FLUID_CoarsenFlow_BXsn)...
------------> ...returned
--------> ...returned
----> ...returned
----> DataBlock::ComputeGridCoarseningLevels...
--------> User-defined Coarsening function...
------------> idefix_for(set_coarsening)...
------------> ...returned
--------> ...returned
--------> DataBlock::CheckCoarseningLevels()...
------------> DataBlockHost::DataBlockHost(DataBlock)...
------------> ...returned
------------> DataBlockHost::SyncFromDevice()...
------------> ...returned
--------> ...returned
----> ...returned
----> Fluid::CoarsenFlow...
--------> idefix_for(FLUID_CoarsenFlow)...
--------> ...returned
----> ...returned
----> Fluid::CoarsenMagField...
--------> idefix_for(FLUID_CoarsenFlow_BXsn)...
--------> ...returned
--------> idefix_for(FLUID_CoarsenFlow_BXsn)...
terminate called after throwing an instance of 'std::runtime_error'
  what():  cudaDeviceSynchronize() error( cudaErrorIllegalAddress): an illegal memory access was encountered /home/aborderi/src/idefix/src/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:150
Traceback functionality not available

Aborted

Runtime information:

Idefix version 2.0.04-00e3db93 Master branch

Kokkos 30500

Seen on GPU A100 and RTX 2080 SUPER

glesur commented 8 months ago

Hi @Anto6453, thanks for the report. In order to reproduce this, could you send me the full setup you are using (i.e. setup.cpp + idefix.ini)? It can be by email if you don't want your setup to appear publicly.

glesur commented 8 months ago

The problem is that gridCoarsening tries to reconstruct the normal component of B from the divergence of the two tangential components of B stored in Vs. When DIMENSIONS<3, one of these components is not defined (because it's not stored on cell faces), resulting in a segfault (as it turns out, also on CPUs). The fix, #230, will be part of the Idefix v2.0.05 release.
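For readers unfamiliar with the reconstruction, the constraint can be sketched as follows for coarsening along X2. The notation is assumed for this illustration and is not taken from the Idefix source or from #230: $A_d$ is the area of a face normal to direction $d$, $B_d$ is the face-centred field component, and half-integer indices label faces of cell $(i,j,k)$. Requiring zero net magnetic flux through the cell gives the normal component on the next X2 face:

```latex
% Discrete div B = 0 for cell (i,j,k), solved for the X2 face at j+1/2.
% A_d: face area normal to direction d; B_d: face-centred field component.
A_{2,j+\frac12}\, B_{2,j+\frac12}
  = A_{2,j-\frac12}\, B_{2,j-\frac12}
  - \left( A_{1,i+\frac12}\, B_{1,i+\frac12} - A_{1,i-\frac12}\, B_{1,i-\frac12} \right)
  - \left( A_{3,k+\frac12}\, B_{3,k+\frac12} - A_{3,k-\frac12}\, B_{3,k-\frac12} \right)
```

With DIMENSIONS=2, $B_3$ is not stored on faces, so the last bracket cannot be read from Vs. In a 2D run nothing varies along X3, so the two X3 face fluxes cancel and that bracket is zero. This is one reading of the comment above, not a description of what #230 actually changes.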

glesur commented 7 months ago

v2.0.05 released with this bug fixed.