project-asgard / asgard


Crash at long time with landau 1x3v #647

Closed stefan-schnake closed 2 months ago

stefan-schnake commented 10 months ago

Describe the bug

Landau 1x3v seems to be crashing. The debug output is:

asgard: /home/7r9/asgard/src/asgard_vector.hpp:1021: asgard::fk::vector<P, asgard::mem_type::owner, resrc>& asgard::fk::vector<P, mem, resrc>::resize(int) [with asgard::mem_type m_ = asgard::mem_type::owner; <template-parameter-2-2> = void; P = double; asgard::mem_type mem = asgard::mem_type::owner; asgard::resource resrc = asgard::resource::host]: Assertion `new_size >= 0' failed.

To Reproduce

Steps to reproduce the behavior:

  1. Change nu in https://github.com/project-asgard/asgard/blob/develop/src/pde/pde_collisional_landau_1x3v.hpp#L43 to 1e-2.
  2. Acquire asgard_wavelet_1826.h5 on andremarie (look on Slack for the filepath).
  3. Run OMP_NUM_THREADS=8 ./asgard -p landau_1x3v -d 3 -l "4 5 5 5" -x -t 0.019634954084936 -n 2547 -m 5 --kron-mode sparse --wave_freq 2 --inner_it 20 --tol 1e-14 --adapt --max_adapt_levels "4 5 5 5" --thresh 1e-6 --memory 40000 --restart asgard_wavelet_1826.h5
  4. Crash

System:

mkstoyanov commented 2 months ago

Do you remember which kronmult was used here? The global kronmult that explicitly indexes all entries uses lots and lots of memory. The memory limit is not implemented (I don't know if I even can implement it, since I'm tied to the cuSPARSE API).

Indexing always catches me off guard: the code is faster with it until we run out of RAM. The local kronmult or the more recent (CPU-only) block-global kronmult should fix this. If so, we can close this issue, as it will be addressed in the block-global kronmult for the GPU.
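The thread does not establish the root cause, but the assertion shows that `resize(int)` received a negative size. One possible path, stated here purely as an assumption, is an entry count from explicit indexing that no longer fits in `int`. The sketch below uses a hypothetical helper, `checked_int_size`, which is not part of ASGarD; it multiplies two counts in 64-bit arithmetic and refuses to narrow the result when it would exceed the `int` range.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical helper, not part of ASGarD: compute rows * entries_per_row
// and return it as an int only if the product fits in the int range.
// Returns std::nullopt instead of silently producing a wrapped value.
std::optional<int> checked_int_size(std::int64_t rows,
                                    std::int64_t entries_per_row)
{
  // Both counts are expected to be non-negative.
  assert(rows >= 0 && entries_per_row >= 0);

  constexpr std::int64_t int_max = std::numeric_limits<int>::max();

  // Division-based test: avoids forming a product that could overflow.
  if (entries_per_row != 0 && rows > int_max / entries_per_row)
    return std::nullopt;

  // Safe: the product is known to be at most int_max here.
  return static_cast<int>(rows * entries_per_row);
}
```

A caller could check the returned `std::optional` before resizing and report a clear "index too large" error instead of tripping the assertion. Whether this applies to the crash reported here is not confirmed by the discussion.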

stefan-schnake commented 2 months ago

I don't recall, but I'm fine with closing.