project-asgard / asgard


Crash on landau 1x3v #660

Closed stefan-schnake closed 4 months ago

stefan-schnake commented 9 months ago

Describe the bug asgard crashes with std::bad_alloc while running the landau_1x3v problem with adaptivity enabled.

To Reproduce Steps to reproduce the behavior:

  1. Current develop
  2. Build with cmake -DASGARD_RECOMMENDED_DEFAULTS=ON -DASGARD_IO_HIGHFIVE=ON -DASGARD_USE_CUDA=ON
  3. Change $\nu$ in pde_collisional_landau_1x3v.hpp to 1e-2
  4. Run with OMP_NUM_THREADS=8 nohup ./asgard -p landau_1x3v -d 3 -l "5 5 5 5" -m 5 -t 0.009817477042468 -x -n 5094 --wave_freq 4 --tol 1e-14 --max_adapt_levels "5 5 5 5" --adapt --thresh 1e-8 --inner_it 10 --memory 40000 > asgard_out.txt &

Expected behavior Crash during the 9th timestep. The error given is:

terminate called after throwing an instance of 'std::bad_alloc' 
  what():  std::bad_alloc

System:


mkstoyanov commented 9 months ago

Trying to reproduce right now, but I have a dumb question. Have you tried to run a basic cuda problem? This error can happen if there is a mismatch between the cuda version installed and the cuda driver. Basically, the machine may just need a reboot.

mkstoyanov commented 9 months ago

Or it may be using up too much memory ... ugh!

mkstoyanov commented 9 months ago

It's running out of memory!

stefan-schnake commented 9 months ago

> Trying to reproduce right now, but I have a dumb question. Have you tried to run a basic cuda problem? This error can happen if there is a mismatch between the cuda version installed and the cuda driver. Basically, the machine may just need a reboot.

Tried after reboot. Same error.

mkstoyanov commented 4 months ago

This is not a bug; the math algorithm is simply using too much memory. The problem will be delegated to the block-global Kronmult, so this can be closed for now.