Open claresinger opened 4 years ago
Does this happen every time for a given random seed?
It seems that under some rare conditions the pressure solver has trouble finding a solution. If it were a bug in, e.g., the boundary conditions, I don't see why it would depend on the seed.
To make it easier for the pressure solver, you can try to:
This runtime_error is thrown when the pressure solver needs more than 10000 iterations. You could try increasing the maximum number of iterations, which is hardcoded in libmpdata++/solvers/detail/mpdata_rhs_vip_prs_common.hpp
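To illustrate what an iteration cap like this does, here is a minimal Python sketch. It is not the libmpdata++ code: the function name `jacobi_capped`, the Jacobi method, the tolerance and the error text are all assumptions made for illustration. It only shows the general pattern of an iterative solver that gives up and raises once it exceeds a fixed number of iterations.

```python
def jacobi_capped(A, b, tol=1e-10, max_iters=10000):
    """Solve A x = b with plain Jacobi iteration (illustrative only).

    Raises RuntimeError if the residual has not dropped below `tol`
    after `max_iters` iterations, mimicking a hardcoded iteration cap.
    """
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iters + 1):
        # One Jacobi sweep: update every unknown from the previous iterate.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Largest absolute component of the residual b - A x.
        residual = max(
            abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
            for i in range(n)
        )
        if residual < tol:
            return x, it
    raise RuntimeError(
        "no convergence after %d iterations (residual %g)" % (max_iters, residual)
    )
```

For a well-conditioned system such as `jacobi_capped([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` the loop exits after a handful of sweeps. For a system on which the iteration diverges, raising the cap only delays the error, so a larger limit helps only if the solver is converging slowly rather than not at all.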
I also just got stuck in the pressure solver. I was running 2D dycoms with rng_seed = 42. I will try again now with the same setup to see whether it's deterministic or random.
The command below got stuck 4 times on 2 different GPU nodes.
@pdziekan - could you check whether it also gets stuck on your machine? If so, rng_seed=42 is a good candidate to debug from.
import os

# Simulation parameters, kept as strings so they can be concatenated into the command line
case = "dycoms_rf02"
nx = "129"
ny = "0"  # ny = 0 selects the 2D setup
nz = "301"
dt = "1"
nt = "21600"
spinup = "3600"
outfreq = "3600"
backend = "CUDA"
outdir = "out_test_lgrngn"
rng_seed = "42"
micro = "lgrngn"
sd_conc = "40"
sstp_cond = "10"
sstp_coal = "10"
# Build the full shell command
cmd = "OMP_NUM_THREADS=1 ./src/bicycles --outdir="+outdir+" --case="+case+\
" --nx="+nx+" --ny="+ny+" --nz="+nz+" --dt="+dt+" --spinup="+spinup+\
" --nt="+nt+" --micro="+micro+" --outfreq="+outfreq+\
" --backend="+backend+" --rng_seed="+rng_seed+" --sd_conc="+sd_conc+\
" --sstp_cond="+sstp_cond+" --sstp_coal="+sstp_coal
print("running " + cmd)
os.system(cmd)
The same command but with rng_seed = 44 does not get stuck.
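To check whether the hang is deterministic for a given seed, something like the sketch below could help. It reuses the command-line flags from the command above; the helper names (`build_cmd`, `run_with_timeout`, `sweep`), the per-run output directory naming and the idea of using a wall-clock timeout to detect a stuck run are my assumptions, not part of the model or its tooling.

```python
import os
import subprocess


def build_cmd(rng_seed, run_id, sd_conc=40, nt=21600, outfreq=3600):
    """Build the bicycles command line for one seed (flags as in the command above)."""
    # Hypothetical naming scheme so that repeated runs do not overwrite each other.
    outdir = "out_test_lgrngn_seed%d_run%d" % (rng_seed, run_id)
    return [
        "./src/bicycles",
        "--outdir=" + outdir,
        "--case=dycoms_rf02",
        "--nx=129", "--ny=0", "--nz=301",
        "--dt=1", "--spinup=3600",
        "--nt=%d" % nt,
        "--micro=lgrngn",
        "--outfreq=%d" % outfreq,
        "--backend=CUDA",
        "--rng_seed=%d" % rng_seed,
        "--sd_conc=%d" % sd_conc,
        "--sstp_cond=10", "--sstp_coal=10",
    ]


def run_with_timeout(cmd, timeout_s, env=None):
    """Run cmd and return 'ok', 'failed' (non-zero exit) or 'timeout'."""
    try:
        result = subprocess.run(cmd, timeout=timeout_s, env=env)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


def sweep(seeds, timeout_s, repeats=2):
    """Run every seed `repeats` times and collect the outcome of each run."""
    env = dict(os.environ, OMP_NUM_THREADS="1")
    return {
        seed: [
            run_with_timeout(build_cmd(seed, run_id), timeout_s, env=env)
            for run_id in range(repeats)
        ]
        for seed in seeds
    }
```

Calling, for example, `sweep([42, 44], timeout_s=...)` with a timeout comfortably longer than a normal run returns a dict of outcomes per seed. If a seed gives the same outcome in every repeat, the problem is likely reproducible for that seed; mixed outcomes would point to something non-deterministic.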
Not sure if it's the same issue. This combination gets stuck after time step 9000, but I don't get any errors from the pressure solver.
case = "dycoms_rf02"
nx = "129"
ny = "0"
nz = "301"
dt = "1"
nt = "25200"
spinup = "3600"
outfreq = "900"
backend = "CUDA"
rng_seed = "48"
outdir = "out_test_lgrngn_"+rng_seed
micro = "lgrngn"
sd_conc = "512"
sstp_cond = "10"
sstp_coal = "10"
cmd = "OMP_NUM_THREADS=1 ./src/bicycles --outdir="+outdir+" --case="+case+\
" --nx="+nx+" --ny=0 --nz="+nz+" --dt="+dt+" --spinup="+spinup+\
" --nt="+nt+" --micro="+micro+" --outfreq="+outfreq+\
" --backend="+backend+" --rng_seed="+rng_seed+" --sd_conc="+sd_conc+\
" --sstp_cond="+sstp_cond+" --sstp_coal="+sstp_coal+\
" --gccn=1"
The same happens with rng_seed=13.
I fairly frequently get an error about being stuck in the pressure solver (error message below). If I run the same simulation with a different random seed each time, this happens about once every 20 runs. Do you know why this might happen? Could it be a glitch on the HPC system I'm using rather than a bug in the code?