nbody6ppgpu / Nbody6PPGPU-beijing

This is Nbody6++GPU, an N-body star cluster simulation code, maintained by Rainer Spurzem and team.

Boundary error on array Ktype with example #22

mdelorme closed this issue 3 months ago

mdelorme commented 3 months ago

Hi, I've been trying to compile the code out of the box and run the example N1k_1Myr.inp, but it crashes.

With the default build, I get a segfault right after the following output:

  NA-NS=          85         168         530        1203          24         132          99          40         216      320000       15872       22528

If I compile in debug mode, the crash happens earlier:

  At line 304 of file ../src/Main/instar.F
  Fortran runtime error: Index '15' of dimension 2 of array 'ktype' above upper bound of 14

I'm not certain the two crashes are linked. However, there does seem to be a problem in instar.F: on line 304 the loop runs k from 0 to 15, while ktype is declared in commons6.h with indices from 0 to 14.
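
To see the mismatch in a local checkout, one option is to search the sources for every use of ktype and compare the declared bounds with the loop ranges. The sketch below is a hypothetical helper (find_ktype is not part of Nbody6++GPU); it assumes GNU grep, whose --include option restricts the search to .h, .F and .f files.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nbody6++GPU): print every line that
# mentions ktype in Fortran sources and headers under a directory, with the
# file name and line number, so declared bounds can be compared with loops.
# Assumes GNU grep (for --include). The match is case-insensitive because
# Fortran identifiers are; the --include globs are case-sensitive, hence
# both *.F and *.f.
find_ktype() {
  grep -rni --include='*.h' --include='*.F' --include='*.f' 'ktype' "${1:-.}"
}
```

Running find_ktype from the repository root should list the declaration in commons6.h next to the loop in instar.F; the exact directory layout is an assumption.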

The same problem appears with N10k_nodat10.inp.

Here is some information about my system:

  OS: Ubuntu 22.04
  Compiler: GNU Fortran 11.4
  CUDA: 11.5
  Compiled with: AVX, OpenMP, MPI, CUDA

I am attaching the logs for the release and the debug builds.

Thanks in advance.

log_nb6_release.txt

log_nb6_debug.txt

mdelorme commented 3 months ago

Addendum: the debug build still crashes with the same error; however, the release build seems to work when I disable OpenMP and AVX.

I realize this specific problem might be related to known problems 3 and 4 in the README.

kaiwu-astro commented 3 months ago

Hi mdelorme,

Your OS, compiler and CUDA versions all look good. For new users, a segmentation fault is usually caused by:

  1. not setting ulimit -s unlimited, or
  2. not setting export OMP_STACKSIZE=4096M.

Make sure you have set both of them right before your run (i.e. right before the ./nbody6++.[something] command).
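
A minimal sketch of that setup, assuming a POSIX shell and that the commands are run in the same session that launches the simulation. The run_nbody wrapper is hypothetical (not part of Nbody6++GPU), and it assumes the input file is read from standard input, as in the usual ./nbody6++.[something] < input invocation.

```shell
#!/bin/sh
# 1. Lift the stack size limit for the main thread. This can fail if the
#    hard limit is lower than "unlimited", so warn instead of aborting.
ulimit -s unlimited || echo "warning: could not set 'ulimit -s unlimited'" >&2

# 2. Give each OpenMP worker thread a 4096 MB stack.
export OMP_STACKSIZE=4096M

# Hypothetical wrapper (not part of Nbody6++GPU): run the given binary with
# the given input file redirected to standard input. The binary name is up
# to the caller, e.g. your own ./nbody6++.[something] build.
run_nbody() {
  bin=$1
  input=$2
  "$bin" < "$input"
}
```

For example, run_nbody ./nbody6++.[something] N1k_1Myr.inp, with the placeholder replaced by the binary your build produced. Both settings only affect the current shell and the processes it starts, so they have to be repeated in every new session or job script.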

The debug mode (compiling with --debug) hasn't been maintained for a very long time, sorry about that. It's probably best not to compile with it.

Something else you could try: which version are you using, the stable branch or the dev branch? Maybe try the other one.

Next time, please attach your terminal history (the commands that reproduce the problem) and the config.log file, so we can help you better.
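
One way to capture that history is sketched below. log_run is a hypothetical helper (not part of Nbody6++GPU): it appends each command line and everything the command prints, stdout and stderr, to a log file while still showing the output on the terminal. The resulting file can be attached together with config.log, which autoconf writes to the directory where configure was run.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nbody6++GPU): log_run LOGFILE CMD [ARGS...]
# Appends "$ CMD ARGS" and all output of the command (stdout and stderr)
# to LOGFILE, and echoes the same output to the terminal via tee.
log_run() {
  log=$1
  shift
  { printf '$ %s\n' "$*"; "$@" 2>&1; } | tee -a "$log"
}
```

For example, log_run session.txt make, followed by log_run session.txt ./nbody6++.[something] < N1k_1Myr.inp (the redirection reaches the command through the function's standard input). Because of the pipe into tee, the exit status returned is tee's, not the command's, so check the log for errors rather than relying on $?.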

Regards, Kai

mdelorme commented 3 months ago

Hi Kai,

Indeed, after setting export OMP_STACKSIZE=4096M, the release version no longer segfaults! Thanks for the tip, I'll add this to all my release runs!

However, the original problem with ktype persists when compiling in debug mode (where, I guess, array bounds are checked). I still believe that ktype should be declared in commons6.h as KTYPE(0:15,0:15), since KSTAR can reach 15 (in coal.f).

kaiwu-astro commented 3 months ago

Hi Delorme,

Glad to know that your problem is solved!

Thanks for the suggestion. I will record it and investigate further.

Regards, Kai