ucns3d-team / UCNS3D

Unstructured Compressible Navier Stokes 3D code (UCNS3D)
https://ucns3d.com
GNU General Public License v3.0

Code gets stuck when running in multi-node HPC #54

Closed by SRkumar97 1 year ago

SRkumar97 commented 1 year ago

Hello, I am trying to run the code in an HPC environment. The code was compiled with Intel Fortran and MKL.

The HPC system I am using is CPU-based; each node has 64 cores, so each node can handle up to np=64.

The code runs and writes all results successfully on a single node. A transient iLES case ran with all output files written at regular intervals, without any issues.

However, I am facing an issue when I try to run the same case on multiple nodes, say n=2 with np=64 on each, for a total of np=128. The code starts and the "UCNS3D is Running" line shows up; however, files are written only for the 0th time step. In the history.txt file, nothing is written beyond this 0th iteration. No errors or MPI aborts appear in the terminal. The code appears to run for several hours, but no calculation occurs. Neither FORCE.dat nor MOMENT.dat gets written.

I have even tried setting export OMP_NUM_THREADS=1 before submitting the multi-node MPI job, but in vain. It is unclear why the calculation does not start for the same case on multiple nodes.

The history.txt file is attached for your reference. I would appreciate your help. history.txt

SRkumar97 commented 1 year ago

The issue was resolved by running chmod +x on the ucns3d_p executable in the job script. This needs to be done for every test case being run.
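For anyone who wants to automate this step, here is a minimal sketch of a pre-flight check that could be called before launching the MPI job. It only illustrates the fix reported above. The helper name ensure_executable is hypothetical and not part of UCNS3D. It sets the execute bits for user, group, and others, which is roughly what chmod +x does under a typical umask.

```python
import os
import stat


def ensure_executable(path):
    """Set the user/group/other execute bits on `path` if any are missing.

    Hypothetical helper, not part of UCNS3D.
    Returns True if the permissions were changed, False if all three
    execute bits were already set.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    wanted = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted == mode:
        return False  # nothing to do
    os.chmod(path, wanted)
    return True
```

Calling ensure_executable("ucns3d_p") from the case directory before the MPI launch line would cover the per-test-case repetition mentioned above, assuming the executable has been copied into that directory under that name.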