Open SpandanCh opened 3 years ago
Never happened to me, but it looks bad! Is the issue the same as the linked one, with infinite processes spawning? In that case (and maybe other cases too), have you tried upgrading your OpenMPI version?
@SpandanCh could you check which OpenMPI version is installed?
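For anyone who wants to do this check from Python, here is a minimal sketch. It assumes `mpirun` is on the `PATH` and that `mpirun --version` prints a first line of the form `mpirun (Open MPI) X.Y.Z`. The helper names `parse_openmpi_version` and `installed_openmpi_version` are made up for this example and are not part of any package discussed here.

```python
import re
import subprocess


def parse_openmpi_version(text):
    """Extract (major, minor, patch) from `mpirun --version` output.

    Assumes the output contains "(Open MPI) X.Y.Z"; returns None otherwise.
    """
    match = re.search(r"\(Open MPI\)\s+(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def installed_openmpi_version(launcher="mpirun"):
    """Run `<launcher> --version` and parse the result.

    Returns None if the launcher is not found or the output is not recognised.
    """
    try:
        result = subprocess.run(
            [launcher, "--version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None
    return parse_openmpi_version(result.stdout)
```

Calling `installed_openmpi_version()` on the affected machine should return a tuple such as `(2, 1, 1)`, which can be compared directly against a minimum version, for example `installed_openmpi_version() < (3, 0, 0)`.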
It's 2.1.1
Looks like the releases now go all the way up to 5.x, and 3.x was already around back when I was working on this, so maybe give updating a try?
that being said, it's my only lead, and I have no clue what else could be wrong here
Ok, thanks. I will try updating
@vlas-sokolov, I am running the code on a synthetic cube, and I noticed that after around 6-7 hours some new processes start popping up, each of which uses a lot of memory (~37 GB). My data cubes are only 200 MB and 10 MB. In htop, the processes are listed as
orted --hnp --set-sid --report-uri 9 --singleton-died-pipe 10 -mca state_novm_select_1
This appears to be an MPI-related process (https://github.com/open-mpi/ompi/issues/4577). If I kill one of these processes, all the others terminate as well, along with the Python processes.
Have you come across something similar? This seems to be a new issue that started after a recent system upgrade. Could there be a problem with the version of one of the packages?
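To track how many `orted` processes exist and how much memory each one holds without watching htop, something like the sketch below might help. It assumes a Linux `ps` that accepts `-eo pid,rss,args` and reports RSS in KiB. `parse_ps` and `list_orted` are hypothetical helper names for this example.

```python
import subprocess


def parse_ps(ps_output, name="orted"):
    """Return (pid, rss_gib, command) for processes whose executable is `name`.

    Expects the text produced by `ps -eo pid,rss,args`, header row included.
    """
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, rss_kib, args = parts
        # Compare only the executable's base name, ignoring its arguments.
        executable = args.split()[0].rsplit("/", 1)[-1]
        if executable == name:
            matches.append((int(pid), int(rss_kib) / 1024**2, args))
    return matches


def list_orted():
    """Snapshot the currently running orted processes (assumes Linux `ps`)."""
    output = subprocess.run(
        ["ps", "-eo", "pid,rss,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(output)
```

Logging `list_orted()` every few minutes during a run would show when the extra daemons first appear and how quickly their memory grows, which may be useful to include in a report to the Open MPI issue linked above.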