hackingmaterials / atomate

atomate is powerful software for computational materials science and contains pre-built workflows.
https://hackingmaterials.github.io/atomate

Can't run multiple queued jobs on one node: mpirun noticed ... exited on signal 15 (Terminated) #748

Open aijcode opened 2 years ago

aijcode commented 2 years ago

Please submit help issues to: https://matsci.org/atomate

VASP jobs run well with my SGE queue system, and a single job also runs well with atomate. However, running multiple queued jobs on one node fails. The jobs are submitted successfully but then hit an mpirun error. The vasp.out file shows: "mpirun noticed that process rank 3 with PID 57743 on node node3 exited on signal 15 (Terminated)". This error never appears when SGE runs "mpirun -np n vasp" directly.
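For reference, signal 15 is SIGTERM. The rank did not crash on its own; some other process asked it to terminate. The minimal Python sketch below shows how that looks from the launching side. It assumes a POSIX system and uses a sleeping Python child as a hypothetical stand-in for the MPI rank. `subprocess.Popen` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys
import time


def run_and_terminate(delay: float = 0.5) -> int:
    """Start a long-running child, send it SIGTERM, and return its exit status."""
    # Hypothetical stand-in for a VASP/mpirun rank: a Python child that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    time.sleep(delay)
    # Same signal number (15) that mpirun reports in vasp.out.
    child.send_signal(signal.SIGTERM)
    # On POSIX, Popen encodes "killed by signal N" as return code -N.
    return child.wait()


if __name__ == "__main__":
    rc = run_and_terminate()
    print(rc, signal.Signals(-rc).name)  # prints: -15 SIGTERM (Linux/macOS)
```

A `-15` here, or mpirun's "signal 15", only tells you *that* a SIGTERM arrived, not *who* sent it. Both the scheduler and the process that launched the job are candidates worth checking.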

I think this may be a bug in atomate or custodian. I just found that the VASP PID comes from the call "self.pid = _posixsubprocess.fork_exec()" when custodian launches the job.
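On POSIX, `_posixsubprocess.fork_exec` is what the standard library's `subprocess.Popen` uses internally to start a child. One way to test whether that launch path is itself the problem is to start several jobs concurrently from plain Python, outside atomate and custodian. The sketch below does this. The helper name `launch_concurrently` is hypothetical. The default command is a harmless placeholder. On a real node you would substitute your own launch command, for example `["mpirun", "-np", "4", "vasp_std"]`, which is also only an assumed example.

```python
import subprocess
import sys
import tempfile
from contextlib import ExitStack
from pathlib import Path
from typing import List, Optional


def launch_concurrently(n_jobs: int = 2, cmd: Optional[List[str]] = None) -> List[int]:
    """Start n_jobs copies of cmd at the same time, each in its own directory.

    Returns the exit codes in launch order (a negative value -N means the
    job was killed by signal N, e.g. -15 for SIGTERM).
    """
    # Placeholder command; replace with your real mpirun/VASP command on the node.
    cmd = cmd or [sys.executable, "-c", "print('job finished')"]
    with tempfile.TemporaryDirectory() as root, ExitStack() as stack:
        procs = []
        for i in range(n_jobs):
            workdir = Path(root) / f"job_{i}"
            workdir.mkdir()
            # Each job writes to its own vasp.out, like separate atomate launches.
            out = stack.enter_context(open(workdir / "vasp.out", "w"))
            # Popen is the stdlib entry point that ends in fork_exec on POSIX.
            procs.append(
                subprocess.Popen(cmd, cwd=workdir, stdout=out, stderr=subprocess.STDOUT)
            )
        # Wait for every job before the output files and temp dirs are cleaned up.
        return [p.wait() for p in procs]


if __name__ == "__main__":
    print(launch_concurrently(2))
```

Suppose concurrent launches like this finish cleanly on the node while the atomate runs still die with signal 15. That would suggest the SIGTERM is not caused by how the process is forked. It would point instead toward whatever is terminating the jobs afterward.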