Open caicairay opened 4 years ago
Possible configuration reason: /var/lib/torque/server_name should contain mu01, but localhost was found.
This needs to be reconfigured once the nodes are back online.
The pbs_mom daemon config is as follows:
~]# cat /var/lib/torque/mom_priv/config
# Configuration for pbs_mom.
$pbsserver mu01
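To make the mismatch easy to spot on each node, the two files can be compared against the expected server name. This is only a rough sketch: `check_torque_server` is a hypothetical helper, not part of Torque, and it assumes that `server_name` holds a single hostname on its first line and that the mom config names the server on a `$pbsserver` line, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper (not a Torque tool): compare the server name recorded
# in two Torque files against the expected head node name.
check_torque_server() {
    expected="$1"            # e.g. mu01
    server_name_file="$2"    # e.g. /var/lib/torque/server_name
    mom_config_file="$3"     # e.g. /var/lib/torque/mom_priv/config
    rc=0

    # Assumption: server_name holds one hostname on its first line.
    found=$(head -n 1 "$server_name_file" | tr -d '[:space:]')
    if [ "$found" != "$expected" ]; then
        echo "server_name: expected $expected, found $found"
        rc=1
    fi

    # The pbs_mom config names the server on a "$pbsserver <host>" line.
    mom_server=$(awk '$1 == "$pbsserver" { print $2; exit }' "$mom_config_file")
    if [ "$mom_server" != "$expected" ]; then
        echo "mom config: expected $expected, found ${mom_server:-<none>}"
        rc=1
    fi

    return "$rc"
}
```

On a node it could be run as `check_torque_server mu01 /var/lib/torque/server_name /var/lib/torque/mom_priv/config`; a non-zero return code means at least one file does not point at the expected server.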
The compute nodes lose their connection after a specific job (requesting 20 nodes) is assigned. The assigned job is rejected and requeued many times, and ends up in BatchHold.
A similar issue happened with job 72823.mu01. Is it related to #8? Thanks!