milot-mirdita opened this issue 5 years ago
@elileka is this fixed now with your changes?
Is there any update?
The first two issues should be handled as of commit cbb542af98095210bad8399cda02b67487d0bdde.
The third issue is a bit trickier. Here's why:
The sliced search workflow (searchslicedtargetprofile.sh) is where the available disk space (in the regular tmp folder) is taken into account to determine the number of profiles to process; search passes this information to it. --local-tmp is a parameter that is relevant only in MPI mode. Assuming all MPI nodes have the same amount of available disk space in their --local-tmp (does this even hold?), the way to take it into account is to set the disk limit in the sliced search workflow to the minimum of the available space in the regular tmp folder (on the master node) and the available space in the master's --local-tmp times the number of MPI nodes.

However, the number of MPI nodes is determined through fairly complicated logic in the Prefilter constructor, which is called from within the sliced search workflow only after the workflow has already calculated the disk space limit. An exit with an error could be added inside Prefilter (asking the user to re-run the program with --disk-space-limit equal to local size x N_nodes), but that is not very elegant, since the run has already started by then.
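To make the two options above concrete, here is a minimal Python sketch. It is not MMseqs2 code (MMseqs2 is C++ plus shell workflows), and all function names are hypothetical. It assumes, as the comment does, that every MPI node has the same free space in --local-tmp as the master. The first function is the proposed limit (minimum of the regular tmp space and local-tmp space times the node count); the second is the fail-fast check that could only run once the node count is known.

```python
import shutil


def free_bytes(path: str) -> int:
    """Free space on the filesystem holding `path` (stdlib shutil.disk_usage)."""
    return shutil.disk_usage(path).free


def effective_disk_limit(tmp_free: int, local_tmp_free: int, n_nodes: int) -> int:
    """Proposed disk budget for the sliced search (hypothetical helper).

    Assumes every MPI node has the same free space in --local-tmp as the
    master node; the thread itself questions whether this holds.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    # Minimum of the master's regular tmp space and the aggregate --local-tmp space.
    return min(tmp_free, local_tmp_free * n_nodes)


def check_limit_once_nodes_known(disk_limit: int, local_tmp_free: int, n_nodes: int) -> None:
    """Fail-fast alternative (hypothetical helper).

    Can only run after the node count is known, which in MMseqs2 happens in
    the Prefilter constructor, i.e. after the run has already started.
    """
    max_allowed = local_tmp_free * n_nodes
    if disk_limit > max_allowed:
        # Hypothetical error message asking the user to re-run with a smaller limit.
        raise SystemExit(
            f"Insufficient --local-tmp space: re-run with --disk-space-limit {max_allowed}"
        )
```

For example, with 100 units free in tmp, 20 units free in --local-tmp and 4 nodes, the first function gives min(100, 80) = 80, while the second would abort a run that had computed a limit of 100 and ask for 80 instead. The sketch shows why the first option is preferable: it needs n_nodes up front, which is exactly the value that is not available when the workflow computes the limit.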
Other possible issues with --local-tmp:
--local-tmp [...] run on the same nodes (also unlikely)