Open mangelett opened 3 years ago
Working with Slurm can be tricky sometimes. One key issue I've seen in the past is the nodes' access to filesystems. For parallel to work, all nodes need to have I/O access to the data and tempfiles. This issue seems to be a bug. Thanks for reporting.
Normally, the nodes have I/O access to the data and tempfiles: the data are on a file system shared among the nodes, and I set the TMPDIR variable to a folder on this shared file system (originally to avoid saturating the node's disk space).
Sorry for the late reply. Can you verify that Stata recognizes the TMPDIR variable as the shared path you specified when submitting the jobs?
The command tempfile junk; display "`junk'" prints a tempfile path that is in the shared folder I specified in the TMPDIR variable, so it seems Stata recognizes the shared path. Besides, the log files pllul97ezlin1do0001.log and pllul97ezlin1do0002.log are in this folder.
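The check above runs on the node where Stata was started. As a rough sketch only, the same question can be asked from the shell on every allocated node. The function name check_tmpdir and the probe file name are hypothetical and are not part of parallel or Slurm. The function reports whether the directory named by TMPDIR exists and is writable from the host it runs on.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of parallel or Slurm): report whether the
# directory named by TMPDIR exists and is writable from the current host.
check_tmpdir() {
    # Fall back to /tmp when TMPDIR is unset, which is itself worth noticing.
    local dir="${TMPDIR:-/tmp}"
    local host
    host=$(uname -n)
    local probe
    # mktemp fails if the directory is missing or not writable.
    if probe=$(mktemp "$dir/pll_probe.XXXXXX" 2>/dev/null); then
        rm -f "$probe"
        echo "$host: OK $dir"
        return 0
    fi
    echo "$host: FAIL $dir"
    return 1
}
```

Inside the batch job, one way to run it once per node would be something like srun --ntasks-per-node=1 bash -c "$(declare -f check_tmpdir); check_tmpdir". If a node prints /tmp or FAIL, the TMPDIR setting is not reaching that node, which may be worth ruling out before digging into parallel itself.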
Preliminaries
Before submitting an issue, please check (with x in brackets) that you:
Expected behavior and actual behavior
I'm trying to run the parallel command on two nodes of an HPC cluster using the hostnames option in parallel initialize. When I specify the hostnames, I get the error "child process 0002 Exited with error -700- while running the command/dofile (view log)...". The logfile __pll[pll_id]_do0002.log is empty.
The command works fine without the hostnames option (working only on one node).
Steps to reproduce the problem
The following code is saved in the file test_parallel.do:
The code is launched with the command
stata test_parallel.do
inside a SLURM batch file (which requests the node cn07).
System information
Output from creturn list: