Closed caitlinadams closed 6 years ago
I haven't run emcee yet. I have a feeling that this is because the mpi4py package is from conda while srun is using a different openmpi version.

What happens if you completely disable conda (i.e., remove the /path/to/conda entry from your $PATH variable) and then run `module load python`, `pip install emcee --user`, etc.?

If that does not work, you may want to ask Swinburne hpc-support.
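A sketch of that suggested sequence, assuming a bash shell and OzSTAR's `module` command; the conda path is a placeholder, and these commands are environment-specific rather than portable:

```shell
# Remove the conda entry from $PATH (placeholder path -- substitute your own)
export PATH=$(echo "$PATH" | tr ':' '\n' | grep -v '/path/to/conda' | paste -sd:)

# Load the system Python module and install emcee into ~/.local
module load python
pip install emcee --user

# Confirm which python and emcee will actually be picked up
which python
python -c "import emcee; print(emcee.__version__)"
```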
The interactive version got further than previously but still seemed to get stuck. I submitted it as a batch script again and saw:
```
[cadams@farnarkle2 submissions]$ squeue -u cadams
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
220425 skylake emcee_mo cadams PD 0:00 1 (None)
[cadams@farnarkle2 submissions]$ squeue -u cadams
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
220425 skylake emcee_mo cadams CG 0:01 1 john16
[cadams@farnarkle2 submissions]$ squeue -u cadams
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
```
So it went from PENDING (PD) straight to COMPLETING (CG). I've already asked hpc-support if they'll install emcee as a module, so I'll write to them again to say that I'm also having trouble launching jobs with it -- both when using anaconda and when using `pip install emcee --user`.
Thanks for the help! If I'm able to get it working, I'll reply here for anyone who might want to work with emcee in the future.
I've resolved it with hpc-support. It had to do with how I specified my output and error files in the sbatch script: it turns out Slurm was confused by the `ozstar.swin.edu.au:` prefix that I had. This used to work fine in my g2 scripts, so I had carried the practice over here.
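A hedged sketch of the fix described above (the job name and file paths are placeholders, not the actual script): Slurm expects plain filesystem paths for `--output` and `--error`, with no `host:` prefix.

```shell
#!/bin/bash
#SBATCH --job-name=emcee_model          # placeholder name
# Broken: the host prefix confuses Slurm, so no output files appear
##SBATCH --output=ozstar.swin.edu.au:/home/cadams/emcee.out
# Working: a plain path on the shared filesystem (hypothetical location)
#SBATCH --output=/home/cadams/emcee.out
#SBATCH --error=/home/cadams/emcee.err
```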
As for emcee, it all appears to be working. I'm currently using the suggestion of `pip install emcee --user` -- so thanks for that, @manodeep!

For anyone who might need this in the future, it should also be noted that `srun` is not required for emcee: `srun` in this case was creating 16 copies of the job, each of which was trying to spawn 16 threads.
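To illustrate, a minimal jobscript sketch (the script name `emceerun.py`, the account, and the resource numbers follow this thread; the walltime is a placeholder): invoking python directly gives one task that emcee parallelises itself, whereas `srun python emceerun.py` under `--ntasks-per-node=16` would launch 16 identical copies, each spawning 16 threads.

```shell
#!/bin/bash
#SBATCH --account=oz073        # from the salloc line in this thread
#SBATCH --nodes=1
#SBATCH --ntasks=1             # one task only -- emcee spawns its own workers
#SBATCH --cpus-per-task=16     # cores for emcee's 16 threads
#SBATCH --time=4:00:00         # placeholder walltime

module load python
python emceerun.py             # note: no srun
```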
I've just transitioned over to OzSTAR, but can't get my emcee jobs to run. They appear to be submitted but then disappear, and no error or output files are being generated. The program was running fine on g2, so I think it must be something to do with either the installed packages or my sbatch script. @manodeep, have you run emcee on OzSTAR yet?

The steps I took to install emcee are:

The jobscript looks like this:

Within emceerun.py, I have requested 16 threads:

I've also tried submitting the job on an interactive node using
`salloc --account=oz073 --nodes=1 --ntasks-per-node=16 --time=4:00:00 --mem-per-cpu=4G`
The print statements from emceerun.py did appear for each thread, but then the program never got any further. Any help would be greatly appreciated! I'm at a complete loss for what I'm doing wrong.