gphocs-dev / G-PhoCS

G-PhoCS is a software package for inferring ancestral population sizes, population divergence times, and migration rates from individual genome sequences.

Run taking longer time than expected on HPC. #89

Open rayistr opened 1 month ago

rayistr commented 1 month ago

Hi, I am trying to run G-PhoCS for 5 populations using the multithreaded version. I submitted it as a PBS job on our HPC cluster, requesting 80 threads (2 chunks of 40 CPUs each).

But the run is taking much longer than expected (only 28K iterations in 6 days!), and the log file indicates that only 40 threads are in use. Please let me know what the issue might be. I am attaching my PBS script, log file, and control file.

The PBS script for job submission was,


#PBS -N jobname
#PBS -q quename
#PBS -l select=2:ncpus=40,pmem=188
#PBS -V
#PBS -l walltime=128:00:00
#PBS -o file.o
#PBS -e file.err
#PBS -m abe
#PBS -M email_id

cd $PBS_O_WORKDIR

/software_directory/G-PhoCS/bin/G-PhoCS controlfile.ctl -n 80

The log file I got was,


G-Phocs version 1.3.2, Oct. 2017


Setting Thread Count to: 40
Reading control settings from file controlfile.ctl... Done.
Reading sequence data... 118398 loci, as specified in sequence file gphocsfile.gphocs.
Reading loci (.=100 loci): .......... .......... .......... [remaining progress dots truncated]

Starting MCMC: 0 burnin, 1000000 running, sampled every 0 iteration(s).
There are 24 parameters in the model.

For the control file I used the default G-PhoCS parameters, given below,

GENERAL-INFO-START

  seq-file        gphocsfile.gphocs
  trace-file      tracefile.log
  burn-in     0
  mcmc-iterations     1000000
  mcmc-sample-skip        0
  start-mig       0
  iterations-per-log      100
  logs-per-line       100

  tau-theta-print     10000
  tau-theta-alpha     1
  tau-theta-beta      10000

  mig-rate-print      0.001
  mig-rate-alpha      0.002
  mig-rate-beta       0.00001

  locus-mut-rate      CONST

  find-finetunes      TRUE
  find-finetunes-num-steps        100
  find-finetunes-samples-per-step     100

GENERAL-INFO-END

The model had 5 current populations, 4 ancestral populations, and 10 migration bands between the current populations.

PS: I had previously run a similar G-PhoCS analysis outside the HPC cluster, and it was faster (200K iterations in 6 days).
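To put the two rates side by side, here is a rough back-of-the-envelope projection of how long the full 1,000,000 iterations would take at each observed pace. It is only a sketch: `days_needed` is a hypothetical helper written for this comparison, not part of G-PhoCS, and it assumes the iteration rate stays constant over the whole run. The earlier run was similar rather than identical, so the comparison is approximate.

```shell
#!/bin/sh
# Hypothetical helper (not part of G-PhoCS).
# Usage: days_needed ITERATIONS_DONE DAYS_ELAPSED ITERATIONS_TOTAL
# Prints the projected total run time in days, rounded to the nearest day,
# assuming a constant number of iterations per day.
days_needed() {
    awk -v done="$1" -v days="$2" -v total="$3" \
        'BEGIN { printf "%.0f\n", total / (done / days) }'
}

echo "HPC run:     $(days_needed 28000 6 1000000) days"
echo "Non-HPC run: $(days_needed 200000 6 1000000) days"
```

At these rates the HPC job would need roughly 214 days against about 30 days for the earlier run, and either figure is far beyond the 128-hour walltime requested in the PBS script.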

igronau commented 1 month ago

I don't see anything obviously problematic with your configuration. It might be a problem in the specific configuration of the HPC. It seems that it is running G-PhoCS with only 40 threads and not 80, as you would have wanted. Did you try running it with -n 80?
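One way to narrow this down is to check what the job actually sees on the compute node. A request of `select=2:ncpus=40` asks for two chunks of 40 CPUs, which the scheduler may place on two separate nodes, and a single multithreaded process can only use the CPUs of the node it runs on. The sketch below could be placed in the PBS script just before the G-PhoCS command. The helper names `count_hosts` and `visible_cpus` are made up for illustration, and whether G-PhoCS caps its thread count at the CPUs visible on the node (or honors `OMP_NUM_THREADS`) is an assumption to verify, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical diagnostic helpers; not part of G-PhoCS or PBS.

# Count the distinct hosts listed in a PBS node file
# (one hostname per line, possibly repeated once per CPU).
count_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Number of CPUs the current process is allowed to run on.
visible_cpus() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}

echo "CPUs visible on $(uname -n): $(visible_cpus)"
echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"

# PBS_NODEFILE is set by PBS inside a job; skip this check elsewhere.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Hosts allocated to this job: $(count_hosts "$PBS_NODEFILE")"
fi
```

If this reports 40 visible CPUs and two allocated hosts, the second chunk is probably sitting idle on another node, and requesting a single chunk sized to one node would be the more sensible configuration to try.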