rwdavies / STITCH

STITCH - Sequencing To Imputation Through Constructing Haplotypes
http://www.nature.com/ng/journal/v48/n8/abs/ng.3594.html
GNU General Public License v3.0

running program error #54

Open yao-chenxin opened 2 years ago

yao-chenxin commented 2 years ago

Hi,

The program eventually stopped with this error message:

    [2021-08-10 08:12:10] Start of iteration 24
    [2021-08-10 08:13:28] iteration=24, sample(r2)=0.928, 0.937, 0.925, 0.91, 0.941, 0.925, 0.918, 0.936, 0.935, 0.939
    [2021-08-10 08:13:29] Start of iteration 25
    [2021-08-10 09:57:12] iteration=25, sample(r2)=0.928, 0.937, 0.925, 0.91, 0.941, 0.926, 0.918, 0.936, 0.935, 0.94
    Error in priorSum_m[, s] : incorrect number of dimensions
    Calls: STITCH -> completeSampleIteration -> calculate_updates
    In addition: Warning message:
    In mclapply(sampleRanges, mc.cores = nCores, FUN = subset_of_complete_iteration, :
      scheduled core 2 did not deliver a result, all values of the job will be affected
    Execution halted

The command I ran is as follows:

    quick_qsub =10= {-q cu -l nodes=1:ppn=36} "STITCH.R --chr=chr10 --bamlist=./bamlist.10.txt --posfile=./pos.10.txt --genfile=./gen.10.txt --sampleNames_file=./sample.id10.txt --outputdir=./ --K=4 --nGen=100 --nCores=36 --tempdir=./tmp10 &>run.chr10.log"

  1. How can I solve this error?
  2. How should I choose the three parameters K, nGen and nCores? My understanding is that nCores is the number of threads the program runs on.

I am looking forward to the author's reply. Thank you.

rwdavies commented 2 years ago

Hi,

First, apologies, I missed several Issues opened in mid-summer while I was on vacation and didn't see the emails when I came back.

If this is still an issue, because this happened on iteration 25, I suspect this is a problem with splitReadIterations. This in practice isn't as helpful as I thought it would be when I first conceived of it, so I would recommend turning it off using splitReadIterations=NA and hopefully this will solve the problem.
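
For the command in the original post, the change amounts to adding one option and rerunning. The sketch below only builds the adjusted argument list and prints it. It assumes the STITCH.R command-line wrapper accepts the setting in the same `--name=value` form as the other options in that command, which this thread does not confirm; the helper name `with_split_reads_off` is hypothetical.

```python
import shlex

# Arguments copied from the command in the original post.
original_args = [
    "STITCH.R",
    "--chr=chr10",
    "--bamlist=./bamlist.10.txt",
    "--posfile=./pos.10.txt",
    "--genfile=./gen.10.txt",
    "--sampleNames_file=./sample.id10.txt",
    "--outputdir=./",
    "--K=4",
    "--nGen=100",
    "--nCores=36",
    "--tempdir=./tmp10",
]


def with_split_reads_off(args):
    """Return a copy of args with splitReadIterations set to NA.

    Assumption: the command-line spelling is --splitReadIterations=NA,
    mirroring the splitReadIterations=NA setting named in the reply.
    """
    prefix = "--splitReadIterations="
    # Drop any earlier value of the option so it is not passed twice.
    kept = [a for a in args if not a.startswith(prefix)]
    return kept + [prefix + "NA"]


new_args = with_split_reads_off(original_args)
# Print a shell-quoted command line that can be pasted into the job script.
print(shlex.join(new_args))
```

All other options are left as they were, so a rerun differs from the failing run only in this one setting.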

nCores = the number of threads; this depends on the architecture of your system.

nGen = the number of generations between your samples and when you could represent your population using "ancestral haplotypes" at the given K. In a fully outbred population, under coalescent theory, this would be nGen = 4 * Ne / K. Otherwise, if you know your population was bottlenecked at some point in the past, I would choose that point in the past (in terms of number of generations).

K = this sets the number of ancestral haplotypes. It is context dependent and the most important parameter you set. Generally I recommend that if you have context to guide you, you use it (e.g. if you know your population was bottlenecked down to some number of haplotypes in the recent past, and this is decently small, use that number). Otherwise, if the total coverage across all samples is not very large, I would set K as something like K = total coverage / 10, so e.g. if you have 100X total coverage I would use a K of around 10. Finally, if you have lots of coverage and no idea about demographics, or an outbred population, I would use the largest K you reasonably can, based on run time.
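
As a worked example of the two rules of thumb above, here is a minimal sketch. The function names `suggest_K` and `suggest_nGen` are hypothetical and not part of STITCH, and the effective population size of 1000 is an illustrative assumption, not a value from this thread.

```python
def suggest_K(total_coverage, known_haplotypes=None):
    """Starting value for K.

    Uses a known bottleneck haplotype count if one is given,
    otherwise the rule of thumb K = total coverage / 10.
    """
    if known_haplotypes is not None:
        return known_haplotypes
    return max(1, round(total_coverage / 10))


def suggest_nGen(K, effective_population_size=None, bottleneck_generations=None):
    """Starting value for nGen.

    Uses the number of generations since a known bottleneck if given,
    otherwise the outbred-population value nGen = 4 * Ne / K.
    """
    if bottleneck_generations is not None:
        return bottleneck_generations
    if effective_population_size is None:
        raise ValueError("need either Ne or the generations since a bottleneck")
    return max(1, round(4 * effective_population_size / K))


# 100X total coverage across all samples, as in the example above.
K = suggest_K(total_coverage=100)
# Ne = 1000 is an illustrative assumption.
nGen = suggest_nGen(K, effective_population_size=1000)
print(K, nGen)  # prints: 10 400
```

These are only starting points. When there is plenty of coverage and no demographic information, the advice above is to use the largest K that run time allows, which this sketch does not attempt to estimate.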

Hope that helps and sorry for the late reply, Robbie