rwdavies / STITCH

STITCH - Sequencing To Imputation Through Constructing Haplotypes
http://www.nature.com/ng/journal/v48/n8/abs/ng.3594.html
GNU General Public License v3.0

Error in check_mclapply_OK(single_iteration_results) #56

Closed Deeeeen closed 2 years ago

Deeeeen commented 2 years ago

Hi Robbie (@rwdavies), could you please offer some help with the error I am getting when running STITCH?

Running:

STITCH(chr = Scaffold_1__1_contigs__length_169319896,  
posfile = XXX/Scaffold_1__1_contigs__length_169319896_POS, 
outputdir = XXX/Scaffold_1__1_contigs__length_169319896_OUT, 
tempdir = XXX/temp_Scaffold_1__1_contigs__length_169319896,
bamlist = XXX/Scaffold_1__1_contigs__length_169319896_bamlist, 
sampleNames_file = XXX/Scaffold_1__1_contigs__length_169319896_sample_names, 
K = 120, nGen = 80, method = diploid,  niterations = 40, nCores = 24)

Error message:

[2021-08-17 15:27:46] Done generating inputs
[2021-08-17 15:27:46] Copying files onto tempdir
[2021-08-17 15:28:02] Done copying files onto tempdir
[2021-08-17 15:28:02] Generate allele count
[2021-08-17 15:28:17] Quantiles across SNPs of per-sample depth of coverage
[2021-08-17 15:28:17] 5% 25% 50% 75% 95%
[2021-08-17 15:28:17] 0.216 0.398 0.613 0.954 1.897
[2021-08-17 15:28:17] Done generating allele count
[2021-08-17 15:28:18] Outputting will be done in 137 blocks with on average 9994.2 SNPs in them
[2021-08-17 15:28:18] Begin parameter initialization
[2021-08-17 15:28:25] Done parameter initialization
[2021-08-17 15:28:25] Start EM
[2021-08-17 15:28:25] Number of samples: 88
[2021-08-17 15:28:25] Number of SNPs: 1369199
[2021-08-17 15:28:25] Start of iteration 1
[2021-08-17 15:28:29] Error : cannot allocate vector of size 146.9 Gb

Error in check_mclapply_OK(single_iteration_results) :
  An error occured during STITCH. The first such error is above
Calls: STITCH -> completeSampleIteration -> check_mclapply_OK
In addition: Warning message:
In mclapply(sampleRanges, mc.cores = nCores, FUN = subset_of_complete_iteration, :
  all scheduled cores encountered errors in user code
Execution halted

Is it because I'm not requesting enough memory from the cluster for this process? Do you have any suggestions on how to resolve this issue? If there is no good way to resolve it, should I split the chromosome into smaller pieces, run STITCH on them with a buffer size specified, and then concatenate the VCF files at the end?

Many thanks

rwdavies commented 2 years ago

Hi,

First, apologies, I missed several Issues opened in mid-summer while I was on vacation and didn't see the emails when I came back.

If this issue is still a problem, I think K is much too large here. You're setting K to 120, meaning STITCH is trying to run with 120 ancestral haplotypes, but you only have 88 samples (176 haplotypes) to impute, at ~0.6X coverage each. I would suggest setting K to something like K=8 or even K=4. With only 88 samples at 0.6X, you have ~50X coverage in total, so it's going to be hard to infer more than K=4-6 ancestral haplotypes well. A smaller K will also bring RAM down considerably: RAM is O(K^2), so K=120 uses about 3 orders of magnitude more RAM than K=4 ((120/4)^2 = 900).
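
The arithmetic behind those numbers can be checked with a few lines of R. This is only a rough sketch: it assumes RAM scales purely as K^2 with everything else held fixed (constant factors and other terms are ignored), and it uses the median per-sample depth from the log above as a stand-in for average coverage. The variable and function names are made up for illustration and are not part of STITCH.

```r
# Back-of-the-envelope check of the coverage and RAM figures.
# Assumption: RAM scales as K^2 with all other settings unchanged.
n_samples    <- 88      # "Number of samples" from the log
median_depth <- 0.613   # 50% quantile of per-sample depth from the log

# Pooled coverage across all samples (approximation using the median depth)
total_coverage <- n_samples * median_depth

# Relative RAM for a large K versus a small K under the K^2 assumption
# (ram_ratio is a hypothetical helper, not a STITCH function)
ram_ratio <- function(k_large, k_small) (k_large / k_small)^2

ratio_k4 <- ram_ratio(120, 4)   # 900
ratio_k8 <- ram_ratio(120, 8)   # 225

cat(sprintf("Total coverage: ~%.0fX\n", total_coverage))
cat(sprintf("K=120 vs K=4: %.0fx the RAM\n", ratio_k4))
cat(sprintf("K=120 vs K=8: %.0fx the RAM\n", ratio_k8))
```

Under that assumption, dropping from K=120 to K=4 cuts the K-dependent memory by a factor of 900, which is where the "about 3 orders of magnitude" estimate comes from.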

Other strategies like splitting the chromosome into chunks aren't obviously necessary here.

Hope that helps, Robbie