rwdavies / STITCH

STITCH - Sequencing To Imputation Through Constructing Haplotypes
http://www.nature.com/ng/journal/v48/n8/abs/ng.3594.html
GNU General Public License v3.0

Termination error caused by outputBlockRange length zero #3

Closed stsmall closed 7 years ago

stsmall commented 7 years ago

Hi, I am trying to use STITCH to impute missing genotypes, but I keep receiving an error that causes termination (see below). The mouse examples work correctly, and K=4 with nGen=100 works with my data, but K=10 with nGen=4000 fails. There may be a setting in between the two that also works, but the runtimes are quite long for my data (6-7 hours). Is there an option to restart the program from previously generated input files? Thanks, Scott

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
[1] "Program start - Wed Apr 5 15:30:23 2017"
[1] "Get and validate pos and gen - Wed Apr 5 15:30:23 2017"
[1] "Done get and validate pos and gen - Wed Apr 5 15:30:25 2017"
[1] "Get BAM sample names - Wed Apr 5 15:30:25 2017"
[1] "Done getting BAM sample names - Wed Apr 5 15:30:26 2017"
[1] "Generate inputs - Wed Apr 5 15:30:26 2017"
[1] "Done geneating inputs - Wed Apr 5 19:01:09 2017"
[1] "Copying files onto tempdir, Wed Apr 5 19:01:09 2017"
[1] "Done copying files onto tempdir, Wed Apr 5 19:01:11 2017"
[1] "Generate allele count, Wed Apr 5 19:01:11 2017"
[1] "Quantiles across SNPs of per-sample depth of coverage"
       5%       25%       50%       75%       95%
 8.956331 12.797050 15.916212 19.276489 23.472954
[1] "Done generating allele count, Wed Apr 5 19:01:34 2017"
[1] "Begin parameter initialization, Wed Apr 5 19:01:34 2017"
[1] "Done parameter initialization, Wed Apr 5 19:01:35 2017"
[1] "Begin EM - Wed Apr 5 19:01:35 2017"
[1] "Number of SNPs - 65869"
[1] "Start of iteration 1 - Wed Apr 5 19:01:35 2017"
[1] "Error in if (outputBlockRange[length(outputBlockRange)] == outputBlockRange[length(outputBlockRange) - : \n argument is of length zero\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in if (outputBlockRange[length(outputBlockRange)] == outputBlockRange[length(outputBlockRange) - 1]) outputBlockRange <- outputBlockRange[-length(outputBlockRange)]: argument is of length zero>
Error in completeSampleIteration(N = N, tempdir = tempdir, chr = chr, :
  An error occured during STITCH. The first such error is above
In addition: Warning message:
In mclapply(x3, mc.cores = nCores, tempdir = tempdir, chr = chr, :
  scheduled cores 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 23, 24, 25 encountered errors in user code, all values of the jobs will be affected
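For readers unfamiliar with this message: "argument is of length zero" is what R raises when the condition given to `if` is an empty vector. The sketch below shows one way the comparison quoted in the log can end up empty, assuming outputBlockRange held fewer than two elements. The log does not show its actual value, and `check_block_range` is a hypothetical stand-in written for illustration, not STITCH's code.

```r
# Hypothetical stand-in for the comparison quoted in the log; not STITCH's code.
check_block_range <- function(outputBlockRange) {
  n <- length(outputBlockRange)
  # With n == 1, outputBlockRange[n - 1] is outputBlockRange[0], an empty
  # vector, so the `==` below returns logical(0) and `if` cannot evaluate it.
  if (outputBlockRange[n] == outputBlockRange[n - 1]) {
    outputBlockRange <- outputBlockRange[-n]
  }
  outputBlockRange
}

# A range with a repeated final element is trimmed as intended.
trimmed <- check_block_range(c(1, 3, 3))

# A single-element range reproduces the error class seen in the log.
msg <- tryCatch(check_block_range(5L), error = function(e) conditionMessage(e))
print(msg)
```

Guarding the comparison with a length check (for example, only comparing when `n >= 2`) is the usual way to avoid this class of error; whether that matches the actual fix is not stated in this thread.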

stsmall commented 7 years ago

awesome, thanks!

Any thoughts on the block length error?

rwdavies commented 7 years ago

Hi Scott,

Yes, you can restart a run from previously generated input by using the regenerateInput = FALSE argument, as in mouse example 6 (and the related mouse example 5 above it): https://github.com/rwdavies/STITCH/blob/master/examples/examples.R#L234
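As a rough sketch of what that looks like, the snippet below builds the argument list for a first run and then derives a rerun that changes K and nGen and sets regenerateInput = FALSE. All file names and values are placeholders, and the exact set of arguments a rerun needs may differ between STITCH versions, so the linked examples.R remains the reference. The STITCH call itself is left commented out so the snippet runs without the package or data.

```r
# Sketch only: file names and parameter values are placeholders.
first_run_args <- list(
  chr = "chr1",
  bamlist = "bamlist.txt",
  posfile = "pos.txt",
  outputdir = paste0(getwd(), "/stitch_out/"),
  K = 4,
  nGen = 100,
  nCores = 1
)

# Rerun against the same outputdir, reusing the input generated by the first run.
# Some STITCH versions may expect further arguments here (an assumption; check
# the linked example for the version in use).
rerun_args <- modifyList(
  first_run_args,
  list(K = 10, nGen = 4000, regenerateInput = FALSE)
)

# do.call(STITCH::STITCH, first_run_args)  # slow: generates input from the BAMs
# do.call(STITCH::STITCH, rerun_args)      # reuses the previously generated input
str(rerun_args)
```

Keeping the shared arguments in one list makes it harder to point the rerun at a different outputdir by accident, which would defeat the reuse.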

How many samples are there, and what was the command you used to run STITCH? For debugging purposes, do you see the same error if you use a much smaller number of SNPs, like 10 instead of 65K?
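One way to set up such a small test is to cut the position file (and the matching genotype file, if one is used) down to the first few SNPs. The helper below is a hypothetical sketch, not part of STITCH. It assumes the posfile is tab-separated with no header (chr, pos, ref, alt) and the genfile is tab-separated with a header row of sample names and one row per SNP; adjust it if your files differ.

```r
# Hypothetical helper (not part of STITCH): keep only the first n SNPs.
subset_snps <- function(posfile, genfile, n, out_prefix) {
  # Read alleles as character so a column of "T" is not parsed as logical TRUE,
  # and positions as integer so they are not written in scientific notation.
  pos <- read.table(posfile, header = FALSE, sep = "\t",
                    colClasses = c("character", "integer", "character", "character"))
  gen <- read.table(genfile, header = TRUE, sep = "\t", check.names = FALSE)
  stopifnot(nrow(pos) == nrow(gen))
  keep <- seq_len(min(n, nrow(pos)))
  out_pos <- paste0(out_prefix, ".pos")
  out_gen <- paste0(out_prefix, ".gen")
  write.table(pos[keep, , drop = FALSE], out_pos, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  write.table(gen[keep, , drop = FALSE], out_gen, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = TRUE)
  c(posfile = out_pos, genfile = out_gen)
}

# Tiny demonstration on made-up data written to temporary files.
tmp <- tempfile()
writeLines(c("chr1\t100\tA\tT", "chr1\t200\tT\tC", "chr1\t300\tG\tA"),
           paste0(tmp, ".full.pos"))
writeLines(c("s1\ts2", "0\t1", "NA\t2", "1\t0"), paste0(tmp, ".full.gen"))
small <- subset_snps(paste0(tmp, ".full.pos"), paste0(tmp, ".full.gen"),
                     n = 2, out_prefix = paste0(tmp, ".small"))
```

The two output paths can then be passed as posfile and genfile in an otherwise unchanged STITCH call.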

Also, what version was this? It looks like it's loading Rsamtools libraries, so I assume this is pre-v1.3, where I changed to SeqLib, which should process input samples faster. I'm also preparing a new release, which should be ready in the next few days, that will process input reads as it streams them; this should further speed up loading time (I'll post to the forum when done).

Robbie

rwdavies commented 7 years ago

Also, just to confirm, with your data, it works for K=4 and nGen = 100, but does not work for K=10 and nGen = 4000?

stsmall commented 7 years ago

Hi Robbie,

I am using 25 samples with the following command:

STITCH(tempdir = tempdir(), chr = "Wb_Chr1_0", bamlist = "pngbam_sample.list", posfile = "Wb_Chr1_0.pos", genfile = "Wb_Chr1_0.gen", outputdir = paste0(getwd(), "/"), K = 10, nGen = 4000, nCores = 40)

I am using v1.29; I will update to v1.3 and try the same command.

Correct, it works with K=4 and nGen=100, and produced fairly high GP. It errors when I try K=10 and nGen=4000. Now that I understand how to use the previous input files, I will test a range of K's.

rwdavies commented 7 years ago

Hmm, I actually reproduced this error by modifying the acceptance test below to use nCores = 40 and N = 25. I'll take a look https://github.com/rwdavies/STITCH/blob/master/STITCH/tests/testthat/test-acceptance.R#L54

rwdavies commented 7 years ago

OK, the problem seems to have exposed a flaw in how samples were distributed across cores when the number of samples is not much greater than the number of cores. I've got a test that replicates the error and a fix for it. I'll run it through my tests and push a new version, hopefully in the next hour or so.
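To make the general failure mode concrete, here is an illustrative sketch of splitting N samples into one contiguous chunk per core. It is not STITCH's scheduling code, and `chunk_samples` is a hypothetical helper. It only shows that with N = 25 and nCores = 40 (the values from this thread) some cores necessarily get no samples, which downstream code has to handle or avoid.

```r
# Illustration only; this is not STITCH's scheduling code.
# Split sample indices 1..N into nChunks contiguous chunks using integer breaks.
chunk_samples <- function(N, nChunks) {
  breaks <- (0:nChunks * N) %/% nChunks
  lapply(seq_len(nChunks), function(i) {
    # seq_len() yields an empty vector for an empty chunk, whereas
    # (breaks[i] + 1):breaks[i + 1] would count backwards (1:0 is c(1, 0)).
    breaks[i] + seq_len(breaks[i + 1] - breaks[i])
  })
}

N <- 25       # samples, from this thread
nCores <- 40  # cores, from this thread

naive <- chunk_samples(N, nCores)           # one chunk per core
capped <- chunk_samples(N, min(N, nCores))  # never more chunks than samples

n_empty <- sum(lengths(naive) == 0)  # 15 cores receive no samples
all_used <- all(lengths(capped) >= 1)
```

Capping the number of chunks at the number of samples is one simple guard. The thread does not say whether release 1.3.3 takes this approach.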

PS your samples are fairly high coverage, eh?

stsmall commented 7 years ago

The reason I am using STITCH is that the samples are from whole genome amplification. So some samples have high coverage in some areas but low coverage in other areas.

rwdavies commented 7 years ago

OK, I think it should be fixed with release 1.3.3. Let me know if you encounter any problems with it