rwdavies / STITCH

STITCH - Sequencing To Imputation Through Constructing Haplotypes
http://www.nature.com/ng/journal/v48/n8/abs/ng.3594.html
GNU General Public License v3.0
77 stars 17 forks source link

downsample sample error #69

Closed Deeeeen closed 1 year ago

Deeeeen commented 2 years ago

Hi Robbie, I'm trying to run STITCH with 13k rat samples niterations=2 with a reference panel. I split each chromosome into chunks, and all of the chunks finished running STITCH, except one small chunk on chrX. I have re-tried running this chunk multiple times, but every time I get the same error message from the downsampling process. I'm wondering if you have any insight about this?

STITCH parameters:

Running STITCH(chr = chrX,
nGen = 100, 
posfile = chrX_pos, 
K = 8, 
S = 1, 
outputdir = stitch2_chrX_18458107/rsd, 
nStarts = , 
tempdir =stitch2_chrX_18458107/tmd, 
bamlist = bamlist_n13905, 
cramlist = , 
sampleNames_file = sampleName_n13905, 
reference = , 
genfile = , 
method = diploid, 
output_format = bgvcf, 
B_bit_prob = 16, 
outputInputInVCFFormat = FALSE, 
downsampleToCov = 50, 
downsampleFraction = 1, 
readAware = TRUE, 
chrStart = NA, 
chrEnd = NA, 
regionStart = 18458108, 
regionEnd = 24610798, 
buffer = 1e+06, 
maxDifferenceBetweenReads = 1000, 
maxEmissionMatrixDifference = 1e+10, 
alphaMatThreshold = 1e-04, 
emissionThreshold = 1e-04, 
iSizeUpperLimit = 600, 
bqFilter = 17, 
niterations = 2, 
shuffleHaplotypeIterations = NA, 
splitReadIterations = 25, 
nCores = 1, 
expRate = 0.5, 
maxRate = 100, 
minRate = 0.1, 
Jmax = 1000, 
regenerateInput = TRUE, 
originalRegionName = NA, 
keepInterimFiles = FALSE, 
keepTempDir = FALSE, 
outputHaplotypeProbabilities = FALSE, 
switchModelIteration = NA, 
generateInputOnly = FALSE, 
restartIterations = NA, 
refillIterations = NA, 
downsampleSamples = 1, 
downsampleSamplesKeepList = NA, 
subsetSNPsfile = NA, 
useSoftClippedBases = FALSE, 
outputBlockSize = 1000, 
outputSNPBlockSize = 10000, 
inputBundleBlockSize = NA, 
genetic_map_file = , 
reference_haplotype_file = chrX.hap.gz, 
reference_legend_file = chrX.legend.gz, 
reference_sample_file = , 
reference_populations = NA, 
reference_phred = 20, 
reference_iterations = 40, 
reference_shuffleHaplotypeIterations = c(4, 8, 12, 16), 
output_filename = NULL, 
initial_min_hapProb = 0.2, 
initial_max_hapProb = 0.8, 
regenerateInputWithDefaultValues = FALSE, 
plotHapSumDuringIterations = FALSE, 
plot_shuffle_haplotype_attempts = FALSE, 
plotAfterImputation = TRUE, 
save_sampleReadsInfo = FALSE, 
gridWindowSize = NA, 
shuffle_bin_nSNPs = NULL, 
shuffle_bin_radius = 5000, 
keepSampleReadsInRAM = FALSE, 
useTempdirWhileWriting = FALSE, 
output_haplotype_dosages = FALSE, 
use_bx_tag = TRUE, 
bxTagUpperLimit = 50000)

Error message

downsample sample 06 - 16 of 979 reads removed Error in sum(x) : invalid 'type' (list) of argument Calls: STITCH ... loadBamAndConvert -> merge_reads_from_sampleReadsRaw Execution halted

Deeeeen commented 2 years ago

The problem is solved when I expand my window(region) from 18458108-24610798 to 17458108-25610798. I wonder if it is because when STITCH is loading the bam files, it tries to load from the region start point, and it won't work if there is no reads at the starting point. (just a guess)

rwdavies commented 2 years ago

Hi,

Sorry for the slow reply, I was on holiday and just got back.

After a quick check of the code, I suspect what's happening is that a very small number of reads are being pre-processed, then are eliminated for some reason, and the code is hanging on that. I'll try to see if I can take a look at the code and further test it to remove this problem. Thanks for bringing this to my attention.

Thanks Robbie