rwdavies / STITCH

STITCH - Sequencing To Imputation Through Constructing Haplotypes
http://www.nature.com/ng/journal/v48/n8/abs/ng.3594.html
GNU General Public License v3.0

1: In readLines(bamlist) : incomplete final line found on '627bam.txt' #48

Closed linlin-pp closed 1 year ago

linlin-pp commented 3 years ago

Sorry to bother you again! I have a new problem and need your help.

*** caught segfault ***
address 0x1ae08a004, cause 'memory not mapped'

Traceback:
1: cpp_read_reassign(ord = ord, qnameInteger_ord = qnameInteger_ord, bxtagInteger_ord = bxtagInteger_ord, bxtag_bad_ord = bxtag_bad_ord, qname = qname, bxtag = bxtag, strand = strand, sampleReadsRaw = sampleReadsRaw, readStart_ord = readStart_ord, readEnd_ord = readEnd_ord, readStart = readStart, readEnd = readEnd, iSizeUpperLimit = iSizeUpperLimit, bxTagUpperLimit = bxTagUpperLimit, use_bx_tag = use_bx_tag, save_sampleReadsInfo = save_sampleReadsInfo)
2: merge_reads_from_sampleReadsRaw(sampleReadsRaw = sampleReadsRaw, qname = qname, bxtag = bxtag, strand = strand, readStart = readStart, readEnd = readEnd, iSizeUpperLimit = iSizeUpperLimit, use_bx_tag = use_bx_tag, bxTagUpperLimit = bxTagUpperLimit, save_sampleReadsInfo = save_sampleReadsInfo, qname_all = qname_all, readStart_all = readStart_all, readEnd_all = readEnd_all)
3: loadBamAndConvert(iBam = iBam, L = L, pos = pos, nSNPs = nSNPs, bam_files = bam_files, cram_files = cram_files, reference = reference, iSizeUpperLimit = iSizeUpperLimit, bqFilter = bqFilter, chr = chr, N = N, downsampleToCov = downsampleToCov, sampleNames = sampleNames, inputdir = inputdir, useSoftClippedBases = useSoftClippedBases, regionName = regionName, tempdir = tempdir, chrStart = chrStart, chrEnd = chrEnd, chrLength = chrLength, save_sampleReadsInfo = save_sampleReadsInfo, use_bx_tag = use_bx_tag, bxTagUpperLimit = bxTagUpperLimit)
4: FUN(X[[i]], ...)

[2021-03-10 17:56:29] downsample sample A02299 - 136 of 1647166 reads removed
[2021-03-10 17:56:33] Done generating inputs
[2021-03-10 17:56:33] Copying files onto tempdir
[2021-03-10 18:37:33] Done copying files onto tempdir
[2021-03-10 18:37:33] Generate allele count
[2021-03-10 18:51:54] Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection

Error in check_mclapply_OK(out2) : An error occured during STITCH. The first such error is above
Calls: STITCH -> buildAlleleCount -> check_mclapply_OK
In addition: Warning messages:
1: In readLines(bamlist) : incomplete final line found on '627bam.txt'
2: In mclapply(1:length(sampleRanges), mc.cores = nCores, FUN = loadBamAndConvert_across_a_range, : scheduled cores 1, 14 did not deliver results, all values of the jobs will be affected
3: In mclapply(sampleRanges, mc.cores = nCores, FUN = buildAlleleCount_subfunction, : scheduled cores 1, 14 encountered errors in user code, all values of the jobs will be affected
Execution halted
srun: error: srv6: task 0: Exited with exit code 1

Thank you!
rwdavies commented 3 years ago

That sort of segfault probably means you're out of memory. Is this on a cluster? Can you go back and see how much memory you used? I'm surprised it ran out of memory at that spot, though. How many samples and SNPs, what kind of coverage, and how much RAM is on the machine? The fact that some samples have so many reads (1647166) suggests you're imputing a very large region, and could get away with less (or, if some samples have very high coverage, you can downsample them in advance, using e.g. samtools).
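
As a rough sketch of that upfront downsampling, run from R via system2 (the BAM file names here are placeholders, and "-s 101.50" is samtools' seed.fraction syntax, i.e. seed 101, keep roughly 50% of reads):

    # Sketch only: pre-downsample one high-coverage BAM with samtools before
    # giving it to STITCH; adjust the fraction per sample's coverage.
    system2("samtools", c("view", "-b", "-s", "101.50",
                          "A02299.bam", "-o", "A02299.downsampled.bam"))
    system2("samtools", c("index", "A02299.downsampled.bam"))

The bamlist would then need to point at the downsampled file instead of the original.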

rwdavies commented 3 years ago

Interesting, though: mclapply from parallel in R seems to have changed behaviour, which is annoying. STITCH should have errored out earlier; that's something I'll fix.

https://stackoverflow.com/questions/62811745/scheduled-cores-did-not-deliver-results-all-values-of-the-jobs-will-be-affe

rwdavies commented 3 years ago

Sorry, wait, never mind, the behaviour doesn't seem to have changed. I think the child process died while running, and that's just what mclapply prints when that happens.

Anyway, I think it's probably RAM. Related to the RAM question, it seems like you're using a lot of cores; using fewer of them should mean you use less RAM.
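
For example, something along these lines (the posfile name, outputdir, K, and nGen below are placeholder values, not recommendations; only nCores is the point here):

    # Rough sketch of the same run with fewer cores: fewer BAMs are parsed in
    # parallel child processes at once, so peak RAM use should be lower.
    library("STITCH")
    STITCH(
        chr = "1",
        bamlist = "627bam.txt",
        posfile = "chr1.pos.txt",
        outputdir = "./stitch_chr1/",
        K = 20,
        nGen = 100,
        nCores = 4
    )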

linlin-pp commented 3 years ago

I have 629 rabbit samples: mean coverage = 3.89X, min coverage = 1.4X, max coverage = 8.87X. Chr1 has the most SNPs, 3447812. I use a pos matrix provided by the BaseVar software.

rwdavies commented 3 years ago

That is a lot of SNPs to do at once. What about trying to impute just a 5 Mbp region with, say, a 500 kbp buffer? RAM use is linearly proportional to the number of SNPs being analyzed.
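
A sketch of that idea, reusing the placeholder values from the earlier snippet and assuming the regionStart/regionEnd/buffer arguments (the coordinates are arbitrary):

    # Sketch only: impute a 5 Mbp chunk of chr1 with a 500 kbp buffer on each
    # side, instead of the whole chromosome at once.
    library("STITCH")
    STITCH(
        chr = "1",
        regionStart = 10000001,
        regionEnd = 15000000,    # 5 Mbp core region
        buffer = 500000,         # 500 kbp buffer on either side
        bamlist = "627bam.txt",
        posfile = "chr1.pos.txt",
        outputdir = "./stitch_chr1_10M_15M/",
        K = 20,
        nGen = 100,
        nCores = 4
    )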

linlin-pp commented 3 years ago

Thank you for your help, my problem has been solved. STITCH is very good software!