hartleys / QoRTs

Quality of RNA-Seq Toolset

Excessive RAM usage #89

Open royfrancis opened 1 year ago

royfrancis commented 1 year ago

I have a 25 GB BAM file with about 400 million paired-end reads from the zUMIs pipeline (single-cell SMART-Seq3 RNA-Seq reads with UMIs). Running QoRTs QC on it fails with an out-of-memory error. I first gave the job 128 GB of RAM, then raised it to 256 GB, and I still get the same error. Is it reasonable that a BAM file of this size needs more than 256 GB of RAM?

This is my script.

java -Xmx200G -jar /sw/bioinfo/QoRTs/1.3.6/rackham/lib/QoRTs.jar QC \
--genomeFA genome.fa \
--flatgff genes-flat.gff \
--RNA \
--noGzipOutput \
--verbose \
--maxReadLength 125 \
sample.filtered.Aligned.GeneTagged.UBcorrected.sorted.bam \
"final_annot.gtf" \
"sample-qorts"

In the output folder I get these two files: QC.QORTS_RUNNING and QC.yX9gr2Yu8Jsk.log

I randomly downsampled this BAM to a 15GB BAM to test and I still get the same error. I am starting to suspect it's not just the number of reads.

Complete run output
```
Starting QoRTs v1.3.6 (Compiled Tue Sep 25 11:21:46 EDT 2018)
Starting time: (Thu Feb 09 19:57:13 CET 2023)
INPUT_COMMAND(QC)
  INPUT_ARG(infile)=sample.bam
  INPUT_ARG(gtffile)=/crex/proj/project/nobackup/nbis/data/processed/zumis/03dpf/03dpf.final_annot.gtf
  INPUT_ARG(outdir)=sample-qorts
  INPUT_ARG(genomeFA)=Some(List(/crex/proj/project/nobackup/nbis/data/reference/grcz10-custom/genome.fa))
  INPUT_ARG(flatgfffile)=Some(/crex/proj/project/nobackup/nbis/data/processed/zumis/qorts/genes-flat.gff)
  INPUT_ARG(isRNASeq)=true
  INPUT_ARG(noGzipOutput)=true
  INPUT_ARG(verbose)=true
  INPUT_ARG(maxReadLength)=Some(125)
Created Log File: sample-qorts/QC.ZfEVCwtLEYqQ.log
Starting QC
[Time: 2023-02-09 19:57:13] [Mem usage: [75MB / 2058MB]] [Elapsed Time: 00:00:00.0000]
QoRTs is Running in paired-end mode.
QoRTs is Running in any-sorted mode.
Parameter --genomeFA found. Adding reference mismatch testing.
NOTE: Function "overlapMatch" requires function "mismatchEngine". Adding "mismatchEngine" to the active function list...
Running functions: CigarOpDistribution, GCDistribution, GeneCalcs, InsertSize, JunctionCalcs, NVC, QualityScoreDistribution, StrandCheck, chromCounts, cigarLocusCounts, mismatchEngine, overlapMatch, readLengthDistro, referenceMatch, writeBiotypeCounts, writeClippedNVC, writeDESeq, writeDEXSeq, writeGeneBody, writeGeneCounts, writeGenewiseGeneBody, writeJunctionSeqCounts, writeKnownSplices, writeNovelSplices, writeSpliceExon
Checking first 10000 reads. Checking SAM file for formatting errors...
Stats on the first 10000 reads:
  Num Reads Primary Map: 10000
  Num Reads Paired-ended: 10000
  Num Reads mapped pair: 9989
  Num Pair names found: 5389
  Num Pairs matched: 4600
  Read Seq length: 63 to 118
  Unclipped Read length: 63 to 118
  Final maxReadLength: 125
  maxPhredScore: 37
  minPhredScore: 2
NOTE: Read length is not consistent.
  In the first 10000 reads, read length varies from 63 to 118 (param maxReadLength=125)
  Note that using data that is hard-clipped prior to alignment is NOT recommended, because this makes it difficult (or impossible) to determine the sequencer read-cycle of each nucleotide base. This may obfuscate cycle-specific artifacts, trends, or errors, the detection of which is one of the primary purposes of QoRTs!
  In addition, hard clipping (whether before or after alignment) removes quality score data, and thus quality score metrics may be misleadingly optimistic.
  A MUCH preferable method of removing undesired sequence is to replace such sequence with N's, which preserves the quality score and the sequencer cycle information.
Note: Data appears to be paired-ended.
Sorting Note: Reads are not sorted by name (This is OK).
Sorting Note: Reads are sorted by position (This is OK).
Done checking first 10000 reads. No major problems detected!
Starting getSRPairIterResorted...
SAMRecord Reader Generated. Read length: 125.
[Time: 2023-02-09 19:57:18] [Mem usage: [720MB / 2595MB]] [Elapsed Time: 00:00:04.0783]
> Init GeneCalcs Utility
> Init InsertSize Utility
> Init NVC utility
> Init CigarOpDistribution Utility
> Init QualityScoreDistribution Utility
> Init GC counts Utility
> Init JunctionCalcs utility
length of knownSpliceMap after instantiation: 256778
length of knownCountMap after instantiation: 256778
> Init StrandCheck Utility
> Init chromCount Utility
> Init qcCigarLocusCounts Utility
> Init OverlapMatch Utility
> Init MinorUtils Utility
QC Utilities Generated!
[Time: 2023-02-09 19:58:42] [Mem usage: [13GB / 15GB]] [Elapsed Time: 00:01:28.0789]
helper_calculateGeneAssignmentMap_strict. Found: 31956 genes in the supplied annotation.
helper_calculateGeneAssignmentMap_strict. Found: 4912 genes with ambiguous segments.
helper_calculateGeneAssignmentMap_strict. Found: 27044 genes after first-pass filtering
making makeGeneIntervalMap for geneBody calculations. Found: 27044 acceptable genes for gene-body analysis.
NOTE: Unsorted Read-PAIR-Buffer Size > 100000 [Mem usage:[8GB / 34GB]]
  Currently searching for read: A01901:60:H37HJDRX2:2:2125:2311:15984 for 83585 iterations.
  Searching for read: A01901:60:H37HJDRX2:2:2125:2311:15984 10:1211823-1211904 99
  Current unmatched-pair-buffer status: 33780
  (This is generally not a problem, but if this increases further then OutOfMemoryExceptions may occur. If memory errors do occur, either increase memory allocation or sort the bam-file by name and rerun with the '--nameSorted' option. This might also indicate that your dataset contains an unusually large number of chimeric read-pairs. Or it could occur simply due to the presence of genomic loci with extremly high coverage or complex splicing. It may also indicate a SAM/BAM file that does not adhere to the standard SAM specification.)
..........[1000000 Read-Pairs processed] [Time: 2023-02-09 20:02:42] [GenomeSeqContainer Status: buf:(10:13612000-13887000) n=275, MaxSoFar=895]
..........[2000000 Read-Pairs processed] [Time: 2023-02-09 20:05:44] [GenomeSeqContainer Status: buf:(10:28348000-28581000) n=233, MaxSoFar=895]
NOTE: Unsorted Read-PAIR-Buffer Size > 200000 [Mem usage:[46GB / 57GB]]
  Currently searching for read: A01901:60:H37HJDRX2:1:1126:7039:25175 for 196883 iterations.
  Searching for read: A01901:60:H37HJDRX2:1:1126:7039:25175 10:29443811-29443858 99
  Current unmatched-pair-buffer status: 9621
..........[3000000 Read-Pairs processed] [Time: 2023-02-09 20:08:54] [GenomeSeqContainer Status: buf:(10:39134000-39393000) n=259, MaxSoFar=926]
........Switching to Chromosome: 11 [2023-02-09 20:11:41]
... Skipping chrom "10" in genome fasta...
found chrom 11 [2023-02-09 20:11:41] ..[4000000 Read-Pairs processed] [Time: 2023-02-09 20:12:05] [GenomeSeqContainer Status: buf:(11:231000-964000) n=733, MaxSoFar=926] ..........[5000000 Read-Pairs processed] [Time: 2023-02-09 20:15:18] [GenomeSeqContainer Status: buf:(11:10564000-11164000) n=600, MaxSoFar=926] ..........[6000000 Read-Pairs processed] [Time: 2023-02-09 20:18:14] [GenomeSeqContainer Status: buf:(11:24745000-25055000) n=310, MaxSoFar=926] ..........[7000000 Read-Pairs processed] [Time: 2023-02-09 20:21:16] [GenomeSeqContainer Status: buf:(11:38804000-39321000) n=517, MaxSoFar=926] ... NOTE: Unmatched Read Buffer Size > 100000 [Mem usage:[92GB / 94GB]] (This is generally not a problem, but if this increases further then OutOfMemoryExceptions may occur. If memory errors do occur, either increase memory allocation or sort the bam-file by name and rerun with the '--nameSorted' option. This might also indicate that your dataset contains an unusually large number of chimeric read-pairs. Or it could occur simply due to the presence of genomic loci with extremly high coverage. It may also indicate a SAM/BAM file that does not adhere to the standard SAM specification.) .. NOTE: Unmatched Read Buffer Size > 200000 [Mem usage:[42GB / 94GB]] NOTE: Unsorted Read-PAIR-Buffer Size > 400000 [Mem usage:[44GB / 94GB]] Currently searching for read: A01901:60:H37HJDRX2:1:1117:7473:19633 for 345767 iterations. Searching for read: A01901:60:H37HJDRX2:1:1117:7473:19633 11:44043229-44043346 163 Current unmatched-pair-buffer status: 47797 .....[8000000 Read-Pairs processed] [Time: 2023-02-09 20:24:33] [GenomeSeqContainer Status: buf:(11:44855000-45093000) n=238, MaxSoFar=926] Switching to Chromosome: 12 [2023-02-09 20:24:47] ... Skipping chrom "11" in genome fasta... 
found chrom 12 [2023-02-09 20:24:47] ..........[9000000 Read-Pairs processed] [Time: 2023-02-09 20:27:31] [GenomeSeqContainer Status: buf:(12:11409000-11757000) n=348, MaxSoFar=1015] ..........[10000000 Read-Pairs processed] [Time: 2023-02-09 20:30:34] [GenomeSeqContainer Status: buf:(12:26538000-26792000) n=254, MaxSoFar=1015] ..........[11000000 Read-Pairs processed] [Time: 2023-02-09 20:33:29] [GenomeSeqContainer Status: buf:(12:39494000-39724000) n=230, MaxSoFar=1015] .....Switching to Chromosome: 13 [2023-02-09 20:35:04] ... Skipping chrom "12" in genome fasta... found chrom 13 [2023-02-09 20:35:04] .....[12000000 Read-Pairs processed] [Time: 2023-02-09 20:36:21] [GenomeSeqContainer Status: buf:(13:4653000-5074000) n=421, MaxSoFar=1015] ..........[13000000 Read-Pairs processed] [Time: 2023-02-09 20:39:16] [GenomeSeqContainer Status: buf:(13:22597000-23054000) n=457, MaxSoFar=1015] ..........[14000000 Read-Pairs processed] [Time: 2023-02-09 20:42:13] [GenomeSeqContainer Status: buf:(13:36358000-36611000) n=253, MaxSoFar=1015] ........Switching to Chromosome: 14 [2023-02-09 20:44:38] ... Skipping chrom "13" in genome fasta... found chrom 14 [2023-02-09 20:44:38] ..[15000000 Read-Pairs processed] [Time: 2023-02-09 20:45:09] ..........[16000000 Read-Pairs processed] [Time: 2023-02-09 20:48:20] [GenomeSeqContainer Status: buf:(14:6824000-7299000) n=475, MaxSoFar=1041] ..........[17000000 Read-Pairs processed] [Time: 2023-02-09 20:51:15] [GenomeSeqContainer Status: buf:(14:25170000-25389000) n=219, MaxSoFar=1041] ..........[18000000 Read-Pairs processed] [Time: 2023-02-09 20:54:28] [GenomeSeqContainer Status: buf:(14:32531000-33088000) n=557, MaxSoFar=1041] ..........[19000000 Read-Pairs processed] [Time: 2023-02-09 20:57:29] [GenomeSeqContainer Status: buf:(14:37242000-37498000) n=256, MaxSoFar=1041] .... 
NOTE: Unmatched Read Buffer Size > 400000 [Mem usage:[81GB / 102GB]] NOTE: Unsorted Read-PAIR-Buffer Size > 800000 [Mem usage:[85GB / 102GB]] Currently searching for read: A01901:60:H37HJDRX2:1:2204:10384:7952 for 691700 iterations. Searching for read: A01901:60:H37HJDRX2:1:2204:10384:7952 14:46035424-46647919 163 Current unmatched-pair-buffer status: 578759 NOTE: Unsorted Read-PAIR-Buffer Size > 1600000 [Mem usage:[91GB / 102GB]] Currently searching for read: A01901:60:H37HJDRX2:1:2109:23194:2895 for 80666 iterations. Searching for read: A01901:60:H37HJDRX2:1:2109:23194:2895 14:46090552-46651262 163 Current unmatched-pair-buffer status: 430200 NOTE: Unsorted Read-PAIR-Buffer Size > 3200000 [Mem usage:[22GB / 117GB]] Currently searching for read: A01901:60:H37HJDRX2:1:2109:23194:2895 for 1680666 iterations. Searching for read: A01901:60:H37HJDRX2:1:2109:23194:2895 14:46090552-46651262 163 Current unmatched-pair-buffer status: 599192 ......[20000000 Read-Pairs processed] [Time: 2023-02-09 21:01:33] [GenomeSeqContainer Status: buf:(14:46637000-47216000) n=579, MaxSoFar=1143] ..........[21000000 Read-Pairs processed] [Time: 2023-02-09 21:04:33] [GenomeSeqContainer Status: buf:(14:46638000-47216000) n=578, MaxSoFar=1143] ..........[22000000 Read-Pairs processed] [Time: 2023-02-09 21:07:37] [GenomeSeqContainer Status: buf:(14:46640000-47216000) n=576, MaxSoFar=1143] ..........[23000000 Read-Pairs processed] [Time: 2023-02-09 21:10:32] [GenomeSeqContainer Status: buf:(14:46641000-47731000) n=1090, MaxSoFar=1143] .........Switching to Chromosome: 15 [2023-02-09 21:13:41] ... Skipping chrom "14" in genome fasta... 
found chrom 15 [2023-02-09 21:13:41] .[24000000 Read-Pairs processed] [Time: 2023-02-09 21:13:42] [GenomeSeqContainer Status: buf:(15:6000-558000) n=552, MaxSoFar=1143] ..........[25000000 Read-Pairs processed] [Time: 2023-02-09 21:16:37] [GenomeSeqContainer Status: buf:(15:20466000-20553000) n=87, MaxSoFar=1143] ..........[26000000 Read-Pairs processed] [Time: 2023-02-09 21:19:36] [GenomeSeqContainer Status: buf:(15:32485000-32761000) n=276, MaxSoFar=1143] ..........[27000000 Read-Pairs processed] [Time: 2023-02-09 21:22:39] [GenomeSeqContainer Status: buf:(15:46359000-47022000) n=663, MaxSoFar=1280] .Switching to Chromosome: 16 [2023-02-09 21:23:10] ... Skipping chrom "15" in genome fasta... found chrom 16 [2023-02-09 21:23:10] .........[28000000 Read-Pairs processed] [Time: 2023-02-09 21:25:52] [GenomeSeqContainer Status: buf:(16:6859000-7471000) n=612, MaxSoFar=1280] ..........[29000000 Read-Pairs processed] [Time: 2023-02-09 21:28:53] [GenomeSeqContainer Status: buf:(16:16248000-16569000) n=321, MaxSoFar=1280] ..........[30000000 Read-Pairs processed] [Time: 2023-02-09 21:32:43] [GenomeSeqContainer Status: buf:(16:24269000-24630000) n=361, MaxSoFar=1280] ..........[31000000 Read-Pairs processed] [Time: 2023-02-09 21:36:00] [GenomeSeqContainer Status: buf:(16:31957000-32208000) n=251, MaxSoFar=1280] ..........[32000000 Read-Pairs processed] [Time: 2023-02-09 21:39:05] [GenomeSeqContainer Status: buf:(16:43314000-43783000) n=469, MaxSoFar=1280] ..........[33000000 Read-Pairs processed] [Time: 2023-02-09 21:42:26] [GenomeSeqContainer Status: buf:(16:52018000-52525000) n=507, MaxSoFar=1280] ..Switching to Chromosome: 17 [2023-02-09 21:43:21] ... Skipping chrom "16" in genome fasta... 
found chrom 17 [2023-02-09 21:43:21] ........[34000000 Read-Pairs processed] [Time: 2023-02-09 21:45:50] [GenomeSeqContainer Status: buf:(17:850000-1328000) n=478, MaxSoFar=1280] ..........[35000000 Read-Pairs processed] [Time: 2023-02-09 21:49:14] [GenomeSeqContainer Status: buf:(17:18859000-19099000) n=240, MaxSoFar=1280] ..........[36000000 Read-Pairs processed] [Time: 2023-02-09 21:53:36] [GenomeSeqContainer Status: buf:(17:32562000-32832000) n=270, MaxSoFar=1280] .........Switching to Chromosome: 18 [2023-02-09 21:57:41] ... Skipping chrom "17" in genome fasta... found chrom 18 [2023-02-09 21:57:41] .[37000000 Read-Pairs processed] [Time: 2023-02-09 21:57:57] [GenomeSeqContainer Status: buf:(18:1008000-1307000) n=299, MaxSoFar=1280] ..........[38000000 Read-Pairs processed] [Time: 2023-02-09 22:02:43] [GenomeSeqContainer Status: buf:(18:10841000-11373000) n=532, MaxSoFar=1280] ..........[39000000 Read-Pairs processed] [Time: 2023-02-09 22:05:55] [GenomeSeqContainer Status: buf:(18:24998000-25398000) n=400, MaxSoFar=1280] ..........[40000000 Read-Pairs processed] [Time: 2023-02-09 22:09:04] [GenomeSeqContainer Status: buf:(18:44829000-45135000) n=306, MaxSoFar=1280] .....Switching to Chromosome: 19 [2023-02-09 22:10:44] ... Skipping chrom "18" in genome fasta... 
found chrom 19 [2023-02-09 22:10:44] .....[41000000 Read-Pairs processed] [Time: 2023-02-09 22:13:07] [GenomeSeqContainer Status: buf:(19:1913000-2661000) n=748, MaxSoFar=1280] ..........[42000000 Read-Pairs processed] [Time: 2023-02-09 22:17:17] [GenomeSeqContainer Status: buf:(19:11309000-11566000) n=257, MaxSoFar=1280] ..........[43000000 Read-Pairs processed] [Time: 2023-02-09 22:21:43] [GenomeSeqContainer Status: buf:(19:21194000-21517000) n=323, MaxSoFar=1280] ..........[44000000 Read-Pairs processed] [Time: 2023-02-09 22:25:46] [GenomeSeqContainer Status: buf:(19:22228000-22643000) n=415, MaxSoFar=1280] ..........[45000000 Read-Pairs processed] [Time: 2023-02-09 22:29:08] [GenomeSeqContainer Status: buf:(19:28687000-28946000) n=259, MaxSoFar=1280] ..........[46000000 Read-Pairs processed] [Time: 2023-02-09 22:35:04] [GenomeSeqContainer Status: buf:(19:35416000-35614000) n=198, MaxSoFar=1280] ..........[47000000 Read-Pairs processed] [Time: 2023-02-09 22:40:30] [GenomeSeqContainer Status: buf:(19:43514000-43858000) n=344, MaxSoFar=1280] ..........[48000000 Read-Pairs processed] [Time: 2023-02-09 22:44:22] [GenomeSeqContainer Status: buf:(19:48443000-48779000) n=336, MaxSoFar=1280] ....Switching to Chromosome: 1 [2023-02-09 22:46:33] ... Skipping chrom "19" in genome fasta... 
found chrom 1 [2023-02-09 22:46:33] ......[49000000 Read-Pairs processed] [Time: 2023-02-09 22:48:47] [GenomeSeqContainer Status: buf:(1:1578000-2061000) n=483, MaxSoFar=1280] ..........[50000000 Read-Pairs processed] [Time: 2023-02-09 22:52:21] [GenomeSeqContainer Status: buf:(1:11580000-11845000) n=265, MaxSoFar=1280] ..........[51000000 Read-Pairs processed] [Time: 2023-02-09 22:55:56] [GenomeSeqContainer Status: buf:(1:27913000-28146000) n=233, MaxSoFar=1280] ..........[52000000 Read-Pairs processed] [Time: 2023-02-09 23:00:43] [GenomeSeqContainer Status: buf:(1:41077000-41321000) n=244, MaxSoFar=1280] ..........[53000000 Read-Pairs processed] [Time: 2023-02-09 23:08:05] [GenomeSeqContainer Status: buf:(1:51077000-51496000) n=419, MaxSoFar=1280] .....Switching to Chromosome: 20 [2023-02-09 23:10:02] ... Skipping chrom "1" in genome fasta... found chrom 20 [2023-02-09 23:10:02] .....[54000000 Read-Pairs processed] [Time: 2023-02-09 23:11:31] [GenomeSeqContainer Status: buf:(20:7071000-7402000) n=331, MaxSoFar=1280] ..........[55000000 Read-Pairs processed] [Time: 2023-02-09 23:14:51] [GenomeSeqContainer Status: buf:(20:20991000-21880000) n=889, MaxSoFar=1280] ..........[56000000 Read-Pairs processed] [Time: 2023-02-09 23:18:56] [GenomeSeqContainer Status: buf:(20:33938000-34125000) n=187, MaxSoFar=1280] ..........[57000000 Read-Pairs processed] [Time: 2023-02-09 23:22:42] [GenomeSeqContainer Status: buf:(20:46673000-46963000) n=290, MaxSoFar=1280] ..........[58000000 Read-Pairs processed] [Time: 2023-02-09 23:26:09] .Switching to Chromosome: 21 [2023-02-09 23:26:11] ... Skipping chrom "20" in genome fasta... 
found chrom 21 [2023-02-09 23:26:11] .........[59000000 Read-Pairs processed] [Time: 2023-02-09 23:29:13] [GenomeSeqContainer Status: buf:(21:6140000-6676000) n=536, MaxSoFar=1465] ..........[60000000 Read-Pairs processed] [Time: 2023-02-09 23:41:17] [GenomeSeqContainer Status: buf:(21:22056000-22484000) n=428, MaxSoFar=1465] ..........[61000000 Read-Pairs processed] [Time: 2023-02-10 00:13:32] [GenomeSeqContainer Status: buf:(21:32723000-32980000) n=257, MaxSoFar=1465] ..........[62000000 Read-Pairs processed] [Time: 2023-02-10 00:20:03] [GenomeSeqContainer Status: buf:(21:44554000-45107000) n=553, MaxSoFar=1465] ..Switching to Chromosome: 22 [2023-02-10 00:21:07] ... Skipping chrom "21" in genome fasta... found chrom 22 [2023-02-10 00:21:07] ........[63000000 Read-Pairs processed] [Time: 2023-02-10 00:25:31] [GenomeSeqContainer Status: buf:(22:3726000-4121000) n=395, MaxSoFar=1465] ..........[64000000 Read-Pairs processed] [Time: 2023-02-10 00:28:44] [GenomeSeqContainer Status: buf:(22:18508000-18911000) n=403, MaxSoFar=1465] ..........[65000000 Read-Pairs processed] [Time: 2023-02-10 00:35:06] [GenomeSeqContainer Status: buf:(22:31536000-32126000) n=590, MaxSoFar=1465] ....Switching to Chromosome: 23 [2023-02-10 00:38:06] ... Skipping chrom "22" in genome fasta... 
found chrom 23 [2023-02-10 00:38:06] ......[66000000 Read-Pairs processed] [Time: 2023-02-10 00:40:24] [GenomeSeqContainer Status: buf:(23:9275000-9557000) n=282, MaxSoFar=1465] ..........[67000000 Read-Pairs processed] [Time: 2023-02-10 00:45:00] [GenomeSeqContainer Status: buf:(23:19733000-19970000) n=237, MaxSoFar=1465] ..........[68000000 Read-Pairs processed] [Time: 2023-02-10 00:51:09] [GenomeSeqContainer Status: buf:(23:25368000-25610000) n=242, MaxSoFar=1465] ..........[69000000 Read-Pairs processed] [Time: 2023-02-10 00:56:20] [GenomeSeqContainer Status: buf:(23:31496000-31716000) n=220, MaxSoFar=1465] ..........[70000000 Read-Pairs processed] [Time: 2023-02-10 01:00:19] [GenomeSeqContainer Status: buf:(23:36199000-36408000) n=209, MaxSoFar=1465] .......Switching to Chromosome: 24 [2023-02-10 01:03:40] ... Skipping chrom "23" in genome fasta... found chrom 24 [2023-02-10 01:03:40] ...[71000000 Read-Pairs processed] [Time: 2023-02-10 01:05:28] [GenomeSeqContainer Status: buf:(24:6960000-7562000) n=602, MaxSoFar=1465] ..........[72000000 Read-Pairs processed] [Time: 2023-02-10 01:12:54] [GenomeSeqContainer Status: buf:(24:21467000-21727000) n=260, MaxSoFar=1465] ..........[73000000 Read-Pairs processed] [Time: 2023-02-10 01:16:23] [GenomeSeqContainer Status: buf:(24:37443000-37727000) n=284, MaxSoFar=1472] ....Switching to Chromosome: 25 [2023-02-10 01:17:50] ... Skipping chrom "24" in genome fasta... found chrom 25 [2023-02-10 01:17:50] ......[74000000 Read-Pairs processed] [Time: 2023-02-10 01:19:49] [GenomeSeqContainer Status: buf:(25:4439000-4668000) n=229, MaxSoFar=1472] ..........[75000000 Read-Pairs processed] [Time: 2023-02-10 01:29:12] [GenomeSeqContainer Status: buf:(25:19265000-19581000) n=316, MaxSoFar=1472] ..........[76000000 Read-Pairs processed] [Time: 2023-02-10 01:36:20] [GenomeSeqContainer Status: buf:(25:36866000-36898000) n=32, MaxSoFar=1472] Switching to Chromosome: 2 [2023-02-10 01:36:31] ... Skipping chrom "25" in genome fasta... 
found chrom 2 [2023-02-10 01:36:31] ..........[77000000 Read-Pairs processed] [Time: 2023-02-10 01:39:44] [GenomeSeqContainer Status: buf:(2:11120000-11555000) n=435, MaxSoFar=1472] ..........[78000000 Read-Pairs processed] [Time: 2023-02-10 01:43:07] [GenomeSeqContainer Status: buf:(2:26584000-26965000) n=381, MaxSoFar=1472] ..........[79000000 Read-Pairs processed] [Time: 2023-02-10 01:49:42] [GenomeSeqContainer Status: buf:(2:35398000-35869000) n=471, MaxSoFar=1472] ..........[80000000 Read-Pairs processed] [Time: 2023-02-10 01:54:32] [GenomeSeqContainer Status: buf:(2:45793000-46188000) n=395, MaxSoFar=1472] ..........[81000000 Read-Pairs processed] [Time: 2023-02-10 01:59:51] [GenomeSeqContainer Status: buf:(2:58651000-59172000) n=521, MaxSoFar=1472] .Switching to Chromosome: 3 [2023-02-10 02:00:11] ... Skipping chrom "2" in genome fasta... found chrom 3 [2023-02-10 02:00:11] .........[82000000 Read-Pairs processed] [Time: 2023-02-10 02:06:42] [GenomeSeqContainer Status: buf:(3:15067000-15516000) n=449, MaxSoFar=1865] ..........[83000000 Read-Pairs processed] [Time: 2023-02-10 02:13:11] [GenomeSeqContainer Status: buf:(3:18241000-18493000) n=252, MaxSoFar=1865] ..........[84000000 Read-Pairs processed] [Time: 2023-02-10 02:18:24] [GenomeSeqContainer Status: buf:(3:23547000-23843000) n=296, MaxSoFar=1865] ..........[85000000 Read-Pairs processed] [Time: 2023-02-10 02:24:00] [GenomeSeqContainer Status: buf:(3:29727000-29936000) n=209, MaxSoFar=1865] ..........[86000000 Read-Pairs processed] [Time: 2023-02-10 02:27:26] [GenomeSeqContainer Status: buf:(3:32315000-32519000) n=204, MaxSoFar=1865] ..........[87000000 Read-Pairs processed] [Time: 2023-02-10 02:30:50] [GenomeSeqContainer Status: buf:(3:39424000-39692000) n=268, MaxSoFar=1865] ..........[88000000 Read-Pairs processed] [Time: 2023-02-10 02:35:53] [GenomeSeqContainer Status: buf:(3:41774000-41907000) n=133, MaxSoFar=1865] ..........[89000000 Read-Pairs processed] [Time: 2023-02-10 02:39:56] 
[GenomeSeqContainer Status: buf:(3:54636000-55198000) n=562, MaxSoFar=1865] ......Switching to Chromosome: 4 [2023-02-10 02:41:45] ... Skipping chrom "3" in genome fasta... found chrom 4 [2023-02-10 02:41:45] ....[90000000 Read-Pairs processed] [Time: 2023-02-10 02:43:11] [GenomeSeqContainer Status: buf:(4:978000-1375000) n=397, MaxSoFar=1865] ..........[91000000 Read-Pairs processed] [Time: 2023-02-10 02:47:18] [GenomeSeqContainer Status: buf:(4:14886000-15084000) n=198, MaxSoFar=1865] ..........[92000000 Read-Pairs processed] [Time: 2023-02-10 02:54:33] [GenomeSeqContainer Status: buf:(4:25830000-26034000) n=204, MaxSoFar=1865] .......Switching to Chromosome: 5 [2023-02-10 02:58:32] ... Skipping chrom "4" in genome fasta... found chrom 5 [2023-02-10 02:58:32] . NOTE: Unmatched Read Buffer Size > 800000 [Mem usage:[63GB / 201GB]] NOTE: Unmatched Read Buffer Size > 1600000 [Mem usage:[70GB / 201GB]] ..[93000000 Read-Pairs processed] [Time: 2023-02-10 03:01:19] ..........[94000000 Read-Pairs processed] [Time: 2023-02-10 03:04:08] [GenomeSeqContainer Status: buf:(5:817000-1364000) n=547, MaxSoFar=3118] ..........[95000000 Read-Pairs processed] [Time: 2023-02-10 03:25:41] [GenomeSeqContainer Status: buf:(5:817000-1396000) n=579, MaxSoFar=3118] ..........[96000000 Read-Pairs processed] [Time: 2023-02-10 03:29:06] [GenomeSeqContainer Status: buf:(5:817000-1574000) n=757, MaxSoFar=3118] ..........[97000000 Read-Pairs processed] [Time: 2023-02-10 03:33:27] [GenomeSeqContainer Status: buf:(5:4031000-4460000) n=429, MaxSoFar=3118] ..........[98000000 Read-Pairs processed] [Time: 2023-02-10 03:37:52] [GenomeSeqContainer Status: buf:(5:21996000-22198000) n=202, MaxSoFar=3118] ..........[99000000 Read-Pairs processed] [Time: 2023-02-10 03:41:53] [GenomeSeqContainer Status: buf:(5:22763000-23359000) n=596, MaxSoFar=3118] ..........[100000000 Read-Pairs processed] [Time: 2023-02-10 03:45:16] [GenomeSeqContainer Status: buf:(5:28989000-29478000) n=489, MaxSoFar=3118] 
..........[101000000 Read-Pairs processed] [Time: 2023-02-10 03:51:01] [GenomeSeqContainer Status: buf:(5:34381000-34562000) n=181, MaxSoFar=3118] ..........[102000000 Read-Pairs processed] [Time: 2023-02-10 03:54:57] [GenomeSeqContainer Status: buf:(5:43072000-43655000) n=583, MaxSoFar=3118] ..........[103000000 Read-Pairs processed] [Time: 2023-02-10 04:00:01] [GenomeSeqContainer Status: buf:(5:58171000-58420000) n=249, MaxSoFar=3118] .......Switching to Chromosome: 6 [2023-02-10 04:04:12] ... Skipping chrom "5" in genome fasta... found chrom 6 [2023-02-10 04:04:12] ...[104000000 Read-Pairs processed] [Time: 2023-02-10 04:05:08] [GenomeSeqContainer Status: buf:(6:4735000-5174000) n=439, MaxSoFar=3118] ..........[105000000 Read-Pairs processed] [Time: 2023-02-10 04:08:28] [GenomeSeqContainer Status: buf:(6:9770000-10252000) n=482, MaxSoFar=3118] ..........[106000000 Read-Pairs processed] [Time: 2023-02-10 04:12:48] [GenomeSeqContainer Status: buf:(6:21883000-22136000) n=253, MaxSoFar=3118] ..........[107000000 Read-Pairs processed] [Time: 2023-02-10 04:20:59] [GenomeSeqContainer Status: buf:(6:37348000-37584000) n=236, MaxSoFar=3118] ..........[108000000 Read-Pairs processed] [Time: 2023-02-10 04:28:39] [GenomeSeqContainer Status: buf:(6:49515000-50650000) n=1135, MaxSoFar=3118] ........Switching to Chromosome: 7 [2023-02-10 04:36:25] ... Skipping chrom "6" in genome fasta... 
found chrom 7 [2023-02-10 04:36:25] ..[109000000 Read-Pairs processed] [Time: 2023-02-10 04:36:55] [GenomeSeqContainer Status: buf:(7:3976000-5047000) n=1071, MaxSoFar=3118] ..........[110000000 Read-Pairs processed] [Time: 2023-02-10 04:40:18] [GenomeSeqContainer Status: buf:(7:21605000-21862000) n=257, MaxSoFar=3118] ..........[111000000 Read-Pairs processed] [Time: 2023-02-10 04:43:45] [GenomeSeqContainer Status: buf:(7:29671000-30221000) n=550, MaxSoFar=3118] ..........[112000000 Read-Pairs processed] [Time: 2023-02-10 04:52:26] [GenomeSeqContainer Status: buf:(7:38415000-38802000) n=387, MaxSoFar=3118] ..........[113000000 Read-Pairs processed] [Time: 2023-02-10 04:57:05] [GenomeSeqContainer Status: buf:(7:41515000-41721000) n=206, MaxSoFar=3118] ..........[114000000 Read-Pairs processed] [Time: 2023-02-10 05:07:00] [GenomeSeqContainer Status: buf:(7:54016000-54330000) n=314, MaxSoFar=3118] ..........[115000000 Read-Pairs processed] [Time: 2023-02-10 05:14:26] [GenomeSeqContainer Status: buf:(7:64551000-65108000) n=557, MaxSoFar=3118] ......Switching to Chromosome: 8 [2023-02-10 05:20:02] ... Skipping chrom "7" in genome fasta... found chrom 8 [2023-02-10 05:20:02] ....[116000000 Read-Pairs processed] [Time: 2023-02-10 05:22:24] [GenomeSeqContainer Status: buf:(8:2433000-2891000) n=458, MaxSoFar=3118] ..........[117000000 Read-Pairs processed] [Time: 2023-02-10 05:39:28] [GenomeSeqContainer Status: buf:(8:21053000-21221000) n=168, MaxSoFar=3118] ..........[118000000 Read-Pairs processed] [Time: 2023-02-10 05:46:48] [GenomeSeqContainer Status: buf:(8:31349000-31963000) n=614, MaxSoFar=3118] ..........[119000000 Read-Pairs processed] [Time: 2023-02-10 05:52:30] [GenomeSeqContainer Status: buf:(8:48975000-49194000) n=219, MaxSoFar=3118] ......Switching to Chromosome: 9 [2023-02-10 05:54:29] ... Skipping chrom "8" in genome fasta... 
found chrom 9 [2023-02-10 05:54:29]
....[120000000 Read-Pairs processed] [Time: 2023-02-10 05:55:49] [GenomeSeqContainer Status: buf:(9:6384000-6712000) n=328, MaxSoFar=3118]
..........[121000000 Read-Pairs processed] [Time: 2023-02-10 05:59:18] [GenomeSeqContainer Status: buf:(9:18211000-18333000) n=122, MaxSoFar=3118]
..........[122000000 Read-Pairs processed] [Time: 2023-02-10 06:07:33] [GenomeSeqContainer Status: buf:(9:33504000-33753000) n=249, MaxSoFar=3118]
..........[123000000 Read-Pairs processed] [Time: 2023-02-10 06:16:16] [GenomeSeqContainer Status: buf:(9:48878000-49143000) n=265, MaxSoFar=3118]
.......Switching to Chromosome: MT [2023-02-10 06:21:26]
... Skipping chrom "9" in genome fasta...
found chrom MT [2023-02-10 06:21:26]
NOTE: Unmatched Read Buffer Size > 3200000 [Mem usage:[79GB / 195GB]]
NOTE: Unmatched Read Buffer Size > 6400000 [Mem usage:[93GB / 195GB]]
NOTE: Unmatched Read Buffer Size > 12800000 [Mem usage:[117GB / 195GB]]
NOTE: Unmatched Read Buffer Size > 25600000 [Mem usage:[169GB / 195GB]]
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00002b6810b00000, 524288, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 524288 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /crex/proj/project/nobackup/nbis/data/processed/zumis/qorts/hs_err_pid1748.log
```
BAM preview ``` A01901:60:H37HJDRX2:1:1273:24858:4899 163 10 729 3 17M95932N101M = 152190 151507 CACACACACACACAGAGACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAGACACACA FFFFFFFFFFFFFF:FFFFFFFFFF:FFFFF:F,F,F,FFFFFFFFF:FFFFF,FFFFFFFFFFFF,F::FFFFFFFFFF,FFFFFF:FF,F:F,:,,FF,FFFF,F,:F,:FFF:F: NH:i:2 HI:i:1 AS:i:156 nM:i:2 BX:Z:TGTATCCGAACCATGTTGCA BC:Z:TGTATCCGAACCATGTTGCA QB:Z:FFFFFFFFFF:FFFFFFFFF QU:Z:FFFFFFFF ES:Z:Unassigned_NoFeatures IS:Z:Assigned3 IN:i:1 GI:Z:ENSDARG00000086075 UX:Z:GCAGAACC UB:Z:GCAGAACC A01901:60:H37HJDRX2:2:2241:28673:9251 163 10 733 255 13M94039N105M = 190529 189832 CACACACACAGAGACACACGCACGCACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACACGCAGGCACGCACACACAAAATCAGACA FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,F,FFFFFFFFFFFFFFFFFF,F,,FFFFFF,,F,,FF,,,,,,,:F,,:F:F,:,:F,,,: NH:i:1 HI:i:1 AS:i:134 nM:i:8 BX:Z:TTCGTTGTACTTCACCTGTG BC:Z:TTCGTTGTACTTCACCTGTG QB:Z:FFFFFFFFFFFFFFFFFFFF QU:Z: ES:Z:Assigned3 EN:i:1 GE:Z:ENSDARG00000103980 IS:Z:Assigned3 IN:i:1 GI:Z:ENSDARG00000086075 UX:Z: UB:Z: A01901:60:H37HJDRX2:2:1101:30047:9721 163 10 735 3 11M95900N107M = 152192 151501 CACACACAGAGACACACACACACACACACACACACACACACACACACACACACACACACAGACACACACACACACACACACACACACACACACACACACACACACACACACACACACA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF,F,FFFFFFFFFFF,FFFF:FFFF,FFFFFFFFF:FFFFFFFFFFFFFFFFFFF:FFF NH:i:2 HI:i:1 AS:i:156 nM:i:1 BX:Z:TGTATCCGAACCATGTTGCA BC:Z:TGTATCCGAACCATGTTGCA QB:Z:FFFFFFFFFFFFFFFFFFFF QU:Z:FFFFFFFF ES:Z:Unassigned_NoFeatures IS:Z:Assigned3 IN:i:1 GI:Z:ENSDARG00000086075 UX:Z:GCAGAACC UB:Z:GCAGAACC A01901:60:H37HJDRX2:2:2259:16459:10864 163 10 735 3 11M95896N107M = 152188 151501 CACACACAGAGACACACACACACACACACACACACACACACACACACACACACACACACACACAGACACACACACACACACACACACACACACACACACACACACACACACACACACA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,F,F::FFFFFFFF,F:FFFFFFF:F,FFFFFFFFFFFFFFFF:,FFFFFFF:FFF NH:i:2 HI:i:1 AS:i:160 nM:i:1 
BX:Z:TGTATCCGAACCATGTTGCA BC:Z:TGTATCCGAACCATGTTGCA QB:Z:FFFFFFFFFFFFFFFFFFFF QU:Z:FFFFFFFF ES:Z:Unassigned_NoFeatures IS:Z:Assigned3 IN:i:1 GI:Z:ENSDARG00000086075 UX:Z:GCAGAACC UB:Z:GCAGAACC A01901:60:H37HJDRX2:2:1115:18511:3583 163 10 863 255 76M42S = 82408 81776 CTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGGGTCATCTGGCGGTGTGTGTTCTGAGTTGTCTGCAGCGCAGCAGG FFFFF,FFF:F,FFF:FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFF,,:,F,,:,,F,:FF:FF:,F:FF,F,::FFF,,,,,,F,,FF,, NH:i:1 HI:i:1 AS:i:155 nM:i:1 BX:Z:GGTCGTGATTTTGGTCAGTT BC:Z:GGTCGTGATTTTGGTCAGTT QB:Z:FF:FF:FFFF,FFFFFFFFF QU:Z: ES:Z:Assigned3 EN:i:1 GE:Z:ENSDARG00000086075 IS:Z:Unassigned_NoFeatures UX:Z: UB:Z: A01901:60:H37HJDRX2:2:2271:22001:14857 99 10 863 255 17S68M = 82624 82022 CTGTCACAGTGGTGTCACTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT FFFFFFF:FFFF:FFFF:FFFF,F,FFF,FFF,:F:,FFFFFFFFF,F:FFFFFFFFFFF:F:F:F,FFFFF,FFF,F:FFFF:, NH:i:1 HI:i:1 AS:i:184 nM:i:0 BX:Z:GAGCGCCTATTACGTAATCG BC:Z:GAGCGCCTATTACGTAATCG QB:Z:FFFFFFFFFFF,FFFFFFFF QU:Z: ES:Z:Assigned3 EN:i:1 GE:Z:ENSDARG00000086075 IS:Z:Unassigned_NoFeatures UX:Z: UB:Z: A01901:60:H37HJDRX2:1:2106:9082:32346 99 10 887 255 1S84M = 82308 81687 GGTGTCACTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGA FFFFFFFFFFFFFFFF:F,FFFFFF:FFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFF,,,, NH:i:1 HI:i:1 AS:i:192 nM:i:4 BX:Z:GGTCGTGATTTTGGTCAGTT BC:Z:GGTCGTGATTTTGGTCAGTT QB:Z:FFFFF,F:FF:FFFFF,FFF QU:Z: ES:Z:Assigned3 EN:i:1 GE:Z:ENSDARG00000086075 IS:Z:Unassigned_NoFeatures UX:Z: UB:Z: A01901:60:H37HJDRX2:1:2208:11731:15515 99 10 887 255 85M = 82845 82226 GTGTCACTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAG FFFFFF,FF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFF:FFFFF:,,F,F, NH:i:1 HI:i:1 AS:i:186 nM:i:7 BX:Z:GGTCGTGATTTTGGTCAGTT BC:Z:GGTCGTGATTTTGGTCAGTT QB:Z:FFFFF:F::FFFFFFF,FFF QU:Z: ES:Z:Assigned3 EN:i:1 GE:Z:ENSDARG00000086075 
IS:Z:Unassigned_NoFeatures UX:Z: UB:Z: A01901:60:H37HJDRX2:2:2234:23140:32142 163 10 893 3 79M333463N23M16S = 391754 390923 GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGTGTCTCACACACACACACAGAAAAAAATCTCTCAAAAAA FFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF,FFFF,FFFF:FFFFF:F:,F:F,,:,,:,::,,,,:F,F,F,,,:,F,FFF,,,,,,,F,:F, NH:i:2 HI:i:1 AS:i:151 nM:i:0 BX:Z:CTATAACCGTTTGGTTCCAA BC:Z:CTATAACCGTTTGGTTCCAA QB:Z:FFFFFFFFFFFFFFFFFFFF QU:Z:FFFFFFFF ES:Z:Unassigned_NoFeatures IS:Z:Assigned3 IN:i:1 GI:Z:ENSDARG00000087585 UX:Z:AGGGAGGC UB:Z:AGGGAGGC A01901:60:H37HJDRX2:1:1250:10050:26882 99 10 913 255 63M = 217626 216786 GTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGAGTGAT F,FFFFF:FFFFFFFFF:FFFFFFFFF,FFFFF,FFFFFFFFFFFFF:F:FF,F::F,F,:,, NH:i:1 HI:i:1 AS:i:124 nM:i:4 BX:Z:CCACTTCCATCCGTTAACAA BC:Z:CCACTTCCATCCGTTAACAA QB:Z:FFFF:FFFFFFFFFFFFFFF QU:Z:FFFFFFFF ES:Z:Unassigned_NoFeatures IS:Z:Assigned3 IN:i:1 GI:Z:ENSDARG00000059048 UX:Z:GTTGGCTG UB:Z:GTTGGCTG ```
hartleys commented 1 year ago

This happens when you have either (a) a large number of read-pairs whose mates map extremely far apart, or (b) EXTREMELY high read density. Basically, as QoRTs parses the BAM file it keeps a running buffer of the reads that have not yet been matched to their mates. This works fine up until you have a huge number of reads in between any one read and its mate.
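The buffering behaviour described above can be illustrated with a toy model. This is a minimal sketch and not QoRTs' actual implementation: `peak_unmatched` is a hypothetical helper that walks read names in file order and tracks how many reads are waiting for a mate at any one time.

```python
def peak_unmatched(read_names):
    """Return the peak number of reads held while waiting for a mate.

    read_names: read names in file order (each name appears at most twice).
    Toy model only; an assumption about the general approach, not QoRTs code.
    """
    waiting = set()
    peak = 0
    for name in read_names:
        if name in waiting:
            waiting.remove(name)   # mate found: pair complete, free the slot
        else:
            waiting.add(name)      # first mate seen: hold until partner arrives
            peak = max(peak, len(waiting))
    return peak


# Mates adjacent in the file: the buffer stays small.
nearby = ["a", "a", "b", "b", "c", "c"]
# Mates far apart: every read waits until the second half of the file.
distant = ["a", "b", "c", "a", "b", "c"]
# Mates missing entirely: nothing is ever released.
orphans = ["a", "b", "c", "d", "e", "f"]

print(peak_unmatched(nearby), peak_unmatched(distant), peak_unmatched(orphans))
```

In this model the peak grows with the number of reads sitting between a read and its mate, which is why distant or missing mates matter more than the raw file size.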

It looks like it does OK right up to the very end, where it suddenly can't find any matches and just keeps loading more reads. What does the end of your BAM file look like? Do you have unmapped reads there, or a huge number of reads that map to loose contigs or something? Do you have enormous numbers of reads on the MT chromosome? Are you studying a tissue with a ton of mitochondrial expression, maybe?

I'm not terribly surprised that downsampling causes the same issue. If you randomly downsample without making sure to keep paired reads matched up, then basically all your reads become pairless and QoRTs will try to read the entire uncompressed file into memory trying to find the missing reads.
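To make the downsampling point concrete, here is a small sketch comparing two strategies on made-up read names. The function names and the 10,000-pair dataset are hypothetical and only for illustration; neither function is part of QoRTs or samtools.

```python
import random
import zlib
from collections import Counter


def count_orphans(kept_names):
    """Count reads whose mate was not kept (name appears exactly once)."""
    return sum(1 for n in Counter(kept_names).values() if n == 1)


def sample_per_read(names, fraction, seed=0):
    """Keep each record independently, so mates are decided separately."""
    rng = random.Random(seed)
    return [n for n in names if rng.random() < fraction]


def sample_per_name(names, fraction):
    """Keep or drop by a hash of the read name, so mates share one decision."""
    threshold = int(fraction * 2**32)
    return [n for n in names if zlib.crc32(n.encode()) < threshold]


# 10,000 hypothetical pairs, two records per read name.
names = [f"read{i}" for i in range(10_000) for _ in range(2)]

per_read = sample_per_read(names, 0.5)
per_name = sample_per_name(names, 0.5)
print("orphans, per-read sampling:", count_orphans(per_read))
print("orphans, per-name sampling:", count_orphans(per_name))
```

Per-name sampling leaves no orphans by construction. As far as I know, `samtools view -s` subsamples by template name in a similar way, so it should keep mates together, but check the documentation for your samtools version.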

What happens if you feed it one complete chromosome? So like:

samtools view -b -h sample.bam 1 > sample.chr1.bam

Or maybe even handing it everything except MT?

royfrancis commented 1 year ago

Thank you for your reply and insights. I don't really know much about this BAM (where the reads are mapping to, etc.), which is why I am running QC on it :D

The randomly downsampled BAM with 125M read pairs finally did manage to complete when given 512GB of RAM o_O

[Image: snowy-snic2022-22-328-royfranc-7319726-1. Resource usage during the run.]

Complete run output ``` Starting QoRTs v1.3.6 (Compiled Tue Sep 25 11:21:46 EDT 2018) Starting time: (Thu Feb 16 10:58:06 CET 2023) INPUT_COMMAND(QC) INPUT_ARG(infile)=sample-sub.bam INPUT_ARG(gtffile)=/crex/proj/project/nobackup/nbis/data/processed/zumis/03dpf/03dpf.final_annot.gtf INPUT_ARG(outdir)=sample-sub-qorts INPUT_ARG(genomeFA)=Some(List(/crex/proj/project/nobackup/nbis/data/reference/grcz10-custom/genome.fa)) INPUT_ARG(flatgfffile)=Some(/crex/proj/project/nobackup/nbis/data/processed/zumis/qorts/genes-flat.gff) INPUT_ARG(isRNASeq)=true INPUT_ARG(noGzipOutput)=true INPUT_ARG(verbose)=true INPUT_ARG(maxReadLength)=Some(125) Created Log File: sample-sub-qorts/QC.FTnRrt5rbVMr.log Warning: run-in-progress file "sample-sub-qorts/QC.QORTS_RUNNING" already exists. Is there another QoRTs job running? Starting QC [Time: 2023-02-16 10:58:06] [Mem usage: [75MB / 2058MB]] [Elapsed Time: 00:00:00.0000] QoRTs is Running in paired-end mode. QoRTs is Running in any-sorted mode. Parameter --genomeFA found. Adding reference mismatch testing. NOTE: Function "overlapMatch" requires function "mismatchEngine". Adding "mismatchEngine" to the active function list... Running functions: CigarOpDistribution, GCDistribution, GeneCalcs, InsertSize, JunctionCalcs, NVC, QualityScoreDistribution, StrandCheck, chromCounts, cigarLocusCounts, mismatchEngine, overlapMatch, readLengthDistro, referenceMatch, writeBiotypeCounts, writeClippedNVC, writeDESeq, writeDEXSeq, writeGeneBody, writeGeneCounts, writeGenewiseGeneBody, writeJunctionSeqCounts, writeKnownSplices, writeNovelSplices, writeSpliceExon Checking first 10000 reads. Checking SAM file for formatting errors... Stats on the first 10000 reads: Num Reads Primary Map: 10000 Num Reads Paired-ended: 10000 Num Reads mapped pair: 9995 Num Pair names found: 5272 Num Pairs matched: 4723 Read Seq length: 63 to 118 Unclipped Read length: 63 to 118 Final maxReadLength: 125 maxPhredScore: 37 minPhredScore: 2 NOTE: Read length is not consistent. 
In the first 10000 reads, read length varies from 63 to 118 (param maxReadLength=125) Note that using data that is hard-clipped prior to alignment is NOT recommended, because this makes it difficult (or impossible) to determine the sequencer read-cycle of each nucleotide base. This may obfuscate cycle-specific artifacts, trends, or errors, the detection of which is one of the primary purposes of QoRTs!In addition, hard clipping (whether before or after alignment) removes quality score data, and thus quality score metrics may be misleadingly optimistic. A MUCH preferable method of removing undesired sequence is to replace such sequence with N's, which preserves the quality score and the sequencer cycle information. Note: Data appears to be paired-ended. Sorting Note: Reads are not sorted by name (This is OK). Sorting Note: Reads are sorted by position (This is OK). Done checking first 10000 reads. WARNINGS FOUND! Starting getSRPairIterResorted... SAMRecord Reader Generated. Read length: 125. [Time: 2023-02-16 10:58:11] [Mem usage: [731MB / 2595MB]] [Elapsed Time: 00:00:04.0795] > Init GeneCalcs Utility > Init InsertSize Utility > Init NVC utility > Init CigarOpDistribution Utility > Init QualityScoreDistribution Utility > Init GC counts Utility > Init JunctionCalcs utility length of knownSpliceMap after instantiation: 256778 length of knownCountMap after instantiation: 256778 > Init StrandCheck Utility > Init chromCount Utility > Init qcCigarLocusCounts Utility > Init OverlapMatch Utility > Init MinorUtils Utility QC Utilities Generated! [Time: 2023-02-16 10:59:49] [Mem usage: [5GB / 12GB]] [Elapsed Time: 00:01:43.0188] helper_calculateGeneAssignmentMap_strict. Found: 31956 genes in the supplied annotation. helper_calculateGeneAssignmentMap_strict. Found: 4912 genes with ambiguous segments. helper_calculateGeneAssignmentMap_strict. Found: 27044 genes after first-pass filtering making makeGeneIntervalMap for geneBody calculations. 
Found: 27044 acceptable genes for gene-body analysis. ..........[1000000 Read-Pairs processed] [Time: 2023-02-16 11:04:22] [GenomeSeqContainer Status: buf:(10:22878000-23066000) n=188, MaxSoFar=895] .. NOTE: Unsorted Read-PAIR-Buffer Size > 100000 [Mem usage:[23GB / 29GB]] Currently searching for read: A01901:60:H37HJDRX2:2:2261:24912:3176 for 98140 iterations. Searching for read: A01901:60:H37HJDRX2:2:2261:24912:3176 10:29443811-29443858 99 Current unmatched-pair-buffer status: 3898 (This is generally not a problem, but if this increases further then OutOfMemoryExceptions may occur. If memory errors do occur, either increase memory allocation or sort the bam-file by name and rerun with the '--nameSorted' option. This might also indicate that your dataset contains an unusually large number of chimeric read-pairs. Or it could occur simply due to the presence of genomic loci with extremly high coverage or complex splicing. It may also indicate a SAM/BAM file that does not adhere to the standard SAM specification.) ........[2000000 Read-Pairs processed] [Time: 2023-02-16 11:08:01] [GenomeSeqContainer Status: buf:(10:42536000-43137000) n=601, MaxSoFar=895] ...Switching to Chromosome: 11 [2023-02-16 11:09:37] ... Skipping chrom "10" in genome fasta... found chrom 11 [2023-02-16 11:09:37] .......[3000000 Read-Pairs processed] [Time: 2023-02-16 11:12:16] [GenomeSeqContainer Status: buf:(11:10585000-11164000) n=579, MaxSoFar=895] ..........[4000000 Read-Pairs processed] [Time: 2023-02-16 11:15:47] [GenomeSeqContainer Status: buf:(11:35965000-36266000) n=301, MaxSoFar=895] ..... NOTE: Unmatched Read Buffer Size > 100000 [Mem usage:[3691MB / 28GB]] (This is generally not a problem, but if this increases further then OutOfMemoryExceptions may occur. If memory errors do occur, either increase memory allocation or sort the bam-file by name and rerun with the '--nameSorted' option. 
This might also indicate that your dataset contains an unusually large number of chimeric read-pairs. Or it could occur simply due to the presence of genomic loci with extremly high coverage. It may also indicate a SAM/BAM file that does not adhere to the standard SAM specification.) NOTE: Unsorted Read-PAIR-Buffer Size > 200000 [Mem usage:[4692MB / 28GB]] Currently searching for read: A01901:60:H37HJDRX2:1:1117:7473:19633 for 167820 iterations. Searching for read: A01901:60:H37HJDRX2:1:1117:7473:19633 11:44043229-44043346 163 Current unmatched-pair-buffer status: 68944 ...Switching to Chromosome: 12 [2023-02-16 11:18:56] ... Skipping chrom "11" in genome fasta... found chrom 12 [2023-02-16 11:18:56] ..[5000000 Read-Pairs processed] [Time: 2023-02-16 11:19:24] [GenomeSeqContainer Status: buf:(12:3273000-3637000) n=364, MaxSoFar=895] ..........[6000000 Read-Pairs processed] [Time: 2023-02-16 11:23:09] [GenomeSeqContainer Status: buf:(12:26537000-26792000) n=255, MaxSoFar=1015] .........Switching to Chromosome: 13 [2023-02-16 11:26:24] ... Skipping chrom "12" in genome fasta... found chrom 13 [2023-02-16 11:26:24] .[7000000 Read-Pairs processed] [Time: 2023-02-16 11:26:38] [GenomeSeqContainer Status: buf:(13:353000-570000) n=217, MaxSoFar=1015] ..........[8000000 Read-Pairs processed] [Time: 2023-02-16 11:30:10] [GenomeSeqContainer Status: buf:(13:28167000-28222000) n=55, MaxSoFar=1015] ........Switching to Chromosome: 14 [2023-02-16 11:33:14] ... Skipping chrom "13" in genome fasta... found chrom 14 [2023-02-16 11:33:14] ..[9000000 Read-Pairs processed] [Time: 2023-02-16 11:33:37] [GenomeSeqContainer Status: buf:(14:2038000-2676000) n=638, MaxSoFar=1015] ..........[10000000 Read-Pairs processed] [Time: 2023-02-16 11:37:16] [GenomeSeqContainer Status: buf:(14:20852000-20942000) n=90, MaxSoFar=1015] ..........[11000000 Read-Pairs processed] [Time: 2023-02-16 11:40:50] [GenomeSeqContainer Status: buf:(14:32694000-33088000) n=394, MaxSoFar=1015] ...... 
NOTE: Unmatched Read Buffer Size > 200000 [Mem usage:[7GB / 40GB]] NOTE: Unsorted Read-PAIR-Buffer Size > 400000 [Mem usage:[8GB / 40GB]] Currently searching for read: A01901:60:H37HJDRX2:1:2204:10384:7952 for 374755 iterations. Searching for read: A01901:60:H37HJDRX2:1:2204:10384:7952 14:46035424-46647919 163 Current unmatched-pair-buffer status: 363660 NOTE: Unsorted Read-PAIR-Buffer Size > 800000 [Mem usage:[12GB / 40GB]] Currently searching for read: A01901:60:H37HJDRX2:1:2204:10384:7952 for 774755 iterations. Searching for read: A01901:60:H37HJDRX2:1:2204:10384:7952 14:46035424-46647919 163 Current unmatched-pair-buffer status: 338363 NOTE: Unsorted Read-PAIR-Buffer Size > 1600000 [Mem usage:[17GB / 40GB]] Currently searching for read: A01901:60:H37HJDRX2:1:2109:23194:2895 for 687930 iterations. Searching for read: A01901:60:H37HJDRX2:1:2109:23194:2895 14:46090552-46651262 163 Current unmatched-pair-buffer status: 50659 NOTE: Unmatched Read Buffer Size > 400000 [Mem usage:[20GB / 40GB]] ....[12000000 Read-Pairs processed] [Time: 2023-02-16 11:45:16] [GenomeSeqContainer Status: buf:(14:46637000-47216000) n=579, MaxSoFar=1143] ..........[13000000 Read-Pairs processed] [Time: 2023-02-16 11:48:54] [GenomeSeqContainer Status: buf:(14:46638000-47216000) n=578, MaxSoFar=1143] ..........[14000000 Read-Pairs processed] [Time: 2023-02-16 11:52:26] [GenomeSeqContainer Status: buf:(14:46641000-47216000) n=575, MaxSoFar=1143] ...Switching to Chromosome: 15 [2023-02-16 11:54:04] ... Skipping chrom "14" in genome fasta... found chrom 15 [2023-02-16 11:54:04] .......[15000000 Read-Pairs processed] [Time: 2023-02-16 11:56:22] [GenomeSeqContainer Status: buf:(15:20530000-20553000) n=23, MaxSoFar=1143] ..........[16000000 Read-Pairs processed] [Time: 2023-02-16 12:00:12] [GenomeSeqContainer Status: buf:(15:41730000-42114000) n=384, MaxSoFar=1143] ...Switching to Chromosome: 16 [2023-02-16 12:01:17] ... Skipping chrom "15" in genome fasta... 
found chrom 16 [2023-02-16 12:01:17] .......[17000000 Read-Pairs processed] [Time: 2023-02-16 12:04:00] [GenomeSeqContainer Status: buf:(16:6871000-7286000) n=415, MaxSoFar=1280] ..........[18000000 Read-Pairs processed] [Time: 2023-02-16 12:07:56] [GenomeSeqContainer Status: buf:(16:24269000-24536000) n=267, MaxSoFar=1280] ..........[19000000 Read-Pairs processed] [Time: 2023-02-16 12:11:46] [GenomeSeqContainer Status: buf:(16:38980000-39499000) n=519, MaxSoFar=1280] .........Switching to Chromosome: 17 [2023-02-16 12:15:34] ... Skipping chrom "16" in genome fasta... found chrom 17 [2023-02-16 12:15:34] .[20000000 Read-Pairs processed] [Time: 2023-02-16 12:15:44] [GenomeSeqContainer Status: buf:(17:107000-937000) n=830, MaxSoFar=1280] ..........[21000000 Read-Pairs processed] [Time: 2023-02-16 12:19:39] ..........[22000000 Read-Pairs processed] [Time: 2023-02-16 12:23:23] [GenomeSeqContainer Status: buf:(17:49873000-50262000) n=389, MaxSoFar=1280] .Switching to Chromosome: 18 [2023-02-16 12:23:56] ... Skipping chrom "17" in genome fasta... found chrom 18 [2023-02-16 12:23:56] .........[23000000 Read-Pairs processed] [Time: 2023-02-16 12:27:16] [GenomeSeqContainer Status: buf:(18:16146000-16736000) n=590, MaxSoFar=1280] ..........[24000000 Read-Pairs processed] [Time: 2023-02-16 12:31:00] [GenomeSeqContainer Status: buf:(18:44876000-45135000) n=259, MaxSoFar=1280] ...Switching to Chromosome: 19 [2023-02-16 12:32:11] ... Skipping chrom "18" in genome fasta... 
found chrom 19 [2023-02-16 12:32:11] .......[25000000 Read-Pairs processed] [Time: 2023-02-16 12:34:53] [GenomeSeqContainer Status: buf:(19:8818000-9129000) n=311, MaxSoFar=1280] ..........[26000000 Read-Pairs processed] [Time: 2023-02-16 12:38:59] [GenomeSeqContainer Status: buf:(19:22220000-22533000) n=313, MaxSoFar=1280] ..........[27000000 Read-Pairs processed] [Time: 2023-02-16 12:43:04] [GenomeSeqContainer Status: buf:(19:28689000-28946000) n=257, MaxSoFar=1280] ..........[28000000 Read-Pairs processed] [Time: 2023-02-16 12:47:01] [GenomeSeqContainer Status: buf:(19:43513000-43858000) n=345, MaxSoFar=1280] ..........[29000000 Read-Pairs processed] [Time: 2023-02-16 12:50:57] [GenomeSeqContainer Status: buf:(19:48449000-48779000) n=330, MaxSoFar=1280] Switching to Chromosome: 1 [2023-02-16 12:51:17] ... Skipping chrom "19" in genome fasta... found chrom 1 [2023-02-16 12:51:17] ..........[30000000 Read-Pairs processed] [Time: 2023-02-16 12:54:42] [GenomeSeqContainer Status: buf:(1:11639000-11916000) n=277, MaxSoFar=1280] ..........[31000000 Read-Pairs processed] [Time: 2023-02-16 12:58:30] [GenomeSeqContainer Status: buf:(1:37922000-38207000) n=285, MaxSoFar=1280] ..........[32000000 Read-Pairs processed] [Time: 2023-02-16 13:02:19] [GenomeSeqContainer Status: buf:(1:54481000-54717000) n=236, MaxSoFar=1280] .Switching to Chromosome: 20 [2023-02-16 13:02:52] ... Skipping chrom "1" in genome fasta... found chrom 20 [2023-02-16 13:02:52] .........[33000000 Read-Pairs processed] [Time: 2023-02-16 13:06:07] [GenomeSeqContainer Status: buf:(20:21020000-21880000) n=860, MaxSoFar=1280] ..........[34000000 Read-Pairs processed] [Time: 2023-02-16 13:09:55] [GenomeSeqContainer Status: buf:(20:43877000-44381000) n=504, MaxSoFar=1280] ........Switching to Chromosome: 21 [2023-02-16 13:13:04] ... Skipping chrom "20" in genome fasta... 
found chrom 21 [2023-02-16 13:13:04] ..[35000000 Read-Pairs processed] [Time: 2023-02-16 13:13:27] [GenomeSeqContainer Status: buf:(21:1313000-2726000) n=1413, MaxSoFar=1465] ..........[36000000 Read-Pairs processed] [Time: 2023-02-16 13:17:18] [GenomeSeqContainer Status: buf:(21:22275000-22543000) n=268, MaxSoFar=1465] ..........[37000000 Read-Pairs processed] [Time: 2023-02-16 13:21:07] [GenomeSeqContainer Status: buf:(21:39576000-39784000) n=208, MaxSoFar=1465] ...Switching to Chromosome: 22 [2023-02-16 13:22:26] ... Skipping chrom "21" in genome fasta... found chrom 22 [2023-02-16 13:22:26] .......[38000000 Read-Pairs processed] [Time: 2023-02-16 13:24:55] [GenomeSeqContainer Status: buf:(22:10363000-10569000) n=206, MaxSoFar=1465] ..........[39000000 Read-Pairs processed] [Time: 2023-02-16 13:28:48] [GenomeSeqContainer Status: buf:(22:31702000-32081000) n=379, MaxSoFar=1465] ..Switching to Chromosome: 23 [2023-02-16 13:29:42] ... Skipping chrom "22" in genome fasta... found chrom 23 [2023-02-16 13:29:42] ........[40000000 Read-Pairs processed] [Time: 2023-02-16 13:32:38] [GenomeSeqContainer Status: buf:(23:17493000-17920000) n=427, MaxSoFar=1465] ..........[41000000 Read-Pairs processed] [Time: 2023-02-16 13:36:30] [GenomeSeqContainer Status: buf:(23:26604000-26866000) n=262, MaxSoFar=1465] ..........[42000000 Read-Pairs processed] [Time: 2023-02-16 13:40:03] [GenomeSeqContainer Status: buf:(23:36199000-36408000) n=209, MaxSoFar=1465] ....Switching to Chromosome: 24 [2023-02-16 13:41:46] ... Skipping chrom "23" in genome fasta... found chrom 24 [2023-02-16 13:41:46] ......[43000000 Read-Pairs processed] [Time: 2023-02-16 13:43:50] [GenomeSeqContainer Status: buf:(24:18406000-18482000) n=76, MaxSoFar=1465] ..........[44000000 Read-Pairs processed] [Time: 2023-02-16 13:47:45] [GenomeSeqContainer Status: buf:(24:40941000-41557000) n=616, MaxSoFar=1465] Switching to Chromosome: 25 [2023-02-16 13:47:56] ... Skipping chrom "24" in genome fasta... 
found chrom 25 [2023-02-16 13:47:56] ..........[45000000 Read-Pairs processed] [Time: 2023-02-16 13:51:36] [GenomeSeqContainer Status: buf:(25:19263000-19381000) n=118, MaxSoFar=1465] ......Switching to Chromosome: 2 [2023-02-16 13:53:59] ... Skipping chrom "25" in genome fasta... found chrom 2 [2023-02-16 13:53:59] ....[46000000 Read-Pairs processed] [Time: 2023-02-16 13:55:24] [GenomeSeqContainer Status: buf:(2:9982000-10172000) n=190, MaxSoFar=1465] ..........[47000000 Read-Pairs processed] [Time: 2023-02-16 13:59:19] [GenomeSeqContainer Status: buf:(2:26953000-27126000) n=173, MaxSoFar=1465] ..........[48000000 Read-Pairs processed] [Time: 2023-02-16 14:03:09] [GenomeSeqContainer Status: buf:(2:45628000-46188000) n=560, MaxSoFar=1465] ......Switching to Chromosome: 3 [2023-02-16 14:05:36] ... Skipping chrom "2" in genome fasta... found chrom 3 [2023-02-16 14:05:36] ....[49000000 Read-Pairs processed] [Time: 2023-02-16 14:06:49] [GenomeSeqContainer Status: buf:(3:7759000-8537000) n=778, MaxSoFar=1865] ..........[50000000 Read-Pairs processed] [Time: 2023-02-16 14:10:53] [GenomeSeqContainer Status: buf:(3:20797000-21282000) n=485, MaxSoFar=1865] ..........[51000000 Read-Pairs processed] [Time: 2023-02-16 14:14:46] [GenomeSeqContainer Status: buf:(3:29727000-29873000) n=146, MaxSoFar=1865] ..........[52000000 Read-Pairs processed] [Time: 2023-02-16 14:18:43] [GenomeSeqContainer Status: buf:(3:35987000-36253000) n=266, MaxSoFar=1865] ..........[53000000 Read-Pairs processed] [Time: 2023-02-16 14:22:27] [GenomeSeqContainer Status: buf:(3:48839000-49061000) n=222, MaxSoFar=1865] .......Switching to Chromosome: 4 [2023-02-16 14:25:16] ... Skipping chrom "3" in genome fasta... 
found chrom 4 [2023-02-16 14:25:16] ...[54000000 Read-Pairs processed] [Time: 2023-02-16 14:26:10] [GenomeSeqContainer Status: buf:(4:963000-1375000) n=412, MaxSoFar=1865] ..........[55000000 Read-Pairs processed] [Time: 2023-02-16 14:29:58] [GenomeSeqContainer Status: buf:(4:21918000-22076000) n=158, MaxSoFar=1865] ......Switching to Chromosome: 5 [2023-02-16 14:31:51] ... Skipping chrom "4" in genome fasta... found chrom 5 [2023-02-16 14:31:51] NOTE: Unmatched Read Buffer Size > 800000 [Mem usage:[37GB / 170GB]] ....[56000000 Read-Pairs processed] [Time: 2023-02-16 14:32:28] [GenomeSeqContainer Status: buf:(5:816000-1083000) n=267, MaxSoFar=3118] ..........[57000000 Read-Pairs processed] [Time: 2023-02-16 14:35:34] [GenomeSeqContainer Status: buf:(5:817000-1396000) n=579, MaxSoFar=3118] ..........[58000000 Read-Pairs processed] [Time: 2023-02-16 14:38:54] [GenomeSeqContainer Status: buf:(5:1680000-2529000) n=849, MaxSoFar=3118] ..........[59000000 Read-Pairs processed] [Time: 2023-02-16 14:42:53] [GenomeSeqContainer Status: buf:(5:22754000-23242000) n=488, MaxSoFar=3118] ..........[60000000 Read-Pairs processed] [Time: 2023-02-16 14:46:48] [GenomeSeqContainer Status: buf:(5:28892000-29478000) n=586, MaxSoFar=3118] ..........[61000000 Read-Pairs processed] [Time: 2023-02-16 14:50:39] [GenomeSeqContainer Status: buf:(5:38777000-39137000) n=360, MaxSoFar=3118] ..........[62000000 Read-Pairs processed] [Time: 2023-02-16 14:53:59] [GenomeSeqContainer Status: buf:(5:64160000-64448000) n=288, MaxSoFar=3118] ..Switching to Chromosome: 6 [2023-02-16 14:54:54] ... Skipping chrom "5" in genome fasta... 
found chrom 6 [2023-02-16 14:54:54] ........[63000000 Read-Pairs processed] [Time: 2023-02-16 14:57:44] [GenomeSeqContainer Status: buf:(6:9770000-9961000) n=191, MaxSoFar=3118] ..........[64000000 Read-Pairs processed] [Time: 2023-02-16 15:01:36] [GenomeSeqContainer Status: buf:(6:30857000-31275000) n=418, MaxSoFar=3118] ..........[65000000 Read-Pairs processed] [Time: 2023-02-16 15:05:28] [GenomeSeqContainer Status: buf:(6:52225000-52533000) n=308, MaxSoFar=3118] ...Switching to Chromosome: 7 [2023-02-16 15:06:47] ... Skipping chrom "6" in genome fasta... found chrom 7 [2023-02-16 15:06:47] .......[66000000 Read-Pairs processed] [Time: 2023-02-16 15:09:16] [GenomeSeqContainer Status: buf:(7:21487000-21862000) n=375, MaxSoFar=3118] ..........[67000000 Read-Pairs processed] [Time: 2023-02-16 15:13:10] [GenomeSeqContainer Status: buf:(7:37461000-38022000) n=561, MaxSoFar=3118] ..........[68000000 Read-Pairs processed] [Time: 2023-02-16 15:17:13] [GenomeSeqContainer Status: buf:(7:47164000-47369000) n=205, MaxSoFar=3118] ..........[69000000 Read-Pairs processed] [Time: 2023-02-16 15:21:14] [GenomeSeqContainer Status: buf:(7:63686000-64101000) n=415, MaxSoFar=3118] ....Switching to Chromosome: 8 [2023-02-16 15:22:48] ... Skipping chrom "7" in genome fasta... found chrom 8 [2023-02-16 15:22:48] ......[70000000 Read-Pairs processed] [Time: 2023-02-16 15:25:02] [GenomeSeqContainer Status: buf:(8:15452000-15916000) n=464, MaxSoFar=3118] ..........[71000000 Read-Pairs processed] [Time: 2023-02-16 15:28:51] [GenomeSeqContainer Status: buf:(8:39721000-40159000) n=438, MaxSoFar=3118] .......Switching to Chromosome: 9 [2023-02-16 15:31:48] ... Skipping chrom "8" in genome fasta... 
found chrom 9 [2023-02-16 15:31:48] ...[72000000 Read-Pairs processed] [Time: 2023-02-16 15:32:40] ..........[73000000 Read-Pairs processed] [Time: 2023-02-16 15:36:23] [GenomeSeqContainer Status: buf:(9:30925000-31051000) n=126, MaxSoFar=3118] ..........[74000000 Read-Pairs processed] [Time: 2023-02-16 15:40:20] [GenomeSeqContainer Status: buf:(9:54602000-55141000) n=539, MaxSoFar=3118] ..Switching to Chromosome: MT [2023-02-16 15:41:14] ... Skipping chrom "9" in genome fasta... found chrom MT [2023-02-16 15:41:14] NOTE: Unmatched Read Buffer Size > 1600000 [Mem usage:[112GB / 120GB]] NOTE: Unmatched Read Buffer Size > 3200000 [Mem usage:[14GB / 144GB]] NOTE: Unmatched Read Buffer Size > 6400000 [Mem usage:[29GB / 144GB]] NOTE: Unmatched Read Buffer Size > 12800000 [Mem usage:[54GB / 144GB]] NOTE: Unsorted Read-PAIR-Buffer Size > 3200000 [Mem usage:[90GB / 144GB]] Currently searching for read: A01901:60:H37HJDRX2:2:2122:16514:22482 for 3145621 iterations. Searching for read: A01901:60:H37HJDRX2:2:2122:16514:22482 MT:264-273 99 Current unmatched-pair-buffer status: 16488350 NOTE: Unsorted Read-PAIR-Buffer Size > 6400000 [Mem usage:[107GB / 144GB]] Currently searching for read: A01901:60:H37HJDRX2:2:2122:16514:22482 for 6345621 iterations. Searching for read: A01901:60:H37HJDRX2:2:2122:16514:22482 MT:264-273 99 Current unmatched-pair-buffer status: 14583726 NOTE: Unsorted Read-PAIR-Buffer Size > 12800000 [Mem usage:[102GB / 241GB]] Currently searching for read: A01901:60:H37HJDRX2:1:2165:21938:9706 for 4305562 iterations. Searching for read: A01901:60:H37HJDRX2:1:2165:21938:9706 MT:304-9811 99 Current unmatched-pair-buffer status: 10315126 NOTE: Unsorted Read-PAIR-Buffer Size > 25600000 [Mem usage:[168GB / 241GB]] Currently searching for read: A01901:60:H37HJDRX2:1:2165:21938:9706 for 17105562 iterations. 
Searching for read: A01901:60:H37HJDRX2:1:2165:21938:9706 MT:304-9811 99 Current unmatched-pair-buffer status: 953517 ........[75000000 Read-Pairs processed] [Time: 2023-02-16 16:33:07] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[76000000 Read-Pairs processed] [Time: 2023-02-16 16:36:42] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[77000000 Read-Pairs processed] [Time: 2023-02-16 16:40:28] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[78000000 Read-Pairs processed] [Time: 2023-02-16 16:44:30] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[79000000 Read-Pairs processed] [Time: 2023-02-16 16:48:28] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[80000000 Read-Pairs processed] [Time: 2023-02-16 16:52:01] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[81000000 Read-Pairs processed] [Time: 2023-02-16 16:55:51] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[82000000 Read-Pairs processed] [Time: 2023-02-16 16:59:38] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[83000000 Read-Pairs processed] [Time: 2023-02-16 17:02:51] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[84000000 Read-Pairs processed] [Time: 2023-02-16 17:06:19] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[85000000 Read-Pairs processed] [Time: 2023-02-16 17:09:45] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[86000000 Read-Pairs processed] [Time: 2023-02-16 17:12:50] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[87000000 Read-Pairs processed] [Time: 2023-02-16 17:16:03] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[88000000 Read-Pairs processed] [Time: 2023-02-16 17:19:28] [GenomeSeqContainer Status: buf:(MT:0-17000) 
n=17, MaxSoFar=3118] ..........[89000000 Read-Pairs processed] [Time: 2023-02-16 17:22:43] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[90000000 Read-Pairs processed] [Time: 2023-02-16 17:26:13] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[91000000 Read-Pairs processed] [Time: 2023-02-16 17:29:43] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[92000000 Read-Pairs processed] [Time: 2023-02-16 17:33:28] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[93000000 Read-Pairs processed] [Time: 2023-02-16 17:36:38] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[94000000 Read-Pairs processed] [Time: 2023-02-16 17:40:01] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[95000000 Read-Pairs processed] [Time: 2023-02-16 17:43:42] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[96000000 Read-Pairs processed] [Time: 2023-02-16 17:47:15] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[97000000 Read-Pairs processed] [Time: 2023-02-16 17:50:49] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[98000000 Read-Pairs processed] [Time: 2023-02-16 17:54:29] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[99000000 Read-Pairs processed] [Time: 2023-02-16 17:58:11] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[100000000 Read-Pairs processed] [Time: 2023-02-16 18:02:05] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[101000000 Read-Pairs processed] [Time: 2023-02-16 18:05:32] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[102000000 Read-Pairs processed] [Time: 2023-02-16 18:09:00] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[103000000 Read-Pairs processed] [Time: 2023-02-16 18:12:20] 
[GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[104000000 Read-Pairs processed] [Time: 2023-02-16 18:15:41] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[105000000 Read-Pairs processed] [Time: 2023-02-16 18:19:18] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[106000000 Read-Pairs processed] [Time: 2023-02-16 18:22:51] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[107000000 Read-Pairs processed] [Time: 2023-02-16 18:26:28] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[108000000 Read-Pairs processed] [Time: 2023-02-16 18:29:51] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[109000000 Read-Pairs processed] [Time: 2023-02-16 18:33:04] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[110000000 Read-Pairs processed] [Time: 2023-02-16 18:36:24] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[111000000 Read-Pairs processed] [Time: 2023-02-16 18:39:50] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[112000000 Read-Pairs processed] [Time: 2023-02-16 18:43:33] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[113000000 Read-Pairs processed] [Time: 2023-02-16 18:46:49] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[114000000 Read-Pairs processed] [Time: 2023-02-16 18:50:18] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[115000000 Read-Pairs processed] [Time: 2023-02-16 18:53:55] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[116000000 Read-Pairs processed] [Time: 2023-02-16 18:57:02] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[117000000 Read-Pairs processed] [Time: 2023-02-16 19:00:49] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] 
..........[118000000 Read-Pairs processed] [Time: 2023-02-16 19:04:08] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[119000000 Read-Pairs processed] [Time: 2023-02-16 19:07:09] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[120000000 Read-Pairs processed] [Time: 2023-02-16 19:10:30] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[121000000 Read-Pairs processed] [Time: 2023-02-16 19:13:32] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[122000000 Read-Pairs processed] [Time: 2023-02-16 19:16:52] [GenomeSeqContainer Status: buf:(MT:0-17000) n=17, MaxSoFar=3118] ..........[123000000 Read-Pairs processed] [Time: 2023-02-16 19:20:14] [GenomeSeqContainer Status: buf:(MT:2000-17000) n=15, MaxSoFar=3118] ..........[124000000 Read-Pairs processed] [Time: 2023-02-16 19:24:17] [GenomeSeqContainer Status: buf:(MT:5000-17000) n=12, MaxSoFar=3118] ..........[125000000 Read-Pairs processed] [Time: 2023-02-16 19:27:49] [GenomeSeqContainer Status: buf:(MT:5000-17000) n=12, MaxSoFar=3118] ..........[126000000 Read-Pairs processed] [Time: 2023-02-16 19:31:41] [GenomeSeqContainer Status: buf:(MT:5000-17000) n=12, MaxSoFar=3118] ..........[127000000 Read-Pairs processed] [Time: 2023-02-16 19:35:36] [GenomeSeqContainer Status: buf:(MT:6000-17000) n=11, MaxSoFar=3118] .Switching to Chromosome: EGFP [2023-02-16 19:36:19] ... Skipping chrom "MT" in genome fasta... found chrom EGFP [2023-02-16 19:36:20] Switching to Chromosome: GAL4FF [2023-02-16 19:36:20] ... Skipping chrom "EGFP" in genome fasta... found chrom GAL4FF [2023-02-16 19:36:20] Finished reading SAM. Read: 127145524 reads/read-pairs. Finished reading SAM. Used: 124692211 reads/read-pairs. 
[Time: 2023-02-16 19:40:19] [Mem usage: [346GB / 447GB]] [Elapsed Time: 08:42:12.0840] > Read Stats: > READ_PAIR_OK 124692211 > TOTAL_READ_PAIRS 127145524 > DROPPED_NOT_PROPER_PAIR 0 > DROPPED_READ_FAILS_VENDOR_QC 0 > DROPPED_MARKED_NOT_VALID 0 > DROPPED_CHROMS_MISMATCH 0 > DROPPED_PAIR_STRANDS_MISMATCH 0 > DROPPED_IGNORED_CHROMOSOME 0 > DROPPED_NOT_UNIQUE_ALIGNMENT 2453313 > DROPPED_NO_ALN_BLOCKS 0 > DROPPED_NOT_MARKED_RG -1 Pre-alignment read count unknown (Set --seqReadCt or --rawfastq) Writing Output... DEBUG NOTE: IncludeGenesSet.size: 27044 DEBUG NOTE: sortedReadCountSeq.size: 18899 DEBUG NOTE: coverageThresholds: 9449;14174;17009;18899 DEBUG NOTE: coverageSpans: [(0,9449);(9449,14174);(14174,17009);(17009,18899)] DEBUG NOTE: [1.bottomHalf][0.5] = [0,9449] DEBUG NOTE: [2.upperMidQuartile][0.75] = [9449,14174] DEBUG NOTE: [3.75to90][0.9] = [14174,17009] DEBUG NOTE: [4.high][1.0] = [17009,18899] (DEBUG) Generating Biotype Map [2023-02-16 19:40:29] (DEBUG) Extracted gene BioType using key "gene_biotype". Found 34 types: [TR_V_gene,unprocessed_pseudogene,protein_coding,IG_V_gene,TR_J_gene,Mt_tRNA,rRNA,TEC,miRNA,scaRNA,TR_D_gene,snRNA,TR_V_pseudogene,snoRNA,IG_J_pseudogene,processed_transcript,IG_V_pseudogene,IG_J_gene,processed_pseudogene,IG_C_pseudogene,sense_overlapping,transcribed_unprocessed_pseudogene,lincRNA,IG_C_gene,misc_RNA,ribozyme,polymorphic_pseudogene,User,antisense,Mt_rRNA,pseudogene,sRNA,IG_pseudogene,sense_intronic] (DEBUG) Finished Biotype Map [2023-02-16 19:42:20] length of knownCountMap after run: 256778 WARNING: QoRTs is unable to infer the strandedness from the data! This isn't a problem per-se, since QoRTs requires that strandedness mode be set manually. However, it might be indicative that something is very wrong with your dataset and/or transcript annotation. QoRTs completed WITH WARNINGS! See log for details. Done. 
Time spent on setup: 00:01:43.0189 Time spent on SAM iteration: 08:40:29.0662 (4.093603274098217 minutes per million read-pairs) (4.174144713283923 minutes per million read-pairs used) Time spent on file output: 00:02:23.0395 Total runtime: 08:44:36.0246 Done. (Thu Feb 16 19:42:42 CET 2023) End of Script. Script took 31541 seconds. ```

[Image: multiplot-sample-sub]

QoRTs multiplot on this sample. Many of the plots are empty.

hartleys commented 1 year ago

Hmm. Can you give me an ls of the output dir?

And then check inside one of the files, say "QC.insert.size.txt.gz"?

How did you do the downsampling? It may have had problems if the majority of the reads did not have matched pairs.

Also: can you post the log?

royfrancis commented 1 year ago
Output file list

```
QC.biotypeCounts.txt
QC.chromCount.txt
QC.cigarLoci.deletionCounts.all.txt
QC.cigarLoci.deletionCounts.highCoverage.txt
QC.cigarLoci.insertionCounts.all.txt
QC.cigarLoci.insertionCounts.highCoverage.txt
QC.cigarOpDistribution.byReadCycle.R1.txt
QC.cigarOpDistribution.byReadCycle.R2.txt
QC.cigarOpLengths.byOp.R1.txt
QC.cigarOpLengths.byOp.R2.txt
QC.exonCounts.formatted.for.DEXSeq.txt
QC.FTnRrt5rbVMr.log
QC.gc.byPair.txt
QC.gc.byRead.txt
QC.gc.byRead.vsBaseCt.txt
QC.gc.R1.txt
QC.gc.R2.txt
QC.geneBodyCoverage.byExpr.avgPct.txt
QC.geneBodyCoverage.by.expression.level.txt
QC.geneBodyCoverage.genewise.txt
QC.geneCounts.formatted.for.DESeq.txt
QC.geneCounts.txt
QC.insert.size.byReadLen.txt
QC.insert.size.debug.dropped.txt
QC.insert.size.debug.txt
QC.insert.size.txt
QC.mismatchSizeRates.txt
QC.mismatchSummary.txt
QC.NVC.lead.clip.R1.txt
QC.NVC.lead.clip.R2.txt
QC.NVC.minus.clipping.R1.txt
QC.NVC.minus.clipping.R2.txt
QC.NVC.raw.R1.txt
QC.NVC.raw.R2.txt
QC.NVC.tail.clip.R1.txt
QC.NVC.tail.clip.R2.txt
QC.orderedChromList.txt
QC.overlapCoverage.txt
QC.overlapMismatch.byBase.txt
QC.overlapMismatch.byRead.txt
QC.overlapMismatch.byScoreAndBP.txt
QC.overlapMismatch.byScore.txt
QC.overlapMismatch.txt
QC.QORTS_COMPLETED_OK
QC.QORTS_COMPLETED_WARN
QC.QORTS_RUNNING
QC.quals.r1.txt
QC.quals.r2.txt
QC.readLenDist.txt
QC.referenceMismatch.byScoreAndBP.txt
QC.referenceMismatch.byScore.txt
QC.referenceMismatchCounts.txt
QC.referenceMismatchRaw.byReadStrand.txt
QC.spliceJunctionAndExonCounts.forJunctionSeq.txt
QC.spliceJunctionCounts.knownSplices.txt
QC.spliceJunctionCounts.novelSplices.txt
QC.summary.txt
QC.yX9gr2Yu8Jsk.log
```

WARN file says this:

# Note: if this file EXISTS, then QoRTs QC completed WITH WARNINGS. Warning messages follow:
Warning: run-in-progress file "sample-sub-qorts/QC.QORTS_RUNNING" already exists. Is there another QoRTs job running?
WARNING: QoRTs is unable to infer the strandedness from the data!
         This isn't a problem per-se, since QoRTs requires that strandedness
         mode be set manually. However, it might be indicative that something
         is very wrong with your dataset and/or transcript annotation.
QoRTs completed WITH WARNINGS! See log for details.
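
The first warning above comes from a `QC.QORTS_RUNNING` file that was already in the output directory. A minimal sketch for checking the marker files before re-running into the same directory is below. The marker names are taken from the file list in this thread; the function name `describe_markers` and the wording of the notes are made up for illustration and are not part of QoRTs.

```python
# Marker file names as they appear in the output listing in this thread.
MARKERS = ("QC.QORTS_RUNNING", "QC.QORTS_COMPLETED_OK", "QC.QORTS_COMPLETED_WARN")


def describe_markers(filenames):
    """Return human-readable notes about QoRTs marker files in a listing.

    `filenames` is any iterable of file names (for example os.listdir(outdir)).
    The interpretation in the notes is an assumption, not QoRTs documentation.
    """
    present = {name for name in filenames if name in MARKERS}
    notes = []
    if "QC.QORTS_RUNNING" in present:
        notes.append("RUNNING marker present: a new run here will warn that it already exists")
    if "QC.QORTS_COMPLETED_OK" in present:
        notes.append("COMPLETED_OK marker present")
    if "QC.QORTS_COMPLETED_WARN" in present:
        notes.append("COMPLETED_WARN marker present: read the warnings inside it")
    if not present:
        notes.append("no marker files found")
    return notes


# Example using three names from the listing posted above.
listing = ["QC.QORTS_COMPLETED_OK", "QC.QORTS_COMPLETED_WARN", "QC.QORTS_RUNNING", "QC.summary.txt"]
for note in describe_markers(listing):
    print(note)
```

Writing each attempt to a fresh output directory would avoid mixing markers and logs from earlier, failed runs.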

Contents of QC.insert.size.txt

$ head sample-sub-qorts/QC.insert.size.txt 
InsertSize  Ct
0   0
1   0
2   0
3   0
4   0
5   0
6   0
7   0
8   0

$ tail sample-sub-qorts/QC.insert.size.txt 
980064  1
1035994 1
1155833 1
1155846 3
1321162 1
1321165 2
1321172 2
1321176 1
1321183 1
1822321 1
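
Since `head` and `tail` only show the extremes, a short summary of the whole table may be easier to share. The sketch below assumes the file is a whitespace-separated two-column table with an `InsertSize Ct` header, as shown above; the function name `summarize_insert_sizes` and the 1000 bp cutoff are arbitrary choices for illustration.

```python
def summarize_insert_sizes(lines, cutoff=1000):
    """Summarize an 'InsertSize Ct' table given as an iterable of text lines."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip the header and any malformed or blank lines.
        if len(fields) != 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        rows.append((int(fields[0]), int(fields[1])))

    total = sum(ct for _, ct in rows)
    above = sum(ct for size, ct in rows if size > cutoff)

    # Count-weighted median: first size at which the running count reaches half.
    median = None
    running = 0
    for size, ct in sorted(rows):
        running += ct
        if total and running * 2 >= total:
            median = size
            break

    observed = [size for size, ct in rows if ct > 0]
    return {
        "total_pairs": total,
        "median": median,
        "pairs_above_cutoff": above,
        "max_observed": max(observed) if observed else None,
    }


# A handful of rows copied from this thread, only to exercise the function.
sample = ["InsertSize\tCt", "16\t1", "17\t9", "58\t59640", "59\t87598", "1822321\t1"]
print(summarize_insert_sizes(sample))
```

On the real file this would be called as `summarize_insert_sizes(open("sample-sub-qorts/QC.insert.size.txt"))`.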

Subsampling was done as such:

module load samtools/1.3
samtools view -b -s 0.6 sample.bam > sample-sub.bam
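
As far as I understand the samtools documentation, `-s` applies the keep/drop decision per template, so a read is never kept without its mate. The sketch below illustrates that idea in plain Python. It is not samtools' actual hashing; `keep_template`, the CRC32 hash, and the toy read names are assumptions made for illustration.

```python
import zlib
from collections import Counter


def keep_template(read_name, fraction, seed=0):
    """Decide from the read name alone whether a template is kept.

    Both mates share a read name, so they always get the same decision.
    """
    h = zlib.crc32(f"{seed}:{read_name}".encode()) & 0xFFFFFFFF
    return h / 2**32 < fraction


def subsample(records, fraction, seed=0):
    """records: iterable of (read_name, mate_number) tuples."""
    return [rec for rec in records if keep_template(rec[0], fraction, seed)]


# Toy input: 1000 hypothetical read pairs.
records = [(f"read{i}", mate) for i in range(1000) for mate in (1, 2)]
kept = subsample(records, 0.6)

# Every surviving read name should still have both mates.
mates_per_name = Counter(name for name, _ in kept)
print(len(kept), all(n == 2 for n in mates_per_name.values()))
```

If that reading of `-s` is right, the subsampling itself should not be what breaks pairing in the downsampled BAM.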

I have two other samples (BAM files) that ran fine without downsampling or memory issues; they had about 20% fewer reads. However, they also produced the strandedness warning and many blank plots in the multiplot, so I am not sure that downsampling is the cause. It could be one of the other issues that you mentioned.

Here is the plotting script and log.

Plot log ``` library(QoRTs) res <- read.qc.results.data(infile.dir="data/raw/zumis/qorts/", decoder.files = "data/raw/zumis/qorts/decoder.txt",autodetectMissingSamples=TRUE) column 'qc.data.prefix' not found in the decoder, assuming qc.data.prefix = "" Note: no input.read.pair.count column found. This column is optional, but without it mapping rates cannot be calculated. Note: no multi.mapped.read.pair.count column found. This column is optional, but without it (depending on how your aligner implements multi-mapping) multi-mapping rates might not be plotted. infile.dir = data/raw/zumis/qorts/ scalaqc_file = QC.summary.txt.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Autodetected Paired-End mode. (File 1 of 43): QC.gc.byPair.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 2 of 43): QC.gc.byRead.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 3 of 43): QC.gc.byRead.vsBaseCt.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 4 of 43): QC.quals.r1.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 5 of 43): QC.quals.r2.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 6 of 43): QC.cigarOpDistribution.byReadCycle.R1.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 7 of 43): QC.cigarOpDistribution.byReadCycle.R2.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 8 of 43): QC.cigarOpLengths.byOp.R1.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0.02 secs] (File 9 of 43): QC.cigarOpLengths.byOp.R2.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0.02 secs] (File 10 of 43): QC.geneBodyCoverage.by.expression.level.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 11 of 43): QC.geneCounts.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0.04 secs] (File 12 of 43): QC.insert.size.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0.05 secs] (File 13 of 43): QC.NVC.raw.R1.txt.gz.done. 
[time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 14 of 43): QC.NVC.raw.R2.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 15 of 43): QC.NVC.lead.clip.R1.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0.02 secs] (File 16 of 43): QC.NVC.lead.clip.R2.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0.04 secs] (File 17 of 43): QC.NVC.tail.clip.R1.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0.03 secs] (File 18 of 43): QC.NVC.tail.clip.R2.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0.03 secs] (File 19 of 43): QC.NVC.minus.clipping.R1.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 20 of 43): QC.NVC.minus.clipping.R2.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 21 of 43): QC.chromCount.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 22 of 43): QC.biotypeCounts.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 23 of 43): QC.geneBodyCoverage.byExpr.avgPct.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 24 of 43): QC.overlapCoverage.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 25 of 43): QC.overlapMismatch.byRead.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 26 of 43): QC.overlapMismatch.byScore.txt.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 27 of 43): QC.overlapMismatch.byBase.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 28 of 43): QC.overlapMismatch.byScoreAndBP.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0.03 secs] (File 29 of 43): QC.readLenDist.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 30 of 43): QC.referenceMismatchCounts.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 31 of 43): QC.referenceMismatchRaw.byReadStrand.txt.done. [time: 2023-02-17 11:27:09],[elapsed: 0.01 secs] (File 32 of 43): QC.referenceMismatch.byScore.txt.done. 
[time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 33 of 43): QC.referenceMismatch.byScoreAndBP.txt.done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 34 of 43): QC.mismatchSizeRates.txt.gz.done. [time: 2023-02-17 11:27:09],[elapsed: 0.01 secs] (File 35 of 43): QC.FQ.gc.byRead.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.gc.byRead.txt.gz. Skipping tests that use this data. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 36 of 43): QC.FQ.gc.byPair.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.gc.byPair.txt.gz. Skipping tests that use this data. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 37 of 43): QC.FQ.gc.R1.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.gc.R1.txt.gz. Skipping tests that use this data. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 38 of 43): QC.FQ.gc.R2.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.gc.R2.txt.gz. Skipping tests that use this data. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 39 of 43): QC.FQ.NVC.R1.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.NVC.R1.txt.gz. Skipping tests that use this data. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 40 of 43): QC.FQ.NVC.R2.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.NVC.R2.txt.gz. Skipping tests that use this data. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 41 of 43): QC.FQ.quals.r1.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.quals.r1.txt.gz. Skipping tests that use this data. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 42 of 43): QC.FQ.quals.r2.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.quals.r2.txt.gz. Skipping tests that use this data. 
[time: 2023-02-17 11:27:09],[elapsed: 0 secs] (File 43 of 43): QC.FQ.readLenDist.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.readLenDist.txt.gz. Skipping tests that use this data. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] calculating secondary data: Calculating Quality Score Rates...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating cumulative gene coverage, by replicate...done. [time: 2023-02-17 11:27:09],[elapsed: 0.01 secs] Calculating cumulative gene coverage, by sample...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating Mapping Rates...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] calculating normalization factors, by sample...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] calculating normalization factors, by replicate...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] calculating normalization factors, by sample/replicate...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating summary stats...done. [time: 2023-02-17 11:27:09],[elapsed: 0.01 secs] Calculating overlap mismatch-size rates...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating cumulative overlap mismatch-size rates...done. [time: 2023-02-17 11:27:09],[elapsed: 0.03 secs] Calculating overlap coverage Rates...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating overlap coverage Rates By Read...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating read length distribution...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating overlap by AVG score...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating overlap by MIN score...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Adding Min score error to summary tables...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating overlap by R1 score...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating overlap by R2 score...done. 
[time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating referenceMismatchCounts stats...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating referenceMismatch.byScore stats...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating referenceMismatchRaw.byReadStrand stats...done. [time: 2023-02-17 11:27:09],[elapsed: 0.01 secs] Calculating referenceMismatch.byScoreAndBP stats...done. [time: 2023-02-17 11:27:09],[elapsed: 0.01 secs] Calculating summary table...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating overlap mismatch combos...Calculating mismatch combo rates:...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs] Calculating overlapMismatch.byScoreAndBP stats...done. [time: 2023-02-17 11:27:10],[elapsed: 0.55 secs] done. [time: 2023-02-17 11:27:10],[elapsed: 0.56 secs] Calculating NVC rates...done. [time: 2023-02-17 11:27:10],[elapsed: 0.05 secs] done. [time: 2023-02-17 11:27:10],[elapsed: 0.69 secs] Skipping: "onTarget.rates","onTarget.counts","overlap.mismatch.byAvgQual" Rasterize large plots: FALSE Rasterize medium plots: FALSE Skipping due to missing data: "mapping.rates","norm.factors","norm.vs.TC" Plotting to the currently-open device... Plotting extended... Starting compiled plot... null device 1 ```
hartleys commented 1 year ago

For the insert size, could you show the first 500 lines?

And could you maybe post the full QC.quals.r1.txt file? That one perplexes me the most, since none of this other stuff should affect it; it's dead simple.

What version of R are you running? I haven't tested QoRTs on the newer versions. It shouldn't make a difference, but it's possible.


royfrancis commented 1 year ago

500 lines of insert size

Insert Size ``` $ head -500 QC.insert.size.txt InsertSize Ct 0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 10 0 11 0 12 0 13 0 14 0 15 0 16 1 17 9 18 9 19 3 20 2 21 2 22 1 23 16 24 14 25 27 26 50 27 53 28 66 29 172 30 81 31 103 32 293 33 103 34 160 35 116 36 106 37 160 38 147 39 197 40 217 41 105 42 117 43 159 44 152 45 204 46 172 47 258 48 265 49 228 50 326 51 189 52 239 53 173 54 289 55 271 56 570 57 1240 58 59640 59 87598 60 157234 61 66937 62 71341 63 53061 64 27000 65 18377 66 19845 67 19960 68 18363 69 18983 70 23432 71 21124 72 20022 73 21429 74 18033 75 17383 76 18315 77 18698 78 20664 79 24500 80 68901 81 60539 82 64878 83 69174 84 66908 85 59720 86 56638 87 67972 88 63187 89 64589 90 64062 91 63578 92 64307 93 69494 94 63117 95 65724 96 62436 97 67277 98 63380 99 66513 100 66466 101 71515 102 69962 103 71410 104 73335 105 78041 106 76025 107 69028 108 69512 109 70052 110 70529 111 79839 112 72271 113 124319 114 122282 115 122449 116 124412 117 142020 118 166582 119 239607 120 372431 121 505082 122 240310 123 226321 124 233692 125 265236 126 475243 127 240638 128 263637 129 241704 130 279017 131 339567 132 256077 133 653230 134 268081 135 252661 136 384324 137 307867 138 352451 139 297923 140 398459 141 264757 142 268405 143 315080 144 288190 145 337896 146 276336 147 277009 148 462523 149 271486 150 295669 151 321542 152 283935 153 326496 154 288825 155 336537 156 281186 157 307664 158 312668 159 426211 160 305038 161 388685 162 323370 163 297768 164 316399 165 358222 166 382747 167 412732 168 338350 169 424533 170 374563 171 362098 172 344140 173 348566 174 552290 175 394478 176 321790 177 348267 178 374816 179 327981 180 367405 181 381411 182 441740 183 468710 184 368974 185 358864 186 379367 187 390232 188 452099 189 387880 190 436000 191 369812 192 378372 193 412208 194 450890 195 386683 196 373977 197 533712 198 405176 199 404649 200 406269 201 378857 202 425324 203 395905 204 597632 205 1301920 206 486930 207 447109 208 402321 209 387888 210 401512 211 
781089 212 693961 213 466506 214 413397 215 437600 216 421373 217 520441 218 445268 219 532208 220 456514 221 433206 222 416687 223 488708 224 429191 225 434593 226 460544 227 430857 228 450473 229 436757 230 431798 231 486690 232 776357 233 422793 234 409099 235 416835 236 426599 237 715289 238 488193 239 442650 240 585901 241 435689 242 444152 243 436647 244 428768 245 1148477 246 399705 247 434309 248 455262 249 479593 250 436653 251 477488 252 528618 253 704487 254 479866 255 427654 256 412713 257 428468 258 430080 259 562194 260 464509 261 703258 262 418314 263 433300 264 373629 265 374279 266 404222 267 658972 268 372415 269 374427 270 370258 271 372728 272 389580 273 354233 274 404060 275 366348 276 425936 277 369730 278 353342 279 494877 280 344769 281 483421 282 354558 283 334601 284 442987 285 353446 286 336950 287 343088 288 368300 289 535209 290 371898 291 351777 292 345007 293 308206 294 393970 295 708242 296 346534 297 328592 298 296338 299 346341 300 295271 301 266166 302 312359 303 292345 304 357556 305 391358 306 323948 307 751195 308 293197 309 651166 310 299225 311 279659 312 339363 313 283095 314 620971 315 334242 316 257542 317 262985 318 260833 319 426052 320 308990 321 291909 322 246837 323 272518 324 233598 325 402850 326 260351 327 236414 328 237087 329 239225 330 262224 331 236702 332 265793 333 215020 334 271660 335 226407 336 414604 337 235193 338 501403 339 269455 340 220418 341 217617 342 215120 343 258349 344 231492 345 415804 346 200754 347 217006 348 258059 349 194916 350 259488 351 172721 352 193666 353 195597 354 185406 355 254261 356 168437 357 172563 358 165991 359 176199 360 442129 361 164411 362 159551 363 154943 364 167164 365 235471 366 167066 367 242742 368 266777 369 239879 370 237212 371 145163 372 143892 373 141814 374 227232 375 169118 376 167336 377 292432 378 178525 379 148721 380 139105 381 164141 382 151485 383 131033 384 119986 385 144885 386 121424 387 140399 388 277183 389 143188 390 163776 391 127696 392 130411 
393 111648 394 166541 395 113544 396 131896 397 106142 398 107851 399 106086 400 119453 401 127097 402 122592 403 125669 404 104015 405 97465 406 96807 407 96373 408 98182 409 101707 410 103714 411 97016 412 94409 413 211853 414 109362 415 97031 416 97236 417 84272 418 83237 419 97841 420 97590 421 119733 422 89049 423 81981 424 84109 425 92483 426 156811 427 89280 428 84574 429 83699 430 111149 431 95117 432 77905 433 88704 434 83277 435 92809 436 112704 437 69045 438 66494 439 68579 440 72822 441 78703 442 68204 443 62768 444 66443 445 69627 446 65667 447 64578 448 64686 449 63022 450 71708 451 62043 452 60983 453 57432 454 56493 455 68109 456 55479 457 58225 458 54039 459 57088 460 66978 461 60150 462 49749 463 52728 464 50713 465 52951 466 56714 467 52996 468 48991 469 53235 470 51141 471 50091 472 50479 473 46312 474 51362 475 44767 476 44368 477 51617 478 50031 479 46308 480 48184 481 47648 482 44881 483 50809 484 52606 485 43196 486 44873 487 40805 488 44500 489 49127 490 40096 491 65080 492 40458 493 44779 494 44516 495 40870 496 49055 497 36519 498 36280 ```
QC.quals.r1 ``` $ cat QC.quals.r1.txt readLen min lowerQuartile median upperQuartile max 0 3 38 38 38 38 1 3 38 38 38 38 2 3 38 38 38 38 3 3 38 38 38 38 4 12 38 38 38 38 5 3 38 38 38 38 6 12 38 38 38 38 7 3 38 38 38 38 8 3 38 38 38 38 9 12 38 38 38 38 10 12 38 38 38 38 11 3 38 38 38 38 12 3 38 38 38 38 13 12 38 38 38 38 14 3 38 38 38 38 15 12 38 38 38 38 16 3 38 38 38 38 17 12 38 38 38 38 18 3 38 38 38 38 19 3 38 38 38 38 20 3 38 38 38 38 21 3 38 38 38 38 22 3 38 38 38 38 23 3 38 38 38 38 24 3 38 38 38 38 25 3 38 38 38 38 26 12 38 38 38 38 27 12 38 38 38 38 28 12 38 38 38 38 29 3 38 38 38 38 30 3 38 38 38 38 31 3 38 38 38 38 32 12 38 38 38 38 33 3 38 38 38 38 34 3 38 38 38 38 35 3 38 38 38 38 36 3 38 38 38 38 37 3 38 38 38 38 38 3 38 38 38 38 39 3 38 38 38 38 40 3 38 38 38 38 41 3 38 38 38 38 42 3 38 38 38 38 43 12 38 38 38 38 44 3 38 38 38 38 45 3 38 38 38 38 46 3 38 38 38 38 47 3 38 38 38 38 48 3 38 38 38 38 49 12 38 38 38 38 50 3 38 38 38 38 51 3 38 38 38 38 52 3 38 38 38 38 53 3 38 38 38 38 54 12 38 38 38 38 55 12 38 38 38 38 56 3 38 38 38 38 57 3 38 38 38 38 58 3 38 38 38 38 59 3 38 38 38 38 60 3 38 38 38 38 61 3 38 38 38 38 62 12 38 38 38 38 63 12 38 38 38 38 64 3 38 38 38 38 65 12 38 38 38 38 66 12 38 38 38 38 67 3 38 38 38 38 68 3 38 38 38 38 69 12 38 38 38 38 70 3 38 38 38 38 71 12 38 38 38 38 72 3 38 38 38 38 73 3 38 38 38 38 74 3 38 38 38 38 75 3 38 38 38 38 76 12 38 38 38 38 77 12 38 38 38 38 78 3 38 38 38 38 79 3 38 38 38 38 80 3 38 38 38 38 81 3 38 38 38 38 82 3 38 38 38 38 83 3 38 38 38 38 84 12 38 38 38 38 85 -1 0 0 0 0 86 -1 0 0 0 0 87 -1 0 0 0 0 88 -1 0 0 0 0 89 -1 0 0 0 0 90 -1 0 0 0 0 91 -1 0 0 0 0 92 -1 0 0 0 0 93 -1 0 0 0 0 94 -1 0 0 0 0 95 -1 0 0 0 0 96 -1 0 0 0 0 97 -1 0 0 0 0 98 -1 0 0 0 0 99 -1 0 0 0 0 100 -1 0 0 0 0 101 -1 0 0 0 0 102 -1 0 0 0 0 103 -1 0 0 0 0 104 -1 0 0 0 0 105 -1 0 0 0 0 106 -1 0 0 0 0 107 -1 0 0 0 0 108 -1 0 0 0 0 109 -1 0 0 0 0 110 -1 0 0 0 0 111 -1 0 0 0 0 112 -1 0 0 0 0 113 -1 0 0 0 0 114 -1 0 0 0 0 115 -1 0 0 0 0 
116 -1 0 0 0 0 117 -1 0 0 0 0 118 -1 0 0 0 0 119 -1 0 0 0 0 120 -1 0 0 0 0 121 -1 0 0 0 0 122 -1 0 0 0 0 123 -1 0 0 0 0 124 -1 0 0 0 0 ```
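
In the table above, positions 85 to 124 have a minimum of -1 and zeros elsewhere. The sketch below finds the last position that has real data, assuming (this is a guess from the output, not from QoRTs documentation) that a minimum of -1 means no bases were observed at that position. The function name `last_covered_position` is made up for illustration.

```python
def last_covered_position(lines):
    """Return the highest position whose 'min' column is not negative.

    Expects whitespace-separated rows: position, min, lowerQuartile,
    median, upperQuartile, max. Header and malformed lines are skipped.
    """
    last = None
    for line in lines:
        fields = line.split()
        try:
            pos, minimum = int(fields[0]), int(fields[1])
        except (IndexError, ValueError):
            continue  # header line or blank line
        if minimum >= 0:
            last = pos if last is None else max(last, pos)
    return last


# Header plus three rows copied from the QC.quals.r1.txt output above.
rows = [
    "readLen min lowerQuartile median upperQuartile max",
    "83 3 38 38 38 38",
    "84 12 38 38 38 38",
    "85 -1 0 0 0 0",
]
print(last_covered_position(rows))
```

Running the same function on `QC.quals.r2.txt` would show whether R2 stops at the same position.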

QoRTs/1.3.6 R/4.0.0