hartleys / QoRTs

Quality of RNA-Seq Toolset

High DROPPED_NOT_PROPER_PAIR and persistent warning about strandedness #68

Closed reventropy closed 6 years ago

reventropy commented 6 years ago

I'm trying to figure out why so many reads are being dropped and why I'm still getting a warning about strandedness. The BAM files were created using TopHat. The strandedness should be fr-secondstrand (confirmed using Chipster). For the run output posted below, TopHat reports ~75% concordant read alignment. The BAM files were sorted by name using Samtools.

Here's the script I'm running with:

java -Xmx500g -jar /scratch/Users/jeja4312/zach_20180712/scripts/QoRTs/hartleys-QoRTs-39cd1fc/QoRTs.jar QC --stranded_fr_secondstrand \
  --minMAPQ 50 --nameSorted --maxReadLength 151 \
  accepted_hits_sorted.bam \
  genes_spike.gtf \
  /QoRT

The QC output is still warning me about strandedness even though it is specified:

Starting QC
[Time: 2018-07-31 10:46:03] [Mem usage: [96MB / 2058MB]] [Elapsed Time: 00:00:00.0000]
QoRTs is Running in paired-end mode.
QoRTs is Running in name-sorted mode.
NOTE: Function "overlapMatch" requires function "mismatchEngine". Adding "mismatchEngine" to the active function list...
Running functions: CigarOpDistribution, GCDistribution, GeneCalcs, InsertSize, JunctionCalcs, NVC, QualityScoreDistribution, StrandCheck, chromCounts, cigarLocusCounts, mismatchEngine, overlapMatch, readLengthDistro, writeBiotypeCounts, writeClippedNVC, writeDESeq, writeDEXSeq, writeGeneBody, writeGeneCounts, writeGenewiseGeneBody, writeJunctionSeqCounts, writeKnownSplices, writeNovelSplices, writeSpliceExon
Checking first 10000 reads. Checking SAM file for formatting errors...
Note: Detected TopHat Alignment Program. Version: "2.1.1"
IMPORTANT NOTE: Detected TopHat Alignment Program, version > 2. TopHat v2+ uses a different MAPQ convention than most aligners. Make sure you set the --minMAPQ parameter to 50 if you want to ignore multi-mapped reads.
NOTE: Read length is not consistent. In the first 10000 reads, read length varies from 35 to 151 (param maxReadLength=151) Note that using data that is hard-clipped prior to alignment is NOT recommended, because this makes it difficult (or impossible) to determine the sequencer read-cycle of each nucleotide base. This may obfuscate cycle-specific artifacts, trends, or errors, the detection of which is one of the primary purposes of QoRTs!In addition, hard clipping (whether before or after alignment) removes quality score data, and thus quality score metrics may be misleadingly optimistic. A MUCH preferable method of removing undesired sequence is to replace such sequence with N's, which preserves the quality score and the sequencer cycle information.
Sorting Note: Reads appear to be grouped by read-pair, probably sorted by name(This is OK).
Sorting Note: Reads are not sorted by position (This is OK).
Done checking first 10000 reads. WARNINGS FOUND!
SAMRecord Reader Generated. Read length: 151.
[Time: 2018-07-31 10:46:06] [Mem usage: [747MB / 2595MB]] [Elapsed Time: 00:00:03.0668]
Compiling flat feature annotation, internally in memory...
Internal flat feature annotation compiled!
QC Utilities Generated!
[Time: 2018-07-31 10:47:19] [Mem usage: [6GB / 15GB]] [Elapsed Time: 00:01:16.0624]
..........[1000000 Read-Pairs processed] [Time: 2018-07-31 10:49:47]
..........[2000000 Read-Pairs processed] [Time: 2018-07-31 10:51:37]
..........[3000000 Read-Pairs processed] [Time: 2018-07-31 10:53:20]
..........[4000000 Read-Pairs processed] [Time: 2018-07-31 10:55:02]
..........[5000000 Read-Pairs processed] [Time: 2018-07-31 10:56:46]
..........[6000000 Read-Pairs processed] [Time: 2018-07-31 10:58:31]
..........[7000000 Read-Pairs processed] [Time: 2018-07-31 11:00:14]
..........[8000000 Read-Pairs processed] [Time: 2018-07-31 11:01:57]
..........[9000000 Read-Pairs processed] [Time: 2018-07-31 11:03:41]
..........[10000000 Read-Pairs processed] [Time: 2018-07-31 11:05:25]
..........[11000000 Read-Pairs processed] [Time: 2018-07-31 11:07:09]
..........[12000000 Read-Pairs processed] [Time: 2018-07-31 11:08:53]
..........[13000000 Read-Pairs processed] [Time: 2018-07-31 11:10:38]
..........[14000000 Read-Pairs processed] [Time: 2018-07-31 11:12:23]
..........[15000000 Read-Pairs processed] [Time: 2018-07-31 11:14:07]
..........[16000000 Read-Pairs processed] [Time: 2018-07-31 11:15:52]
..........[17000000 Read-Pairs processed] [Time: 2018-07-31 11:17:38]
..........[18000000 Read-Pairs processed] [Time: 2018-07-31 11:19:24]
..........[19000000 Read-Pairs processed] [Time: 2018-07-31 11:21:08]
..........[20000000 Read-Pairs processed] [Time: 2018-07-31 11:22:52]
..........[21000000 Read-Pairs processed] [Time: 2018-07-31 11:24:37]
..........[22000000 Read-Pairs processed] [Time: 2018-07-31 11:26:23]
..........[23000000 Read-Pairs processed] [Time: 2018-07-31 11:28:08]
..........[24000000 Read-Pairs processed] [Time: 2018-07-31 11:29:52]
..........[25000000 Read-Pairs processed] [Time: 2018-07-31 11:31:36]
..........[26000000 Read-Pairs processed] [Time: 2018-07-31 11:33:23]
..........[27000000 Read-Pairs processed] [Time: 2018-07-31 11:35:06]
..........[28000000 Read-Pairs processed] [Time: 2018-07-31 11:36:50]
..........[29000000 Read-Pairs processed] [Time: 2018-07-31 11:38:34]
..........[30000000 Read-Pairs processed] [Time: 2018-07-31 11:40:20]
..........[31000000 Read-Pairs processed] [Time: 2018-07-31 11:42:05]
..........[32000000 Read-Pairs processed] [Time: 2018-07-31 11:43:50]
..........[33000000 Read-Pairs processed] [Time: 2018-07-31 11:45:36]
..........[34000000 Read-Pairs processed] [Time: 2018-07-31 11:47:22]
..........[35000000 Read-Pairs processed] [Time: 2018-07-31 11:49:07]
..........[36000000 Read-Pairs processed] [Time: 2018-07-31 11:50:53]
..........[37000000 Read-Pairs processed] [Time: 2018-07-31 11:52:38]
..........[38000000 Read-Pairs processed] [Time: 2018-07-31 11:54:25]
..........[39000000 Read-Pairs processed] [Time: 2018-07-31 11:56:11]
..........[40000000 Read-Pairs processed] [Time: 2018-07-31 11:57:56]
..........[41000000 Read-Pairs processed] [Time: 2018-07-31 11:59:41]
..........[42000000 Read-Pairs processed] [Time: 2018-07-31 12:01:27]
..........[43000000 Read-Pairs processed] [Time: 2018-07-31 12:03:13]
..........[44000000 Read-Pairs processed] [Time: 2018-07-31 12:04:59]
..........[45000000 Read-Pairs processed] [Time: 2018-07-31 12:06:45]
..........[46000000 Read-Pairs processed] [Time: 2018-07-31 12:08:31]
..........[47000000 Read-Pairs processed] [Time: 2018-07-31 12:10:17]
..........[48000000 Read-Pairs processed] [Time: 2018-07-31 12:12:03]
..........[49000000 Read-Pairs processed] [Time: 2018-07-31 12:13:49]
..........[50000000 Read-Pairs processed] [Time: 2018-07-31 12:15:35]
..........[51000000 Read-Pairs processed] [Time: 2018-07-31 12:17:21]
..........[52000000 Read-Pairs processed] [Time: 2018-07-31 12:19:06]
..........[53000000 Read-Pairs processed] [Time: 2018-07-31 12:20:52]
..........[54000000 Read-Pairs processed] [Time: 2018-07-31 12:22:39]
..........[55000000 Read-Pairs processed] [Time: 2018-07-31 12:24:22]
..........[56000000 Read-Pairs processed] [Time: 2018-07-31 12:26:06]
..........[57000000 Read-Pairs processed] [Time: 2018-07-31 12:27:51]
..........[58000000 Read-Pairs processed] [Time: 2018-07-31 12:29:34]
..........[59000000 Read-Pairs processed] [Time: 2018-07-31 12:31:19]
..........[60000000 Read-Pairs processed] [Time: 2018-07-31 12:33:05]
..........[61000000 Read-Pairs processed] [Time: 2018-07-31 12:34:49]
..........[62000000 Read-Pairs processed] [Time: 2018-07-31 12:36:32]
..........[63000000 Read-Pairs processed] [Time: 2018-07-31 12:38:17]
..........[64000000 Read-Pairs processed] [Time: 2018-07-31 12:40:02]
..........[65000000 Read-Pairs processed] [Time: 2018-07-31 12:41:46]
..........[66000000 Read-Pairs processed] [Time: 2018-07-31 12:43:31]
..........[67000000 Read-Pairs processed] [Time: 2018-07-31 12:45:16]
..........[68000000 Read-Pairs processed] [Time: 2018-07-31 12:46:59]
..........[69000000 Read-Pairs processed] [Time: 2018-07-31 12:48:44]
..........[70000000 Read-Pairs processed] [Time: 2018-07-31 12:50:28]
..........[71000000 Read-Pairs processed] [Time: 2018-07-31 12:52:15]
..........[72000000 Read-Pairs processed] [Time: 2018-07-31 12:54:00]
..........[73000000 Read-Pairs processed] [Time: 2018-07-31 12:55:45]
..........[74000000 Read-Pairs processed] [Time: 2018-07-31 12:57:29]
..........[75000000 Read-Pairs processed] [Time: 2018-07-31 12:59:14]
..........[76000000 Read-Pairs processed] [Time: 2018-07-31 13:00:58]
..........[77000000 Read-Pairs processed] [Time: 2018-07-31 13:02:42]
..........[78000000 Read-Pairs processed] [Time: 2018-07-31 13:04:29]
..........[79000000 Read-Pairs processed] [Time: 2018-07-31 13:06:15]
..........[80000000 Read-Pairs processed] [Time: 2018-07-31 13:08:00]
..........[81000000 Read-Pairs processed] [Time: 2018-07-31 13:09:45]
..........[82000000 Read-Pairs processed] [Time: 2018-07-31 13:11:32]
..........[83000000 Read-Pairs processed] [Time: 2018-07-31 13:13:17]
..........[84000000 Read-Pairs processed] [Time: 2018-07-31 13:15:03]
..........[85000000 Read-Pairs processed] [Time: 2018-07-31 13:16:50]
..........[86000000 Read-Pairs processed] [Time: 2018-07-31 13:18:34]
..........[87000000 Read-Pairs processed] [Time: 2018-07-31 13:20:19]
..........[88000000 Read-Pairs processed] [Time: 2018-07-31 13:22:01]
..........[89000000 Read-Pairs processed] [Time: 2018-07-31 13:23:45]
..........[90000000 Read-Pairs processed] [Time: 2018-07-31 13:25:27]
..........[91000000 Read-Pairs processed] [Time: 2018-07-31 13:27:11]
..........[92000000 Read-Pairs processed] [Time: 2018-07-31 13:28:55]
..........[93000000 Read-Pairs processed] [Time: 2018-07-31 13:30:38]
..........[94000000 Read-Pairs processed] [Time: 2018-07-31 13:32:21]
..........[95000000 Read-Pairs processed] [Time: 2018-07-31 13:34:05]
..........[96000000 Read-Pairs processed] [Time: 2018-07-31 13:35:49]
...
Finished reading SAM. Read: 96392753 reads/read-pairs.
Finished reading SAM. Used: 50109438 reads/read-pairs.
[Time: 2018-07-31 13:36:30] [Mem usage: [3475MB / 6GB]] [Elapsed Time: 02:50:26.0783]

Read Stats:
READ_PAIR_OK 50109438
TOTAL_READ_PAIRS 96392753
DROPPED_NOT_PROPER_PAIR 45640219
DROPPED_READ_FAILS_VENDOR_QC 0
DROPPED_MARKED_NOT_VALID 0
DROPPED_CHROMS_MISMATCH 30
DROPPED_PAIR_STRANDS_MISMATCH 0
DROPPED_IGNORED_CHROMOSOME 0
DROPPED_NOT_UNIQUE_ALIGNMENT 643066
DROPPED_NO_ALN_BLOCKS 0
DROPPED_NOT_MARKED_RG -1
Pre-alignment read count unknown (Set --seqReadCt or --rawfastq)
Writing Output...
WARNING: The data appears to be STRANDED, following the fr_secondStrand rule. Are you sure this isn't stranded data? If it is stranded, then you should probably re-run QoRTs with the "--stranded" and "--stranded_fr_secondstrand" options!
QoRTs completed WITH WARNINGS! See log for details.
Done.
Time spent on setup: 00:01:16.0624
Time spent on SAM iteration: 02:49:10.0159
(1.7549728729779785 minutes per million read-pairs)
(3.3759442017024153 minutes per million read-pairs used)
Time spent on file output: 00:00:44.0365
Total runtime: 02:51:11.0148
Done. (Tue Jul 31 13:37:14 MDT 2018)

reventropy commented 6 years ago

The strandedness warning went away after I specified "--stranded", as the warning says (re-run QoRTs with the "--stranded" and "--stranded_fr_secondstrand" options). If "--stranded_fr_secondstrand" is specified, why does "--stranded" also need to be specified? Now I'm just wrestling with the "DROPPED_NOT_PROPER_PAIR" issue, which might have to do with the bam assembly. Any help would be very appreciated. Thanks!
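
To put the dropped reads in proportion, the figures from the "Read Stats" block can be turned into percentages. The sketch below uses only numbers copied from that log; the helper name `pct` is hypothetical. The optional samtools cross-check at the end assumes that QoRTs bases DROPPED_NOT_PROPER_PAIR on the SAM proper-pair flag (0x2), which the thread does not confirm, and it only runs if samtools and the BAM file are present.

```shell
# Figures copied from the "Read Stats" block in the QoRTs log.
total=96392753
ok=50109438
not_proper=45640219
chrom_mismatch=30
not_unique=643066

# Hypothetical helper: print $1 as a percentage of $2, one decimal place.
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'
}

echo "kept:            $(pct "$ok" "$total")%"
echo "not proper pair: $(pct "$not_proper" "$total")%"
echo "not unique:      $(pct "$not_unique" "$total")%"
echo "accounted for:   $((ok + not_proper + chrom_mismatch + not_unique)) of $total"

# Optional cross-check (assumption: the drop category follows SAM flag 0x2).
# Counts are alignment records, not read pairs.
if command -v samtools >/dev/null 2>&1 && [ -f accepted_hits_sorted.bam ]; then
  samtools view -c -f 2 accepted_hits_sorted.bam   # records with the proper-pair flag set
  samtools view -c -F 2 accepted_hits_sorted.bam   # records without it
fi
```

The four non-zero categories add up exactly to TOTAL_READ_PAIRS, so nearly all of the loss falls under DROPPED_NOT_PROPER_PAIR rather than the MAPQ filter.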

reventropy commented 6 years ago

I wonder if the issue comes from using Trimmomatic to remove adapter sequences? Based on biostars posts, this is a common warning that seems to be mostly ignored.

"NOTE: Read length is not consistent. In the first 10000 reads, read length varies from 35 to 151 (param maxReadLength=151) Note that using data that is hard-clipped prior to alignment is NOT recommended, because this makes it difficult (or impossible) to determine the sequencer read-cycle of each nucleotide base. This may obfuscate cycle-specific artifacts, trends, or errors, the detection of which is one of the primary purposes of QoRTs!In addition, hard clipping (whether before or after alignment) removes quality score data, and thus quality score metrics may be misleadingly optimistic. A MUCH preferable method of removing undesired sequence is to replace such sequence with N's, which preserves the quality score and the sequencer cycle information."

How exactly does one replace sequences with N's? Is there a package that does this? I don't see this option in Trimmomatic. I'll try to run Tophat2->QoRTs without Trimmomatic and see if that improves the situation.

hartleys commented 6 years ago

As noted in the warning, you need to use the "--stranded" option AND the "--stranded_fr_secondstrand" option.
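
A sketch of the adjusted invocation, reusing the arguments from the original post and adding `--stranded`. The `QORTS_JAR` and `DRY_RUN` variables and the `run_qorts` wrapper are hypothetical conveniences, not part of QoRTs; the output directory is left exactly as posted.

```shell
# Hypothetical variables: point QORTS_JAR at your QoRTs.jar; DRY_RUN=1 only prints the command.
QORTS_JAR="${QORTS_JAR:-QoRTs.jar}"
DRY_RUN="${DRY_RUN:-1}"

run_qorts() {
  # Same arguments as the original post, with --stranded added
  # alongside --stranded_fr_secondstrand.
  set -- java -Xmx500g -jar "$QORTS_JAR" QC \
    --stranded \
    --stranded_fr_secondstrand \
    --minMAPQ 50 --nameSorted --maxReadLength 151 \
    accepted_hits_sorted.bam genes_spike.gtf /QoRT
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"      # show the command without running it
  else
    "$@"           # actually run QoRTs
  fi
}

run_qorts
```

Setting `DRY_RUN=0` runs the command instead of printing it.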

hartleys commented 6 years ago

Yes. This second warning is caused by Trimmomatic. Nothing to worry about.

I wrote a script that replaced the unwanted sequence with N's, ages ago, but I never had time to write it up properly. It's fine to just use Trimmomatic.
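
The maintainer's script is not shown in the thread. Purely as an illustration of the idea in the QoRTs note (mask rather than hard-clip, so read length, quality string, and cycle positions are preserved), here is a minimal sketch that replaces low-quality bases with N. It assumes four-line FASTQ records with Phred+33 qualities; the function name `mask_low_quality` and the threshold of 20 are hypothetical, and it does not detect adapters, which would need an adapter-aware tool.

```shell
# Hypothetical helper: read FASTQ on stdin, replace every base whose Phred score
# is below $1 with N, and leave the header, "+" line, and quality string untouched.
mask_low_quality() {
  awk -v minq="$1" '
    BEGIN {
      # Lookup table from printable ASCII character to its code point.
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    NR % 4 == 1 { header = $0; next }
    NR % 4 == 2 { seq = $0; next }
    NR % 4 == 3 { plus = $0; next }
    {
      # Fourth line of the record: the quality string (Phred+33 assumed).
      masked = ""
      for (i = 1; i <= length(seq); i++) {
        q = ord[substr($0, i, 1)] - 33
        masked = masked ((q < minq) ? "N" : substr(seq, i, 1))
      }
      print header
      print masked
      print plus
      print $0
    }
  '
}

# Usage: the third base has quality "#" (Phred 2) and is masked.
printf '@r1\nACGT\n+\nII#I\n' | mask_low_quality 20
```

Because nothing is removed, every read keeps its original length, which is the property the QoRTs note is asking for.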
