False positives in novelSplices.txt.gz?

hartleys / QoRTs

Quality of RNA-Seq Toolset

False positives in novelSplices.txt.gz? #28

Closed naumenko-sa closed 7 years ago

naumenko-sa commented 7 years ago

Hi Stephen!

Thank you for the excellent tool!

I ran QoRTs with

java -Xmx10g -jar ~/work/tools/bin/QoRTs.jar QC \
    $bam \
    ref-transcripts.gff \
    ${bam}.qorts

and the output contained

gunzip -c QC.spliceJunctionCounts.knownSplices.txt.gz | wc -l
344554
gunzip -c QC.spliceJunctionCounts.novelSplices.txt.gz | wc -l
100875

splice junctions.

When I checked the novel splice junctions in IGV, comparing the called junctions with the reads actually aligned there, I saw many false-positive junctions with good coverage (>10 reads).
I could not figure out why QoRTs called a junction there.

Why could this be? What is the approximate false positive rate of novel junction discovery?

Thanks, Sergey

hartleys commented 7 years ago

QoRTs is NOT intended to perform novel splice junction detection!

All that it does is report the splice junctions aligned by your aligner. The false discovery rate depends entirely on your aligner, since the aligner is responsible for detecting novel splice junctions. All QoRTs does is count them.
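To make "counting the junctions your aligner reports" concrete, here is a minimal shell sketch. It is not QoRTs's actual implementation; it only tallies the N (skipped region) operations found in SAM CIGAR strings, which is how a spliced alignment is recorded. The function name junctions_from_sam is hypothetical, and the sketch assumes tab-separated SAM text on stdin.

```shell
# Hypothetical helper (an illustration, not part of QoRTs or samtools).
# Reads SAM records on stdin and prints, for every N operation in a CIGAR:
#   chrom <TAB> first_intron_base <TAB> last_intron_base <TAB> read_count
# Coordinates are 1-based, like the SAM POS field.
junctions_from_sam() {
    awk -F '\t' '
        /^@/ { next }                          # skip SAM header lines
        {
            pos = $4; cigar = $6               # leftmost position and CIGAR string
            # Walk the CIGAR one operation at a time; "*" (no CIGAR) never matches.
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                len = substr(cigar, 1, RLENGTH - 1) + 0
                op  = substr(cigar, RLENGTH, 1)
                if (op == "N") count[$3 "\t" pos "\t" (pos + len - 1)]++
                if (op ~ /[MDN=X]/) pos += len # only these operations consume the reference
                cigar = substr(cigar, RLENGTH + 1)
            }
        }
        END { for (j in count) print j "\t" count[j] }
    ' | sort -k1,1 -k2,2n -k3,3n
}
```

Piping the output of samtools view for a BAM file into this function gives one line per distinct junction with the number of alignments supporting it. Every junction listed comes straight from the aligner's CIGAR strings, so a wrong junction here is an alignment problem, not a counting problem.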

I don't generally recommend IGV for this sort of work. By default it does a lot of nasty stuff like subsampling down your dataset, or ignoring reads with multiple splice junctions (at least in the sashimi plots).

Lots of things can cause false positive novel junctions. Especially with repetitive regions or ambiguous reads, aligners often can't tell exactly where spliced reads actually map. It's just something that happens.

naumenko-sa commented 7 years ago

Thanks Stephen!

I'm using 2-pass STAR to align the reads. As for IGV, what is the alternative? Yes, I have all the plots from JunctionSeq, but sometimes it is useful to look at the individual reads. JunctionSeq generates IGV tracks with the coverage of novel junctions; should these be precise?

Sergey

hartleys commented 7 years ago

Yeah. Generally I use the BED and wiggle files generated by QoRTs. If things look really really odd, I might just view the reads directly via samtools view.
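As a rough sketch of what viewing the reads directly might look like: the helper below keeps only spliced alignments (CIGAR contains an N operation) from SAM text and prints the columns that are usually most informative when checking a suspicious junction. The function name spliced_reads is hypothetical, and the region in the commented usage line is a placeholder, not a real locus from this thread.

```shell
# Hypothetical helper (an illustration only).
# Reads SAM records on stdin and prints, for spliced alignments only:
#   read_name <TAB> flag <TAB> chrom <TAB> pos <TAB> mapq <TAB> cigar
spliced_reads() {
    awk -F '\t' '
        /^@/ { next }                          # skip SAM header lines
        $6 ~ /N/ {                             # keep alignments whose CIGAR has a skipped region
            print $1 "\t" $2 "\t" $3 "\t" $4 "\t" $5 "\t" $6
        }
    '
}

# Assumed usage, with a placeholder region around the junction of interest:
#   samtools view "$bam" chr1:1000-2000 | spliced_reads
```

Looking at the MAPQ and CIGAR columns side by side helps show whether a junction is supported by confidently placed reads or mostly by ambiguous alignments.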