timbitz / Whippet.jl

Lightweight and Fast; RNA-seq quantification at the event-level
MIT License

Start coordinate > end coordinate by 1 for a few RI cases #2

Closed kcha closed 8 years ago

kcha commented 8 years ago

Not sure if this is a bug. I found some RI examples where the start coordinate is larger than the stop coordinate. It doesn't always seem to be the case, though.

> gzip -dc sample_01.psi.gz | awk '{split($2,c1,":"); split(c1[2], c2, "-"); if (c2[1] >= c2[2]) {print $0}}' -
TYSND1  chr10:71902521-71902520 -   RI  0.0 0.7120683854787901
PCDH15  chr10:55587152-55587152 -   AD  0.0 1.0
SPNS1   chr16:28986422-28986421 +   RI  0.9273  1.0 IntSet([1, 2, 3, 4, 5]) IntSet([1, 2, 4, 5])
MAPK8IP3    chr16:1798722-1798722   +   AD  0.0 1.0
RELN    chr7:103113356-103113356    -   AA  0.99    1.0
UTY chrY:15466882-15466882  -   AD  0.2063  0.8025488811964229  IntSet([33, 34, 35, 36, 37, 38])    IntSet([30, 31, 32, 34, 35, 36, 37, 38])
timbitz commented 8 years ago

Yeah, I expected this one, thanks Kevin. The AD/AAs are OK. Obviously the RI is wrong; this is a bug in building the index when there are kissing exons --->|<---.

timbitz commented 8 years ago

I will fix this when I add GTF support. For now it doesn't actually cause a problem or affect the accuracy of any results, other than printing those few invalid RI lines once in a while.
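
Until the fix lands, the invalid RI lines could be dropped downstream. The following is a minimal sketch, not part of Whippet: the function name `drop_inverted_ri` is hypothetical, and it assumes the column layout shown in the output above (coordinate in field 2, event type in field 4, whitespace-delimited). It removes only RI lines whose start is strictly greater than the end, so the AD/AA lines with equal coordinates are kept.

```shell
# Hypothetical helper: reads .psi lines on stdin, writes filtered lines to stdout.
# Assumes field 2 is chr:start-end and field 4 is the event type (as in the output above).
drop_inverted_ri() {
    awk '
        {
            split($2, c1, ":")      # c1[2] holds "start-end"
            split(c1[2], c2, "-")   # c2[1] = start, c2[2] = end
            # "+ 0" forces a numeric comparison; skip RI lines with start > end
            if ($4 == "RI" && c2[1] + 0 > c2[2] + 0) next
            print
        }
    '
}
```

It could be used as `gzip -dc sample_01.psi.gz | drop_inverted_ri`. Since these lines are said not to affect any other results, filtering them should be purely cosmetic.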

timbitz commented 8 years ago

Closed by #18