yarden / MISO

MISO: Mixture of Isoforms model for RNA-Seq isoform quantitation
http://genes.mit.edu/burgelab/miso/index.html
132 stars 74 forks source link

log_score = nan? #64

Open olgabot opened 10 years ago

olgabot commented 10 years ago

Hello there, I was getting Psi scores of 0.5 for events that are definitely 100% included and Psi should be 1. Any idea why this might be happening?

Here's the event: image

Here's an example *.miso file:

$ head 'chr12/chr12:32890039:32890095:+@chr12:32890799:32890876:+@chr12:32891198:32891230:+.miso'
#isoforms=['chr12:32890039:32890095:+@chr12:32890799:32890876:+@chr12:32891198:32891230:+.A.up_chr12:32890039:32890095:+@chr12:32890799:32890876:+@chr12:32891198:32891230:+.A.se_chr12:32890039:32890095:+@chr12:32890799:32890876:+@chr12:32891198:32891230:+.A.dn','chr12:32890039:32890095:+@chr12:32890799:32890876:+@chr12:32891198:32891230:+.B.up_chr12:32890039:32890095:+@chr12:32890799:32890876:+@chr12:32891198:32891230:+.B.dn']  exon_lens=('chr12:32890039:32890095:+@chr12:32890799:32890876:+@chr12:32891198:32891230:+.A.up',57),('chr12:32890039:32890095:+@chr12:32890799:32890876:+@chr12:32891198:32891230:+.A.se',78),('chr12:32890039:32890095:+@chr12:32890799:32890876:+@chr12:32891198:32891230:+.A.dn',33),('chr12:32890039:32890095:+@chr12:32890799:32890876:+@chr12:32891198:32891230:+.B.up',57),('chr12:32890039:32890095:+@chr12:32890799:32890876:+@chr12:32891198:32891230:+.B.dn',33)     iters=5000      burn_in=500     lag=10  percent_accept=0.00     proposal_type=drift     counts=(0,0):402,(1,0):107      assigned_counts=0:107   chrom=chr12    strand=+ mRNA_starts=32890039,32890039   mRNA_ends=32891230,32891230
sampled_psi     log_score
0.4944,0.5056   nan
0.5145,0.4855   nan
0.4986,0.5014   nan
0.5086,0.4914   nan
0.5013,0.4987   nan
0.4909,0.5091   nan
0.4944,0.5056   nan
0.5145,0.4855   nan

Here's the bam file

Here's the command:

READ_LEN=$(samtools view P2_04.bam | head -n 1 | cut -f 10 | awk '{ print length }')

# calculate Psi scores for all SE events
mkdir -p miso/P2_04/SE
python /home/yeo-lab/software/bin/miso  --run /projects/ps-yeolab/genomes/hg19/miso/SE_index  P2_04.bam --output-dir P2_04/SE  --read-len $READ_LEN   --settings-filename /projects/ps-yeolab/genomes/hg19/miso_annotations/miso_settings_min_event_reads10.txt  -p 16  > miso/P2_04/SE/psi.out  2> miso/P2_04/SE/psi.err

And here's the settings file:

[data]
filter_results = True
min_event_reads = 10

[cluster]
cluster_command = qsub

[sampler]
burn_in = 500
lag = 10
num_iters = 5000
num_chains = 6
num_processors = 4
yarden commented 10 years ago

Hi Olga, Thanks for this nice test case - I just reproduced it. It's most certainly a bug. I'm looking into it now.

olgabot commented 10 years ago

Any news on this?

yarden commented 10 years ago

Hi Olga,

It appears that the read length is 92 and your short isoform is 90, less than the read length, and this is what caused it.

olgabot commented 10 years ago

Thanks for digging into this! The exon lengths are: 57, 78, 33. So is the "short isoform" exon1+exon3 (57+33 = 90)?

If the read length is longer than the isoform length, why would the event get thrown out? There's perfect coverage of the event.

yarden commented 10 years ago

I agree it's not intuitive, but here's one explanation. Since your read length is 92 and the shortest isoform is 90, it's not possible -- unless you consider non-end-to-end alignments, which MISO doesn't -- to have exclusion reads. That is, you by definition can't observe skipping of the exon in this setup (e.g. in single end reads). So based on the data, the Psi could be 100% because there's no skipping, or it could just be that you can't observe the exclusion isoform. This edge case will have to be dealt with in a more intuitive way in the next version. If you have thoughts on best behavior, let me know. --Yarden

olgabot commented 10 years ago

True, but that ignores that there's possible exons upstream and downstream of this particular event. Can future versions just look at the parts of the read that overlap with the event? What problems could this create? I suppose if this was an AFE or ALE event and the read overlapped the beginning/end of the first exon, that would be inconsistent with the annotation. But if the read was soft-clipped, would this matter?

yarden commented 10 years ago

The exon-centric annotations are intended to be only of an alternative trio of interest, so if you use that kind of annotation, that's the only thing MISO knows about -- looking at exons upstream and downstream of the event would require knowledge about which genome to look at, which reference annotation, etc. You get the same effect by making an annotation that includes more of the flanking upstream and downstream exons of the event. If your reads are length X, you could include enough flanking exons such that no isoform is shorter than X (ideally not shorter than your average insert length either). That would require deciding which exons should be included (since they could be alternative too, etc.), so to keep things generic it's up to the annotation to determines what gets considered.

olgabot commented 9 years ago

Getting a similar bug again, this time on RI events. The event is in MYL6, 'chr12:56553371-56553406:+@chr12:56553759-56553932:+' is called as 0.5 in these samples, even though I don't see any intron retention, and think the values here should be "0."

image

Here's one of the bam files, with just reads from the MYL6 locus.

I don't think this is the same as the short isoform bug, because isoforms A and B are 562 and 210, respectively, and my reads are 100bp.

This was using misopy v0.5.2