Problem with assigned_counts

yarden / MISO

MISO: Mixture of Isoforms model for RNA-Seq isoform quantitation

http://genes.mit.edu/burgelab/miso/index.html


Problem with assigned_counts #56

Closed: mnsmar closed this issue 8 years ago

mnsmar commented 11 years ago

In the method "count_isoform_assignments" in the file "reads_utils.py", if a transcript has not been assigned any reads, it is never returned with a count of 0. As a result, it ends up missing from the output column "assigned_counts".

For example, consider a gene with 4 transcripts, of which only the first 2 are assigned reads. In this case, transcripts 3 and 4 are never returned by the method.
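A minimal sketch of the behavior the report asks for, assuming reads are assigned as a sequence of 0-based isoform indices and the number of isoforms is known. The function name `count_assignments_with_zeros` is hypothetical; this is not MISO's actual code. The key point is to iterate over every isoform index rather than only the indices that appear in the data:

```python
from collections import Counter


def count_assignments_with_zeros(assignments, num_isoforms):
    """Count reads per isoform, reporting 0 for isoforms with no reads.

    Hypothetical helper: `assignments` is assumed to hold one 0-based
    isoform index per read.
    """
    counts = Counter(assignments)
    # Loop over all isoform indices, not just those seen in `assignments`,
    # so unassigned isoforms still appear with a count of 0.
    return [(iso, counts.get(iso, 0)) for iso in range(num_isoforms)]


# Gene with 4 isoforms where only isoforms 0 and 1 receive reads.
print(count_assignments_with_zeros([0, 1, 0, 1, 1], 4))
# [(0, 2), (1, 3), (2, 0), (3, 0)]
```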

yarden commented 11 years ago

Thank you, you're correct and it's on our list to fix. Please note, though, that assigned_counts is only a snapshot of how reads were assigned to isoforms in the last step of the MCMC inference algorithm. It's therefore an unstable metric with no real use for downstream analyses; it's more of an output for internal debugging. The best way to estimate the fraction of reads assigned to each isoform is to use the estimated Psi values, since these take into account all iterations of inference and not just the last step. Best, --Yarden
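To illustrate why a last-step snapshot is less stable than an estimate that uses every iteration, here is a small sketch. It assumes the MCMC output is available as a list of sampled Psi vectors (one per iteration) and that the Psi estimate is their mean; the function name `mean_psi` and the sample values are hypothetical, not taken from MISO:

```python
def mean_psi(psi_samples):
    """Average sampled Psi vectors across all MCMC iterations.

    Hypothetical helper: `psi_samples` is assumed to be a non-empty list of
    equal-length Psi vectors, one per retained MCMC iteration.
    """
    n = len(psi_samples)
    # zip(*...) groups the values for each isoform across all samples.
    return [sum(column) / n for column in zip(*psi_samples)]


# Made-up samples for a 2-isoform gene.
samples = [[0.5, 0.5], [0.25, 0.75], [0.75, 0.25]]
print(samples[-1])        # last-step snapshot: [0.75, 0.25]
print(mean_psi(samples))  # averaged over all steps: [0.5, 0.5]
```

The snapshot depends on wherever the chain happened to stop, while the average pools information from the whole run.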

mnsmar commented 11 years ago

Thanks, Yarden. Could you please elaborate a little on how the Psi values should be used to get the number of reads assigned to each isoform?

gaborcsardi commented 11 years ago

Multiply the psi values by the total number of reads for the gene, and you get the (mean) assigned counts.
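A sketch of that calculation, assuming `psi` is the list of per-isoform Psi estimates for one gene (summing to 1) and `total_reads` is the gene's read count as determined by the user. The function name `expected_counts` is hypothetical:

```python
import math


def expected_counts(psi, total_reads):
    """Convert Psi estimates (isoform fractions) into expected read counts.

    Hypothetical helper: `psi` is assumed to sum to 1 for the gene.
    """
    if not math.isclose(sum(psi), 1.0, abs_tol=1e-6):
        raise ValueError("Psi values for a gene should sum to 1")
    # Each isoform's expected count is its fraction times the gene total.
    return [p * total_reads for p in psi]


print(expected_counts([0.5, 0.25, 0.25, 0.0], 200))
# [100.0, 50.0, 50.0, 0.0]
```

Unlike the assigned_counts snapshot, this yields a value for every isoform, including 0 for isoforms with Psi of 0, and the result is generally fractional rather than an integer count.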