Closed mnsmar closed 8 years ago
Thank you, you're correct, and it's on our list to fix. Please note, though, that assigned_counts is only a snapshot of the assignment of reads to isoforms in the last step of the MCMC inference algorithm, so it's an unstable metric and has no real use for downstream analyses; it's more of an internal debugging output. The best way to estimate the fraction of reads assigned to each isoform is to use the estimated Psi values, as these take into account all iterations of inference and not just the last step. Best, --Yarden
Thanks, Yarden. Could you please elaborate a little on how the Psi values should be used to get isoform assigned counts?
Multiply the psi values by the total number of reads for the gene, and you get the (mean) assigned counts.
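For anyone reading along, that calculation can be sketched as follows. This is only an illustration of the arithmetic described above, not code from MISO; the function name `mean_assigned_counts` and the example numbers are made up for this post.

```python
def mean_assigned_counts(psi_values, total_reads):
    """Return the (mean) assigned counts per isoform as psi * total_reads.

    psi_values  -- estimated Psi value for each isoform of the gene
    total_reads -- total number of reads for the gene
    Both names are hypothetical; they are not MISO identifiers.
    """
    return [psi * total_reads for psi in psi_values]


# Hypothetical gene with 4 isoforms and 200 reads in total.
counts = mean_assigned_counts([0.5, 0.25, 0.25, 0.0], 200)
print(counts)  # [100.0, 50.0, 50.0, 0.0]
```

Note that an isoform with a Psi of 0 still gets an entry (with a count of 0) this way.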
In the method "count_isoform_assignments" in the file "reads_utils.py", if one of the transcripts has not been assigned a read, it is never returned with a value of 0 and ends up missing from the output column "assigned_counts".
E.g., consider a gene with 4 transcripts, of which only the first 2 are assigned reads. In this case, transcripts 3 and 4 are never returned by the method.
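A minimal sketch of the behaviour I would expect is below. It is not the actual code from "reads_utils.py"; the function name `count_assignments_with_zeros` is hypothetical, and I am assuming the assignments are available as a list of 0-based isoform indices, one per read, which may not match MISO's internal representation.

```python
from collections import Counter


def count_assignments_with_zeros(assignments, num_isoforms):
    """Count reads per isoform, reporting 0 for isoforms with no reads.

    assignments  -- list of 0-based isoform indices, one per read (assumed format)
    num_isoforms -- total number of isoforms annotated for the gene
    """
    counts = Counter(assignments)
    # Iterate over every isoform index, not only those seen in the reads,
    # so that unassigned isoforms are returned with a count of 0.
    return {isoform: counts.get(isoform, 0) for isoform in range(num_isoforms)}


# Gene with 4 transcripts; only the first 2 are assigned reads.
print(count_assignments_with_zeros([0, 0, 1, 0, 1], 4))
# {0: 3, 1: 2, 2: 0, 3: 0}
```

The point is only that the loop runs over all isoforms of the gene rather than over the isoforms that happen to appear among the assigned reads.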