Feature Idea: Derive abundance from inferential replicates

thelovelab / tximport

Transcript quantification import for modular pipelines


Feature Idea: Derive abundance from inferential replicates #23

Closed rob-p closed 6 years ago

rob-p commented 6 years ago

It would be nice to allow tximport to optionally derive the abundance for each transcript in each sample by computing some summary statistic (e.g. median or mean) of the inferential replicates, if they are available. Such summaries can sometimes be more robust or accurate than the individual point estimates, and if inferential replicates are already being read in and summarized to obtain variances, the overhead of also computing this summary should be marginal.

mikelove commented 6 years ago

The infRepStat argument was added in commit b906bb29ce82f6f188ffde042a74ba5a055d1686

This re-computes counts and abundances while importing inferential replicates. It can be used with countsFromAbundance (which will re-compute counts a second time) and with tx2gene. Everything proceeds downstream as if the re-computed counts and abundances were the original point estimates.

For example, infRepStat=matrixStats::rowMedians computes the median of the posterior samples.
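
To make the re-computation described above concrete, here is a minimal base-R sketch of what a row-wise summary of inferential replicates does for a single sample. It is not the code from the commit: the matrix values, the names inf_reps, eff_len and inf_rep_stat, and the TPM-style rescaling of counts by effective length are all assumptions made for illustration. In real use the function would simply be passed to tximport as infRepStat=matrixStats::rowMedians, as in the comment above.

```r
# Hypothetical inferential replicates for one sample:
# rows = transcripts, columns = replicates (e.g. posterior or bootstrap draws).
inf_reps <- matrix(
  c( 10,  12,   9,  11,  30,
     50,  48,  52,  51,  49,
    200, 190, 210, 205, 195,
      5,   4,   6,   5,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("tx", 1:4), paste0("rep", 1:5))
)

# Hypothetical effective lengths for the same transcripts.
eff_len <- c(tx1 = 1000, tx2 = 500, tx3 = 2000, tx4 = 250)

# Base-R stand-in for matrixStats::rowMedians: one summary value per transcript.
inf_rep_stat <- function(x) apply(x, 1, median)

# Re-computed counts: the summary statistic replaces the point estimate.
counts <- inf_rep_stat(inf_reps)

# Re-computed abundance (assumed here to be a TPM-style rescaling):
# counts per unit of effective length, scaled to sum to one million.
rate <- counts / eff_len
abundance <- rate * 1e6 / sum(rate)

print(counts)
print(abundance)
```

Note how the outlying replicate for tx1 (30) does not move the median, which is the kind of robustness the original request has in mind. Swapping inf_rep_stat for rowMeans would give the mean-based variant instead.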

mikelove commented 6 years ago

I cleaned up some unrelated code (RSEM gene-level input) while I was adding this feature, so it's a bit hard to navigate the diffs. It's actually all in these three lines:

https://github.com/mikelove/tximport/commit/b906bb29ce82f6f188ffde042a74ba5a055d1686#diff-dd69e134bd79893f01d1fb56280b1ee0R342