ngs-docs / 2014-msu-rnaseq


Bad practice recommendation: htseq #1

Open blahah opened 9 years ago

blahah commented 9 years ago

Some tool choices in RNAseq are a matter of preference, in that the evidence is not clear enough to support an empirically or theoretically motivated choice. However, some things are demonstrably bad practice. Quantifying expression at the gene level by counting reads that map to a gene (e.g. using htseq-count) is one example.

There is a substantial literature showing that it's important to quantify at the isoform level, and that gene-level quantification by raw read counting is at best misleading. It's also theoretically unsound. See Fig. 1b of Trapnell et al. 2013 for an illustration of just one reason.
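
To make one such reason concrete, here is a toy sketch in Python. It is not the analysis from the paper, and every number and name in it is made up for illustration. It assumes a gene with two isoforms of different lengths, and that the expected read count from an isoform is proportional to the number of transcript molecules times the isoform length. If the gene switches from the short isoform to the long one while the number of molecules stays the same, the raw gene-level count still changes.

```python
# Toy model, all values hypothetical: expected reads from an isoform are
# proportional to (number of transcript molecules) x (isoform length).
READS_PER_MOLECULE_PER_KB = 10

# Hypothetical gene with two isoforms; lengths in kilobases.
ISOFORM_LENGTH_KB = {"short": 1, "long": 3}


def expected_reads(molecules):
    """Expected read count per isoform under the toy model."""
    return {
        iso: n * ISOFORM_LENGTH_KB[iso] * READS_PER_MOLECULE_PER_KB
        for iso, n in molecules.items()
    }


def gene_level_count(reads):
    """What simple read counting reports: all reads on the gene, summed."""
    return sum(reads.values())


def isoform_aware_abundance(reads):
    """Length-normalise each isoform's reads first, then sum over isoforms."""
    return sum(
        r / (ISOFORM_LENGTH_KB[iso] * READS_PER_MOLECULE_PER_KB)
        for iso, r in reads.items()
    )


# Same number of molecules in both conditions; only the isoform changes.
condition_a = {"short": 100, "long": 0}
condition_b = {"short": 0, "long": 100}

reads_a = expected_reads(condition_a)
reads_b = expected_reads(condition_b)

count_fold_change = gene_level_count(reads_b) / gene_level_count(reads_a)
abundance_fold_change = (
    isoform_aware_abundance(reads_b) / isoform_aware_abundance(reads_a)
)

print("fold change from raw gene-level counts:", count_fold_change)
print("fold change from isoform-aware abundance:", abundance_fold_change)
```

In this toy case the raw count triples even though the gene produces the same number of transcripts in both conditions, while the length-normalised, per-isoform estimate is unchanged. Real data adds multi-mapping reads and shared exons, which this sketch ignores.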

Why not recommend a best-practice pipeline that holds across model and de-novo assembled references? Something like:

blahah commented 9 years ago

Also note, if you're committed to TopHat, then Cufflinks is a vastly better method for quantifying expression than htseq-count (because it estimates expression at the isoform level).