lweasel / piquant

A pipeline to assess the quantification of transcripts.
http://piquant.readthedocs.org/en/latest/
MIT License

Consider the effect of effective length correction (pun intended) #58

Open rob-p opened 8 years ago

rob-p commented 8 years ago

The Flux simulator does not seem to respect the effective length of transcripts during its simulation. This means that quantification tools that "correctly" adjust for effective length will be penalized for this correction (more strongly under relative abundance measures such as TPM than under the estimated number of reads). Since the Flux simulator produces (via its .lib file) the actual set of fragment lengths present in the underlying library, it is possible to compute "true" effective lengths for each transcript, which can then be used when computing the ground-truth TPM values. It might be worth allowing this as an option to piquant, or reporting the accuracy of the different tools with respect to the "true" TPM computed in both ways.

Here is a gist that computes the effective lengths of transcripts, given the Flux simulator's .lib file and a dataframe containing the uncorrected lengths. Let me know if you think this makes sense to include.
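
To make the idea concrete, here is a minimal sketch of the computation described above. It is not the code from the gist and it does not parse the .lib file: it assumes the fragment lengths have already been extracted into a plain list. It uses one common definition of effective length, the expected number of positions at which a fragment drawn from the empirical length distribution can start, i.e. the sum over fragment lengths f <= L of p(f) * (L - f + 1). The function names `effective_lengths` and `tpm`, and the use of per-transcript read counts as input, are hypothetical choices made for illustration.

```python
from collections import Counter


def effective_lengths(transcript_lengths, fragment_lengths):
    """Effective length per transcript from an empirical fragment length list.

    Assumed definition: sum over fragment lengths f <= L of
    p(f) * (L - f + 1), where p is the empirical distribution over
    all observed fragments.
    """
    counts = Counter(fragment_lengths)
    total = sum(counts.values())
    eff = {}
    for tid, length in transcript_lengths.items():
        # A fragment of length f can start at (length - f + 1) positions.
        weighted = sum(
            (length - f + 1) * c for f, c in counts.items() if f <= length
        )
        eff[tid] = weighted / total
    return eff


def tpm(read_counts, lengths):
    """TPM from read counts, normalising by the supplied lengths."""
    # Transcripts with zero (effective) length get a rate of zero.
    rates = {
        tid: (read_counts[tid] / lengths[tid]) if lengths[tid] > 0 else 0.0
        for tid in read_counts
    }
    denom = sum(rates.values())
    return {tid: 1e6 * r / denom if denom > 0 else 0.0 for tid, r in rates.items()}


# Toy data, invented purely for illustration.
toy_lengths = {"t1": 100, "t2": 10}
toy_fragments = [10, 20]
toy_reads = {"t1": 86, "t2": 1}

toy_eff = effective_lengths(toy_lengths, toy_fragments)
tpm_uncorrected = tpm(toy_reads, toy_lengths)
tpm_corrected = tpm(toy_reads, toy_eff)
```

Computing the ground-truth TPM both ways, as with `tpm_uncorrected` and `tpm_corrected` here, is what would let accuracy be reported against each version.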

lweasel commented 8 years ago

Hi Rob - yes, that makes good sense to include, and many thanks for the code for the computation!