pachterlab / sleuth

Differential analysis of RNA-Seq
http://pachterlab.github.io/sleuth
GNU General Public License v3.0
304 stars 95 forks source link

NA in likelihood ratio test with small number of transcripts #173

Open winni2k opened 6 years ago

winni2k commented 6 years ago

This is a continuation of a discussion started in #68. I am moving it here because it appears that my problem has a different cause from the what appears to be the cause in #68.

winni2k commented 6 years ago

In response to https://github.com/pachterlab/sleuth/issues/68#issuecomment-388214971: Let's assume for now that the 1084 transcripts are a complete representation of all possible transcripts that the reads that I am quantifying can map to.

This leaves us with the issue of shrinkage. If I understand @warrenmcg's comment correctly, then even if my transcriptome really only contains these 1084 transcripts, then shrinking inside sleuth would still not work, because this kind of shrinking needs more transcripts to work properly?

However, from https://github.com/pachterlab/sleuth/issues/68#issuecomment-388203180 it appears that a direct/exact LOESS estimation might still work?

If neither approach works, is there a way to turn off shrinking all together? The way I see it, an appropriate model to use in my case would be some sort of overdispersed Poisson regression implemented in for example edgeR. However, in that case I would miss out on quantification uncertainty that sleuth provides.

warrenmcg commented 6 years ago

@winni2k, you would be able to use the direct/exact LOESS. This is only implemented in the devel branch currently, so you can try it if you wish.

The shrinkage process in sleuth takes the transcripts, bins them into 100 groups (so for your 1084, that's ~10-11 transcripts per bin), then uses the transcripts with variance within the interquartile range of each bin to estimate the "global" mean-variance trend. These interquartile transcripts are then used to estimate the rest of the transcripts using LOESS. The default strategy is to "interpolate", so any transcripts with a mean outside of the ones used will yield an NA. I can't comment on exactly why you had so many without looking at your data directly, but I have only rarely seen a dataset where no transcript failed to be interpolated. The fix on the devel branch does a second round to recompute the transcripts with NA values using the exact computation. This guarantees a value for every transcript. Hopefully that helps!

@pimentel, similar to the normalization, transformation, and other steps, we may consider allowing advanced users to supply their own shrinkage procedure, or providing options (DESeq2-style shrinkage, edgeR-style shrinkage, etc).

pimentel commented 6 years ago

@warrenmcg I'm good with this solution. let's talk about the interface in messenger.

pimentel commented 6 years ago

after a discussion with @warrenmcg we've decided this will have to wait for the next release, but definitely on the TODO list. As-is, will require a bit of an API overhaul. Don't worry -- it'll be a much sooner release than last.

warrenmcg commented 5 years ago

Hi @winni2k,

Thanks for your patience. I have implemented functionality to specify a custom shrinkage function for sleuth_fit using the shrink_fun argument. This is available in a devel version of sleuth available on my profile. It can be installed using the following line of code:

devtools::install_github('warrenmcg/sleuth', ref = 'speedy_fit')

If you do not want to overwrite your current installation of sleuth, you can put the new one in a local library using the following (change the my_lib to whatever you like):

my_lib <- '~/test_R_library'
# make sure the library exists
dir.create(my_lib, showWarnings = FALSE)
# make it take precedence in this R session when installing new packages
# see ?.libPaths for more info
.libPaths(my_lib, .libPaths())
# install the devel version of sleuth 'my_lib'
devtools::install_github('warrenmcg/sleuth', ref = 'speedy_fit')

### NOW CLOSE R AND OPEN A NEW SESSION
my_lib <- '~/test_R_library' # CHANGE THIS
library(sleuth, lib.loc = my_lib)

Let us know if this fits your needs or if you have any feedback. Thanks!

winni2k commented 5 years ago

Hi @warrenmcg, Thanks for the fix! Unfortunately, this particular analysis is now published, but I am finding this type of analysis arises more and more often in my work. I'll see about giving it a try!