theislab / diffxpy

Differential expression analysis for single-cell RNA-seq data.
https://diffxpy.rtfd.io
BSD 3-Clause "New" or "Revised" License

Input expression matrix #161

Open slieped opened 4 years ago

slieped commented 4 years ago

Hi all! I had great success using scanpy with snRNA-seq data. My current project is built on bulk RNA-seq data, and since I am using public data, I only have access to FPKM-normalized (batch-corrected) data.

I have read in other issues that the optimal noise model is the negative binomial (NB), but it requires raw counts. In addition, in the Wald test function I only see the `nb` option (the default). Given the nature of my data (normalized and possibly log-transformed; I am not sure), do you have any noise model recommendation? I would prefer to stick with your tool instead of moving to limma or other packages.

Thank you in advance!

davidsebfischer commented 4 years ago

Hi @slieped! You can also fit NB models on data that does not contain only integers, but the further your data strays from being NB-distributed, the less sense this analysis makes. With normalized and log-transformed data, I would first try a Gaussian noise model; we do not have that yet, but hopefully soon. However, you might want to revisit what you are trying to achieve: are you comparing two groups? If so, given that your data is strongly processed to look more Gaussian, you could also just run a t-test. Exploring GLMs / LMs here makes sense if you have a strong confounder, for example.
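For illustration, the two routes mentioned above (a per-gene t-test for a plain two-group comparison, and a Gaussian linear model when there is a confounder) can be sketched with NumPy and SciPy alone. This is a minimal sketch, not diffxpy's API. It assumes the expression matrix is a samples-by-genes array that has been log-transformed as log2(FPKM + 1). That transform is itself an assumption, since the issue does not say whether the public data is already logged. The helper names (`benjamini_hochberg`, `ttest_per_gene`, `linear_model_per_gene`) and the `covariate` argument (e.g. a numeric-coded sex or sequencing-center variable) are hypothetical, and the example data is synthetic.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values for a 1-D array of p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, walking from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def ttest_per_gene(expr, is_group_b):
    """Welch's t-test per gene (column) between two groups of samples (rows)."""
    expr = np.asarray(expr, dtype=float)
    mask = np.asarray(is_group_b, dtype=bool)
    t_stat, pval = stats.ttest_ind(expr[mask], expr[~mask], axis=0, equal_var=False)
    return t_stat, pval, benjamini_hochberg(pval)


def linear_model_per_gene(expr, is_group_b, covariate):
    """Gaussian linear model expr ~ 1 + group + covariate, fit per gene by OLS.

    Returns the group coefficient and its two-sided p-value for every gene.
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.shape[0]
    design = np.column_stack([
        np.ones(n),
        np.asarray(is_group_b, dtype=float),
        np.asarray(covariate, dtype=float),
    ])
    coef, _, rank, _ = np.linalg.lstsq(design, expr, rcond=None)
    resid = expr - design @ coef
    dof = n - rank
    sigma2 = (resid ** 2).sum(axis=0) / dof
    # Standard error of the group coefficient (column 1 of the design matrix).
    xtx_inv = np.linalg.inv(design.T @ design)
    se = np.sqrt(sigma2 * xtx_inv[1, 1])
    t_stat = coef[1] / se
    pval = 2 * stats.t.sf(np.abs(t_stat), dof)
    return coef[1], pval


# Synthetic example: 12 samples x 100 genes of FPKM-like values.
rng = np.random.default_rng(0)
fpkm = rng.gamma(shape=2.0, scale=5.0, size=(12, 100))
log_expr = np.log2(fpkm + 1)
group = np.array([False] * 6 + [True] * 6)
log_expr[group, :5] += 5.0  # spike a shift into the first 5 genes
covariate = np.array([0, 1] * 6)  # hypothetical confounder, balanced across groups

t, p, q = ttest_per_gene(log_expr, group)
beta, p_lm = linear_model_per_gene(log_expr, group, covariate)
```

The t-test treats every gene independently and ignores other covariates. The linear model adds the confounder as an extra column of the design matrix, so the group coefficient is estimated after adjusting for it. With real data you would also adjust `p_lm` for multiple testing, for example with the same Benjamini-Hochberg helper.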