jdblischak / singleCellSeq

Batch effects and the effective design of single-cell gene expression studies
http://jdblischak.github.io/singleCellSeq/analysis

Overexpressed genes #1

jhsiao999 closed this issue 8 years ago

jhsiao999 commented 9 years ago

Do we trust OEFinder? Shall we run downstream analysis after removing these genes?

jhsiao999 commented 9 years ago

I found that the genes OEFinder identifies as "overexpressed" are distributed evenly across all levels of gene expression (averaged across cells). I need to look into this further and decide whether we need to remove all of these genes before downstream analysis.
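One way to check the "evenly distributed" observation is to split genes into quantile bins of mean expression and compare the fraction flagged by OEFinder in each bin. This is a minimal sketch that assumes a per-gene vector of mean expression and a boolean vector marking OEFinder calls; the function name `oe_fraction_by_expression_bin` and the default of four bins are hypothetical, not part of OEFinder or this repository.

```python
import numpy as np

def oe_fraction_by_expression_bin(mean_expr, is_oe, n_bins=4):
    """Fraction of OEFinder-flagged genes within each mean-expression quantile bin.

    mean_expr: per-gene expression averaged across cells (1-D array).
    is_oe: per-gene booleans, True where OEFinder flagged the gene (hypothetical input).
    """
    mean_expr = np.asarray(mean_expr, dtype=float)
    is_oe = np.asarray(is_oe, dtype=bool)
    # Interior quantile cut points split the genes into n_bins groups of similar size.
    cuts = np.quantile(mean_expr, np.linspace(0, 1, n_bins + 1)[1:-1])
    # np.digitize maps each gene to a bin index 0..n_bins-1 using those cut points.
    bins = np.digitize(mean_expr, cuts)
    fractions = []
    for b in range(n_bins):
        mask = bins == b
        # Heavy ties in mean_expr can leave a bin empty; report NaN instead of warning.
        fractions.append(is_oe[mask].mean() if mask.any() else np.nan)
    return np.array(fractions)
```

Roughly equal fractions across bins would match the observation above; a fraction that rises in the top bins would instead suggest that the OE calls track expression level.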

pytung commented 9 years ago

That's weird. How do they identify over-expressed genes?

jhsiao999 commented 9 years ago

They fit a quadratic function of expression level against capture site (sites are organized in groups A, B, C, D, etc.) for each gene independently. The fitted quadratic is tested against a model that assumes a linear relationship between capture site and expression level. The p-value for each gene is computed by permutation: a p-value of .01 means that 1% of the 100,000 permuted samples show a quadratic-over-linear improvement in fit at least as large as the one observed in the actual data.
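A minimal Python sketch of this kind of per-gene permutation test, assuming capture sites are encoded as numeric positions. The test statistic here (drop in residual sum of squares when a quadratic term is added to the linear model), the add-one p-value correction, and all function names are assumptions for illustration; OEFinder's actual statistic and implementation may differ.

```python
import numpy as np

def rss(X, y):
    # Residual sum of squares from an ordinary least-squares fit of y on X.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def quad_vs_linear_stat(site, expr):
    # Improvement in fit from adding a quadratic term in capture-site position.
    X_lin = np.column_stack([np.ones_like(site), site])
    X_quad = np.column_stack([X_lin, site ** 2])
    return rss(X_lin, expr) - rss(X_quad, expr)

def permutation_pvalue(site, expr, n_perm=10_000, seed=0):
    site = np.asarray(site, dtype=float)
    expr = np.asarray(expr, dtype=float)
    rng = np.random.default_rng(seed)
    observed = quad_vs_linear_stat(site, expr)
    # Shuffling expression across sites destroys any site-expression relationship,
    # so these statistics approximate the null distribution for this gene.
    null = np.array([quad_vs_linear_stat(site, rng.permutation(expr))
                     for _ in range(n_perm)])
    # Proportion of permuted statistics at least as large as the observed one,
    # with an add-one correction so the estimate is never exactly zero.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

A gene whose expression traces a U-shape across sites gets a small p-value, while a purely linear trend does not, because the quadratic term adds nothing beyond the linear fit.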

Note that the default output threshold is p < .01, which corresponds to 1,000 out of 100,000 permutations, not a particularly "rare" count.
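The arithmetic behind this threshold, plus what it implies when every gene is tested separately, fits in a few lines. The gene count of 10,000 is a hypothetical figure, not the number of genes in this dataset, and the second calculation assumes no gene truly has a capture-site effect.

```python
def permutations_at_threshold(alpha: float, n_perm: int) -> int:
    # Permuted statistics allowed to match or exceed the observed one at p = alpha.
    return round(alpha * n_perm)

def expected_chance_calls(alpha: float, n_genes: int) -> float:
    # Genes expected to pass p < alpha by chance alone if no gene has a true effect.
    return alpha * n_genes

print(permutations_at_threshold(0.01, 100_000))  # 1000
print(expected_chance_calls(0.01, 10_000))  # hypothetical gene count
```

At p < .01, about 1% of genes would be flagged even if capture site had no effect at all, which fits the concern that the default cutoff is not stringent.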

Take-home notes:

  1. OEFinder looks for genes whose expression values vary across capture sites. I would expect genes with high variance (or coefficient of variation) across capture sites to be more likely to be flagged as "OE" genes by OEFinder. I'll plot the variances or coefficients of variation for these genes.
  2. Need to look closely at whether batch effects affect the identification of OE genes. I messaged the OEFinder developer about the error messages I got when running the data by batch; apparently I can just ignore those messages and let the software run. I'll get on this.
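Both checks in this list could start from something like the following sketch. It assumes a genes × cells matrix in which each column is one capture site, and per-batch lists of OEFinder-flagged gene IDs. `cv_across_sites` and `jaccard` are hypothetical helper names, and the coefficient of variation is only meaningful here on non-negative scales such as counts or CPM.

```python
import numpy as np

def cv_across_sites(expr_by_site):
    """Coefficient of variation (sd / mean) per gene across capture sites.

    expr_by_site: genes x cells array, one column per capture site (assumed layout).
    """
    x = np.asarray(expr_by_site, dtype=float)
    mean = x.mean(axis=1)
    sd = x.std(axis=1, ddof=1)
    # Genes with zero mean get NaN instead of a division-by-zero result.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(mean > 0, sd / mean, np.nan)

def jaccard(genes_a, genes_b):
    # Overlap between OE gene sets called in two batches (1.0 means identical sets).
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Comparing the CV distribution of flagged versus unflagged genes speaks to the first note; a low Jaccard overlap between the OE sets called in different batches would hint that the calls depend on batch.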