greenelab / generic-expression-patterns

Distinguishing between generic and experiment-specific gene expression signals.
BSD 3-Clause "New" or "Revised" License

RNA-seq only generic genes #76

Open · ajlee21 opened this issue 3 years ago

ajlee21 commented 3 years ago

When we examined the correlation between the gene percentiles generated by SOPHIE and those from the manually curated dataset, we noticed a group of genes that SOPHIE identified as generic but that were not generic according to the manually curated dataset. In this case, SOPHIE was trained on the recount2 (RNA-seq) compendium, while the manually curated dataset was generated on an array platform.
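The comparison above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the code used in the repository: it assumes each source provides a dict mapping gene ID to a percentile in [0, 100], computes a Spearman rank correlation over the shared genes, and flags genes that SOPHIE ranks as generic but the curated dataset does not. The function names and the 80/50 percentile cutoffs are illustrative assumptions.

```python
from statistics import mean


def rank_with_ties(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank_with_ties(x), rank_with_ties(y))


def sophie_only_generic(sophie_pct, curated_pct, high=80.0, low=50.0):
    """Genes SOPHIE ranks generic (>= high) but the curated set does not (< low).

    The cutoffs are hypothetical placeholders, not values from the analysis.
    """
    shared = sorted(set(sophie_pct) & set(curated_pct))
    return [g for g in shared if sophie_pct[g] >= high and curated_pct[g] < low]


# Toy percentiles (made up for illustration only).
sophie = {"A": 95, "B": 90, "C": 30, "D": 10, "E": 85}
curated = {"A": 92, "B": 20, "C": 35, "D": 5, "E": 40}
genes = sorted(set(sophie) & set(curated))
rho = spearman([sophie[g] for g in genes], [curated[g] for g in genes])
print(round(rho, 3))                         # 0.7
print(sophie_only_generic(sophie, curated))  # ['B', 'E']
```

Rank correlation is used here because percentiles are already rank-like; the off-diagonal genes (high SOPHIE percentile, low curated percentile) are the group this issue is about.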

See https://github.com/greenelab/generic-expression-patterns/pull/75 for details:

Why is this compression not seen in the array data?

Possible solutions to consider:

ajlee21 commented 3 years ago

Some analyses were performed here: https://github.com/greenelab/generic-expression-patterns/tree/master/explore_RNAseq_only_generic_genes

It looks like the VAE is artificially boosting the signal of lowly expressed genes in the RNA-seq data, which allows them to be detected as differentially expressed (DE). We think this VAE boosting is less pronounced in array data because array data has lower variance than RNA-seq data. Further tests would be needed to examine the effect of the data type (array vs. RNA-seq).
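One way to check the "lowly expressed genes" observation is to ask whether the RNA-seq-only generic genes are over-represented among the least expressed genes. The sketch below is hypothetical and is not taken from the `explore_RNAseq_only_generic_genes` notebooks: it assumes a dict of per-gene mean expression and a list of flagged genes, and compares the share of flagged genes in the bottom expression quantile against the share expected for all genes. The function name and the 25% cutoff are assumptions.

```python
def low_expression_enrichment(mean_expr, flagged, quantile=0.25):
    """Return (share of flagged genes in the lowest-expression quantile,
    share of all genes in that quantile).

    mean_expr: dict gene -> mean expression (assumed precomputed).
    flagged:   genes called generic by SOPHIE but not by the curated set.
    """
    genes = sorted(mean_expr, key=mean_expr.get)  # ascending expression
    n_low = max(1, int(len(genes) * quantile))
    low_set = set(genes[:n_low])
    flagged = [g for g in flagged if g in mean_expr]
    if not flagged:
        raise ValueError("none of the flagged genes have expression values")
    frac_flagged = sum(g in low_set for g in flagged) / len(flagged)
    frac_all = n_low / len(genes)
    return frac_flagged, frac_all


# Toy values (made up for illustration only).
mean_expr = {"A": 500.0, "B": 2.0, "C": 120.0, "D": 80.0,
             "E": 1.5, "F": 300.0, "G": 40.0, "H": 900.0}
print(low_expression_enrichment(mean_expr, ["B", "E"]))  # (1.0, 0.25)
```

A flagged share well above the baseline share would be consistent with the low-expression explanation; running the same check on array data would be one way to probe the array vs. RNA-seq question.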