Open hammer opened 7 years ago
The idea here is to apply the same hierarchical method we are currently testing, but instead of using 100, 500, or 1000 randomly selected genes, use the gene subsets previously identified by (e.g.) CIBERSORT, along with the gene sets selected by MCP-counter, DeconRNASeq, and other tools.
For the simplest apples-to-apples comparison, it would be useful to separate the contribution of the method from the contribution of the gene sets to overall effectiveness.
My sense is that our model will benefit from having a subset of "background" or housekeeping genes to help normalize expression levels across samples, but we may want to select this subset differently from the informative subsets previously identified.
I see significant differences in my out-of-sample results when I use gene sets of different sizes. The composition of the gene set makes a clear difference.
I am next trying a 50-50 balance of: a) known marker genes; b) random sample of all other genes for background.
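A minimal sketch of how such a mixed gene set could be assembled, assuming gene identifiers are plain strings. The function name `build_gene_set`, its parameters, and the placeholder gene IDs are hypothetical and not taken from the project code; `marker_fraction=0.5` corresponds to the 50-50 balance.

```python
import random


def build_gene_set(all_genes, marker_genes, n_total, marker_fraction=0.5, seed=0):
    """Return (markers, background) totalling at most n_total genes.

    Hypothetical helper: markers are drawn from the known marker list,
    background genes are drawn at random from every other gene.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    universe = set(all_genes)
    # Only keep markers that are actually present in the expression data.
    markers = sorted(set(marker_genes) & universe)
    background_pool = sorted(universe - set(markers))

    # Cap each part by what is available.
    n_markers = min(len(markers), int(n_total * marker_fraction))
    n_background = min(len(background_pool), n_total - n_markers)

    chosen_markers = rng.sample(markers, n_markers)
    chosen_background = rng.sample(background_pool, n_background)
    return chosen_markers, chosen_background


# Placeholder identifiers, for illustration only.
all_genes = [f"G{i}" for i in range(1000)]
known_markers = all_genes[:80]

markers, background = build_gene_set(all_genes, known_markers, n_total=100)
```

If fewer markers are available than requested, this sketch fills the remainder with background genes, so the realized ratio can drift from the requested one; whether that is desirable is a design choice.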
@jburos suggests further tests:
Try different proportions, and also select on expression level (i.e., a mixture of highly expressed vs. zero-expression genes). This addresses two questions: (1) is 50/50 the right ratio, and (2) are these gene sets informative, or is it similarly effective to oversample transcripts with non-zero expression?
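One way the second question could be probed is to draw the background only from genes that are expressed in a minimum share of samples, then compare against a purely random background. The sketch below assumes expression is held as a mapping from gene to per-sample values; `sample_expressed_background`, the `min_nonzero` threshold, and the toy values are all hypothetical.

```python
import random


def fraction_nonzero(values):
    """Share of samples in which the gene has non-zero expression."""
    return sum(1 for v in values if v > 0) / len(values)


def sample_expressed_background(expression, exclude, n, min_nonzero=0.5, seed=0):
    """Sample up to n background genes that are non-zero in at least
    min_nonzero of the samples, skipping genes listed in exclude."""
    rng = random.Random(seed)
    excluded = set(exclude)
    pool = sorted(
        gene
        for gene, values in expression.items()
        if gene not in excluded and fraction_nonzero(values) >= min_nonzero
    )
    return rng.sample(pool, min(n, len(pool)))


# Toy values, for illustration only (not real measurements).
expression = {
    "CD3E": [5.0, 7.0, 6.0, 4.0],
    "ACTB": [90.0, 85.0, 88.0, 91.0],
    "GAPDH": [70.0, 0.0, 65.0, 72.0],
    "GENE_X": [0.0, 0.0, 0.0, 1.0],
    "GENE_Y": [0.0, 0.0, 0.0, 0.0],
}

background = sample_expressed_background(
    expression, exclude=["CD3E"], n=2, min_nonzero=0.5
)
```

Setting `min_nonzero=0.0` recovers plain random sampling over the non-marker genes, which gives a baseline for the comparison.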
Last @jburos 12/12 status report issue. More detail please!