Try models using selected sets of genes

hammer commented 7 years ago

Last @jburos 12/12 status report issue. More detail please!

jburos commented 7 years ago

The idea here is to apply the same hierarchical method we are currently testing, but instead of using 100, 500 or 1000 randomly selected genes, use the gene subsets previously identified by (e.g.) CIBERSORT, MCP-counter, DeconRNASeq and other tools.

For the simplest apples-to-apples comparison, it would be useful to separate the method from the gene sets as components contributing to effectiveness.
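One way to picture that separation is as a small grid that crosses methods with gene sets, so each factor can be varied while the other is held fixed. This is only a sketch: the labels below are hypothetical placeholders, not names from the project's code, and it assumes each method can be run on an arbitrary gene set.

```python
from itertools import product

# Hypothetical labels for illustration only; none of these come from the project code.
methods = ["hierarchical_model", "reference_tool"]
gene_sets = ["random_500", "cibersort_genes", "mcp_counter_genes"]


def comparison_grid(methods, gene_sets):
    """Return every (method, gene_set) pair so each factor can be varied alone."""
    return list(product(methods, gene_sets))


grid = comparison_grid(methods, gene_sets)
for method, gene_set in grid:
    # Placeholder for fitting and scoring; only the design is shown here.
    print(f"fit {method} on {gene_set}")
```

Comparing rows that share a method isolates the effect of the gene set, and comparing rows that share a gene set isolates the effect of the method.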

My sense is that our model will benefit from having a subset of "background" or housekeeping genes to help normalize expression levels across samples, but we may want to select this subset differently from the informative subsets previously identified.

maximz commented 7 years ago

I see significant differences in my out-of-sample results when I use gene sets of different sizes. The composition of the gene set makes a clear difference.

Next I am trying a 50-50 balance of (a) known marker genes and (b) a random sample of all other genes for background.
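A minimal sketch of how such a mixed gene set could be assembled, assuming gene identifiers are plain strings. The function name `build_gene_set`, its parameters, and the toy gene names are all hypothetical; this is not the code used for the results above.

```python
import random


def build_gene_set(marker_genes, all_genes, n_total, marker_fraction=0.5, seed=0):
    """Pick known markers plus a random background sample from the remaining genes.

    Hypothetical helper: marker_fraction is the share of n_total drawn from markers.
    """
    rng = random.Random(seed)
    # Keep only markers that are actually present in the expression data.
    markers = sorted(set(marker_genes) & set(all_genes))
    marker_set = set(markers)
    # Background candidates are all genes that are not markers.
    background_pool = sorted(g for g in set(all_genes) if g not in marker_set)
    n_markers = min(len(markers), round(n_total * marker_fraction))
    n_background = min(len(background_pool), n_total - n_markers)
    chosen_markers = rng.sample(markers, n_markers)
    chosen_background = rng.sample(background_pool, n_background)
    return chosen_markers, chosen_background


# Toy data: 1000 made-up gene names, the first 100 treated as "known markers".
all_genes = [f"GENE{i}" for i in range(1000)]
marker_genes = all_genes[:100]
markers, background = build_gene_set(marker_genes, all_genes, n_total=100)
print(len(markers), len(background))
```

Sorting before sampling keeps the draw reproducible for a given seed, which makes it easier to compare runs that differ only in the gene set.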

@jburos suggests further tests:

Try different proportions, and also select on expression level (i.e. a mixture of highly expressed vs. zero-expression genes). This addresses two questions: (1) is 50/50 the right ratio, and (2) are these gene sets informative, or is it similarly effective to oversample transcripts with non-zero expression?
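Both questions could be set up roughly as follows. This is a sketch under stated assumptions: expression is held as a dict mapping gene name to per-sample values, "expressed" means non-zero in at least a chosen fraction of samples, and the helper names, threshold, and toy values are hypothetical.

```python
import random


def nonzero_fraction(values):
    """Fraction of samples in which a gene has non-zero expression."""
    return sum(1 for v in values if v > 0) / len(values)


def sample_expressed_genes(expression, n_genes, min_nonzero=0.5, seed=0):
    """Randomly sample genes that are non-zero in at least min_nonzero of samples.

    Hypothetical control for question (2): no marker knowledge is used,
    only the expression level.
    """
    rng = random.Random(seed)
    eligible = sorted(
        g for g, vals in expression.items() if nonzero_fraction(vals) >= min_nonzero
    )
    return rng.sample(eligible, min(n_genes, len(eligible)))


def split_sizes(n_total, marker_fraction):
    """Question (1): marker and background counts for a given ratio."""
    n_markers = round(n_total * marker_fraction)
    return n_markers, n_total - n_markers


# Toy expression values (made up): gene -> expression across four samples.
expression = {
    "A": [0, 0, 0, 0],
    "B": [5, 0, 3, 1],
    "C": [2, 2, 2, 2],
    "D": [0, 0, 0, 7],
}
picked = sample_expressed_genes(expression, n_genes=2)
sizes = {f: split_sizes(100, f) for f in (0.25, 0.5, 0.75)}
print(sorted(picked), sizes)
```

If a model fit on the expression-filtered random genes performs about as well as one fit on the marker sets, that would suggest the gain comes from avoiding zero-valued transcripts rather than from the markers themselves.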

hammerlab / immune-infiltrate-explorations

Try models using selected sets of genes #16