greenelab / BioBombe

BioBombe: Sequentially compressed gene expression features enhances biological signatures
https://greenelab.github.io/BioBombe/
BSD 3-Clause "New" or "Revised" License

Updating TCGA Figure - Adjusted weight absolute sum #176

Closed by gwaybio 5 years ago

gwaybio commented 5 years ago

closes #171

I also combine a couple panels that were easier to describe with one reference than two.

Updated figure: tcga_biobombe_main_figure

gwaybio commented 5 years ago

(A,B) Why is the real signal not at 100%?

Because some predictions are incorrect. "Real" refers to predictions made with uncompressed RNAseq data.

(C) How do you interpret the y-axis -- you have the change in AUC?

This is the delta AUPR: the difference in AUPR between predictions made with real data and predictions made with permuted data.
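As a rough illustration of that y-axis, here is a minimal sketch of a delta AUPR calculation. It is not the code used for the figure: the function names, the toy labels, and the toy scores are all hypothetical, and AUPR is approximated with the step-wise average-precision sum (ties in scores are not handled specially).

```python
def average_precision(y_true, scores):
    """Approximate AUPR as the mean precision at each recalled positive."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank samples from highest to lowest predicted score
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def delta_aupr(y_true, real_scores, permuted_scores):
    """AUPR with real input data minus AUPR with permuted input data."""
    return average_precision(y_true, real_scores) - average_precision(
        y_true, permuted_scores
    )


# Hypothetical toy example: same labels, scores from two different models
y_true = [1, 1, 0, 1, 0, 0]
real_scores = [0.90, 0.80, 0.70, 0.60, 0.20, 0.10]      # model fit on real data
permuted_scores = [0.45, 0.50, 0.60, 0.30, 0.70, 0.55]  # model fit on permuted data

delta = delta_aupr(y_true, real_scores, permuted_scores)
print(f"delta AUPR = {delta:.3f}")
```

A positive value means the model trained on real data ranks positives better than the one trained on permuted data; a value near zero means the real signal adds little over the permuted baseline.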

(D) Is this showing that a higher AUC (performance) tends to be associated with higher sparsity of features?

It is more of a summary/survey. Larger models perform better and have similar sparsity. DAE models are particularly sparse for some reason, and so are the ensembles.
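For readers unsure what "sparsity" measures here, a minimal sketch follows. It assumes sparsity means the fraction of classifier coefficients that are exactly (or numerically) zero; that definition, the function name, and the example coefficient vectors are assumptions for illustration, not taken from the PR.

```python
def sparsity(coefficients, tol=1e-12):
    """Fraction of coefficients that are zero within a small tolerance."""
    if not coefficients:
        raise ValueError("coefficient list is empty")
    n_zero = sum(1 for c in coefficients if abs(c) <= tol)
    return n_zero / len(coefficients)


# Hypothetical coefficient vectors for two classifiers
dense_model = [0.4, -0.1, 0.3, 0.2]
sparse_model = [0.0, 0.5, 0.0, -1.2]

print(sparsity(dense_model))   # no zero coefficients
print(sparsity(sparse_model))  # half of the coefficients are zero
```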

(E) There is a solid and a dashed line, but the dashed line isn't included in the legend. I assumed this is your permuted gene expression set.

The solid line is the hypothetical guess in a standard ROC plot.

(F) So it looks like the VAE is weighted most heavily in the ensemble except for k=45. Did you look at the performance of the ensemble removing the VAE model? How is this ensemble making predictions?

Yeah that is an interesting question - I did not do that analysis. Our main point is that signal is used across latent dimensionalities and across algorithms to make the predictions. Our argument is that to construct biologically useful representations, it is best to compress data using a bunch of algorithms into a bunch of dimensions. While it would be cool to track ensemble performance without the VAE, it wouldn't necessarily answer the questions we were interested in answering.
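To make the "weighted most heavily" comparison concrete, here is a minimal sketch of summing absolute classifier weights per algorithm in an ensemble. The feature naming scheme (`<algorithm>_<index>`), the function names, and the example weights are hypothetical; this is not the analysis code behind the figure.

```python
from collections import defaultdict


def abs_weight_by_algorithm(coefficients):
    """Sum absolute coefficient values per algorithm.

    Assumes feature names look like "<algorithm>_<index>", e.g. "vae_3".
    """
    totals = defaultdict(float)
    for feature, weight in coefficients.items():
        algorithm = feature.split("_", 1)[0]
        totals[algorithm] += abs(weight)
    return dict(totals)


def weight_share(totals):
    """Convert absolute weight sums into fractions of the overall total."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all weights are zero")
    return {alg: value / grand_total for alg, value in totals.items()}


# Hypothetical ensemble classifier coefficients
example = {
    "vae_0": 0.8,
    "vae_3": -0.4,
    "pca_1": 0.3,
    "ica_2": -0.2,
    "nmf_5": 0.0,
    "dae_7": 0.3,
}

totals = abs_weight_by_algorithm(example)
shares = weight_share(totals)
for algorithm, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{algorithm}: {share:.2f}")
```

Note that this only describes how weight is distributed in an already-fit model. Answering the reviewer's question about performance without the VAE would require dropping the VAE features and refitting the ensemble, which this sketch does not do.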