greenelab / tybalt

Training and evaluating a variational autoencoder for pan-cancer gene expression data
BSD 3-Clause "New" or "Revised" License

Adding Results and Visualizations of WGCNA-Based Simulation Analysis #119

Closed gwaybio closed 6 years ago

gwaybio commented 6 years ago

Adding pipeline scripts, data, figures, and interpretation for this analysis.

Most of the files included in this PR are figures, which probably don't require much review. The best way to track updates to this pull request is probably the markdown summary file: simulation_results.md

gwaybio commented 6 years ago

pinging @huqiwen0313 for simulation results

huqiwen0313 commented 6 years ago

Very cool result! Tybalt captures the signal in module 3 well and is resistant to noise. I agree with you: the performance of each algorithm depends somewhat on how the simulated data are generated. If WGCNA assumes linear dependency (I guess the co-expression module is based on Pearson correlation?), PCA/NMF/ICA will perform better. This result is also expected if the signal in the module is much stronger than the background.

One minor comment: for the z-score plot in 3), it may be easier to see and compare if you take the absolute value of the z-score...

gwaybio commented 6 years ago

> I guess the co-expression module is based on Pearson correlation?

That's correct - the algorithm expects module "centroids" that are then sampled from (the samples are the module genes). The sampled genes are constrained to have correlations between the values of the arguments min_corr and max_corr.
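As a rough illustration of that idea (not the WGCNA implementation itself), here is a minimal NumPy sketch that draws module genes around a single centroid so that each gene's Pearson correlation with the centroid lies between `min_corr` and `max_corr`. The function name `simulate_module`, its signature, and the uniform choice of target correlations are assumptions made for this example only.

```python
import numpy as np


def simulate_module(n_samples, n_genes, min_corr, max_corr, seed=0):
    """Hypothetical helper: simulate genes correlated with one module centroid.

    Each gene's Pearson correlation with the centroid is drawn uniformly
    from [min_corr, max_corr] (an assumption made for illustration).
    """
    rng = np.random.default_rng(seed)

    # Centroid: mean-zero, unit-norm vector across samples.
    centroid = rng.standard_normal(n_samples)
    centroid -= centroid.mean()
    centroid /= np.linalg.norm(centroid)

    # Noise: mean-zero, orthogonal to the centroid, unit norm per gene.
    noise = rng.standard_normal((n_samples, n_genes))
    noise -= noise.mean(axis=0)
    noise -= np.outer(centroid, centroid @ noise)
    noise /= np.linalg.norm(noise, axis=0)

    # Mix centroid and noise so corr(gene, centroid) equals the target.
    target = rng.uniform(min_corr, max_corr, size=n_genes)
    genes = target * centroid[:, None] + np.sqrt(1.0 - target**2) * noise
    return centroid, genes, target


centroid, genes, target = simulate_module(
    n_samples=50, n_genes=20, min_corr=0.3, max_corr=0.9
)
observed = np.array(
    [np.corrcoef(centroid, genes[:, j])[0, 1] for j in range(genes.shape[1])]
)
```

Because the construction is purely linear in the centroid, data like this would be expected to favor linear methods such as PCA, which is the point raised above.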

> One minor comment: for the z-score plot in 3), it may be easier to see and compare if you take the absolute value of the z-score...

Yes, I tried this and the plots were a bit cleaner. But I do think it is an important point that the result of the subtraction in Tybalt models always has a positive z-score. In other algorithms, the sign is less consistent (particularly ICA).
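To make the trade-off concrete, here is a small sketch of what taking the absolute value hides. The helper `summarize_z_scores` and the toy numbers are invented for illustration; they are not results from this analysis.

```python
import numpy as np


def summarize_z_scores(z):
    """Hypothetical helper: return |z| plus a flag for sign consistency."""
    z = np.asarray(z, dtype=float)
    return {
        "abs_z": np.abs(z),                  # what an absolute-value plot shows
        "all_positive": bool(np.all(z > 0)), # information the plot would drop
    }


# Toy values only (made up for this example, not taken from the analysis).
consistent = summarize_z_scores([2.1, 3.4, 1.8])
mixed = summarize_z_scores([2.1, -3.4, 1.8])
```

Both inputs give identical `abs_z` arrays, so an absolute-value plot would look the same for each, while the `all_positive` flag still separates the sign-consistent case from the mixed one.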