gwaybio closed this 6 years ago
Also tagging @huqiwen0313 for possible thoughts on the simulation evals.
Thanks @jaclyn-taroni and @danich1 !
I have updated the commits based on your comments. It is not yet ready for re-review, however; I believe I found a bug in the simulated data script that I will need to test when I get back. I will let you know when it's ready again!
Alright! I think I have addressed the previous error in fa194fb3baba2cd5ec856b627c927ab97ad57456 (specifically in lines 108-109). Previously, I was simulating only 3 sample sets (by row). This produced signal, but not exactly what I had intended.
Before, the sample sets were being used to generate the `eigen_samples` matrix. Generating an eigen matrix with 5 gene modules (but only 3 "sample_sets") resulted in this sampling:
| Gene Module 1 | GM2 | GM3 | GM4 | GM5 |
|---|---|---|---|---|
| Sample Set A | B | C | A | B |
| C | A | B | C | A |
| B | C | A | B | C |
Instead, with 5 "sample_sets":
| Gene Module 1 | GM2 | GM3 | GM4 | GM5 |
|---|---|---|---|---|
| Sample Set A | B | C | D | E |
| A | B | C | D | E |
| A | B | C | D | E |
Here, the rows of `eigen_samples` correspond to samples and the columns to gene modules.
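One way to see where the cyclic pattern in the first table comes from: if a grid of sample-set labels is filled by repeating the labels in row-major order, 3 labels wrap around every 5 columns, while 5 labels line up exactly with the 5 gene modules. The sketch below reproduces both tables with NumPy's `np.resize`. The helper `sample_set_layout` is hypothetical and is not taken from the simulation script.

```python
import numpy as np

def sample_set_layout(n_sample_sets, n_modules, n_rows=3):
    """Hypothetical helper: fill an (n_rows x n_modules) grid of sample-set labels.

    np.resize repeats the label array in row-major order until the grid is
    full, so the labels wrap around whenever n_sample_sets < n_modules.
    """
    labels = np.array([chr(ord("A") + i) for i in range(n_sample_sets)])
    return np.resize(labels, (n_rows, n_modules))

print(sample_set_layout(n_sample_sets=3, n_modules=5))
# [['A' 'B' 'C' 'A' 'B']
#  ['C' 'A' 'B' 'C' 'A']
#  ['B' 'C' 'A' 'B' 'C']]
print(sample_set_layout(n_sample_sets=5, n_modules=5))
```

With 5 sample sets, every row reads `A B C D E`, so each gene module is tied to exactly one sample set.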
Additionally, I added a couple of figures in 828333a that describe an example of the simulated data.
@danich1 @jaclyn-taroni - Ready for re-review! Thanks! (The results of the sweep will be added in a future pull request)
@gwaygenomics, is there a row and column of missing values in `figures/example_simulated_data.png`, or is there just something wonky with the graphics that would be remedied by changing the size of the plot?
Just wonky 💩 - I will fix before merging
Same comment as before; looks like @danich1 agrees.
I approve. Just make sure you add an exception statement in the `reconstruct_group` function.
My bad, I must have just missed that one. Fixed in aa754c2
Related to #103
Completely updates the old method of simulating data, which now uses WGCNA's `simulateDatExpr`.
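For readers unfamiliar with module-based simulation, a simplified Python stand-in is sketched below. It is not WGCNA's implementation: it assumes each gene in a module is its standardized module eigengene scaled by a correlation `r` (drawn uniformly between `min_cor` and `max_cor`) plus independent Gaussian noise, and it omits background ("grey") genes. The function name and parameters are hypothetical.

```python
import numpy as np

def simulate_module_genes(eigengenes, genes_per_module, min_cor=0.3, max_cor=1.0, seed=0):
    """Hypothetical, simplified module-based simulation (not WGCNA's code).

    eigengenes: (n_samples, n_modules) array; column m is module m's eigengene.
    Each simulated gene is r * e + sqrt(1 - r**2) * noise, where e is the
    standardized eigengene and r ~ Uniform[min_cor, max_cor), so the gene's
    expected correlation with its eigengene is r.
    Returns (expression matrix of shape (n_samples, n_genes), module labels).
    """
    rng = np.random.default_rng(seed)
    # Standardize each eigengene to zero mean and unit variance.
    eig = (eigengenes - eigengenes.mean(axis=0)) / eigengenes.std(axis=0)
    n_samples, n_modules = eig.shape
    columns, labels = [], []
    for m in range(n_modules):
        r = rng.uniform(min_cor, max_cor, size=genes_per_module)
        noise = rng.standard_normal((n_samples, genes_per_module))
        # (n_samples, 1) * (genes,) broadcasts to (n_samples, genes).
        columns.append(eig[:, [m]] * r + np.sqrt(1 - r**2) * noise)
        labels.extend([m + 1] * genes_per_module)
    return np.hstack(columns), np.array(labels)

eigengenes = np.random.default_rng(1).standard_normal((200, 3))
expr, modules = simulate_module_genes(eigengenes, genes_per_module=10)
print(expr.shape)  # (200, 30)
```

Because the eigengene is standardized and the noise has unit variance, each gene's expected correlation with its module eigengene is `r`.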
I've also added four evals, which include:
- noise
- module ranks
- `dist(C, decoder(C_hat))`, where `C_hat` is formed from A, B, and D, the mean latent space encodings of 3 groups of samples. Groups B and D lack gene module 2; groups A and C have gene module 2.

I have also added a `verbose` argument to the `models` class, which controls how training metrics are output. b2f2bba partially addresses #13.
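A minimal sketch of the `dist(C, decoder(C_hat))` eval, assuming `dist` is Euclidean distance, C is the mean input (gene-space) profile of group C, and the decoder maps latent vectors back to gene space. How `C_hat` is combined from A, B, and D is left to the caller. `mean_encoding` and `latent_arithmetic_distance` are hypothetical names, not functions from this repository.

```python
import numpy as np

def mean_encoding(encoder, x):
    """Mean latent-space encoding of a group of samples (rows of x)."""
    return np.asarray(encoder(x)).mean(axis=0)

def latent_arithmetic_distance(c_expression, c_hat, decoder):
    """Euclidean distance between group C's mean profile and decoder(C_hat).

    c_expression: (n_samples, n_genes) input data for group C.
    c_hat: 1-D latent vector built by the caller from the mean encodings
           of groups A, B, and D (the combination is not fixed here).
    decoder: callable mapping an (n, latent_dim) array to (n, n_genes).
    """
    c_mean = np.asarray(c_expression).mean(axis=0)
    reconstructed = np.asarray(decoder(np.atleast_2d(c_hat)))[0]
    return float(np.linalg.norm(c_mean - reconstructed))

# Toy usage with an identity "decoder" (illustration only).
decoder = lambda z: z
c_expression = np.array([[1.0, 1.0], [3.0, 3.0]])  # mean profile is [2, 2]
print(latent_arithmetic_distance(c_expression, np.array([2.0, 5.0]), decoder))  # 3.0
```

A smaller distance means that decoding the constructed latent vector recovers group C's profile more closely.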
This is only the framework, results to come in future PR!