sr320 closed 9 months ago
I have reformatted the data into what I think is needed for the new WGCNA run, and maybe ASFC???
The data are in https://github.com/sr320/ceabigr/tree/main/output/72-exon-data-rfmt
with each sex separate, in the following format:
"SampleID","GeneID","fold1","fold2","fold3","fold4","fold5","fold6","Sex"
"S12M","LOC111099029",0,1.02165124753198,1.75401914124521,0.200670695462151,0,0,"M"
"S12M","LOC111099033",0,0.461345566502621,0.0339015516756814,0.129211731480006,-1.57553636075842,0.18805223150294,"M"
"S12M","LOC111099035",0,-0.385662480811985,-0.510825623765991,-0.328504066972036,-0.0408219945202552,-0.328504066972036,"M"
"S12M","LOC111099036",0,0.167054084663166,1.00330210886378,0.374693449441411,2.32060359849672,0.82098055206983,"M"
"S12M","LOC111099040",0,-1.74894556364451,-1.17029734353976,-1.52697228747849,-2.30564164147659,-1.68935346644226,"M"
"S12M","LOC111099041",0,0.653926467406664,0.120627987788615,1.24171313230878,0.22825865198098,0.510825623765991,"M"
"S12M","LOC111099043",0,0.43208264910812,0.111985917972058,-0.710640338007463,0.0804872509126865,0.47072591168956,"M"
"S12M","LOC111099045",0,-0.693147180559945,0.318453731118535,-0.287682072451781,0.661398482245365,0.362905493689368,"M"
"S12M","LOC111099047",0,-0.177334015282916,0.750826292146623,0.300104592450338,0.388657989791783,0.421994410059375,"M"
@yaaminiv @AHuffmyer do let me know if I got it wrong or other formats should be generated.
Need to change the format to:
x gene01 gene02 gene03 sample_fold1 sample_fold2 x x sample2_fold
Reformatted files are in https://github.com/sr320/ceabigr/blob/main/output/72-exon-data-rfmt/
with the file suffix tf.
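For anyone following along, here is a minimal sketch of the long-to-wide reshaping described above, assuming the target layout is one row per sample/fold combination and one column per gene. It is not the script that produced the tf files. The function names and the `Sample_foldN` row labels are made up for illustration, and the two input rows are copied from the sample above with values rounded to four decimals.

```python
import csv
import io

# Two rows copied from the sample above (values rounded for readability).
LONG_CSV = '''"SampleID","GeneID","fold1","fold2","fold3","fold4","fold5","fold6","Sex"
"S12M","LOC111099029",0,1.0217,1.7540,0.2007,0,0,"M"
"S12M","LOC111099033",0,0.4613,0.0339,0.1292,-1.5755,0.1881,"M"
'''

FOLD_COLUMNS = ["fold1", "fold2", "fold3", "fold4", "fold5", "fold6"]


def long_to_wide(csv_text):
    """Return (gene_ids, table); table maps 'Sample_foldN' -> {gene: value}."""
    gene_ids = []
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        gene = row["GeneID"]
        if gene not in gene_ids:
            gene_ids.append(gene)
        for fold in FOLD_COLUMNS:
            # Hypothetical row label: sample ID joined to the fold column name.
            key = f'{row["SampleID"]}_{fold}'
            table.setdefault(key, {})[gene] = float(row[fold])
    return gene_ids, table


def write_wide(gene_ids, table):
    """Serialize the wide table as CSV text, with 'x' as the corner cell."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["x"] + gene_ids)
    for key, values in table.items():
        # Genes missing for a given sample/fold are written as NA.
        writer.writerow([key] + [values.get(g, "NA") for g in gene_ids])
    return out.getvalue()


gene_ids, table = long_to_wide(LONG_CSV)
print(write_wide(gene_ids, table))
```

Each sex would be run through this separately, since the files are already split by sex.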
@AHuffmyer when you get a chance, could you throw this at WGCNA and see if it works?
Yes! I can do this tonight or tomorrow!
I did a quick look using WGCNA and am working on data QC. One problem was that the rows "fold1" were all 0, causing an error in estimating variance. I removed those rows but it's still having a problem, which may be due to the number of NAs in the data frame. I'll look at this again tomorrow and see if I can determine what is causing the problem.
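A rough sketch of the QC step described here, assuming the wide layout with one row per sample/fold: it drops rows with zero variance (such as the all-zero fold1 rows) and counts missing values per row. This is a plain Python stand-in, not what WGCNA does internally. The table contents and the `qc` helper are illustrative only, and the `None` entry is a made-up NA.

```python
from statistics import pvariance

# Hypothetical wide table: keys are sample_fold labels, values are per-gene entries.
# None marks a missing value (NA). The fold1 row is all zeros, as reported above.
WIDE = {
    "S12M_fold1": [0.0, 0.0, 0.0],
    "S12M_fold2": [1.0217, 0.4613, -0.3857],
    "S12M_fold3": [1.7540, None, -0.5108],
}


def row_variance(values):
    """Population variance over the non-missing values; 0.0 if fewer than two."""
    present = [v for v in values if v is not None]
    return pvariance(present) if len(present) > 1 else 0.0


def qc(table):
    """Split rows into kept/dropped by variance and count NAs per row."""
    kept, dropped, na_counts = {}, [], {}
    for label, values in table.items():
        na_counts[label] = sum(v is None for v in values)
        if row_variance(values) == 0.0:
            dropped.append(label)
        else:
            kept[label] = values
    return kept, dropped, na_counts


kept, dropped, na_counts = qc(WIDE)
print("dropped:", dropped)
print("NA counts:", na_counts)
```

Looking at the NA counts per row (and per gene, by transposing) may help narrow down whether missing values are what is still tripping things up.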
I gave it a try. It did show module patterns that make sense. But because of the way we formatted it, we are going to have a hard time pulling out genes that changed module between treatments. This is because it assigns each gene to one module, whereas last time we had two separate columns for each gene: one for its value in control and one for its value in treated. I'll have to think more about this. WGCNA definitely struggled with this data. I think moving to an ANOVA-type analysis will be more appropriate.
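To make the limitation concrete, here is a small sketch of the comparison that the earlier two-column setup allowed, assuming each gene gets one module label per condition. The module labels below are invented for illustration and are not results from this run. The `genes_changing_module` helper is likewise hypothetical.

```python
def genes_changing_module(control, treated):
    """Return {gene: (control_module, treated_module)} for genes whose module differs."""
    return {
        gene: (control[gene], treated[gene])
        for gene in control
        if gene in treated and control[gene] != treated[gene]
    }


# Hypothetical module assignments; gene IDs are from the sample above,
# module labels are made up.
control_modules = {"LOC111099029": "blue", "LOC111099033": "brown"}
treated_modules = {"LOC111099029": "blue", "LOC111099033": "turquoise"}

changed = genes_changing_module(control_modules, treated_modules)
print(changed)
```

With the current format each gene ends up with a single module assignment, so there is no second label to compare against.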
seems non-viable
x exon1 exon2 exon3 exon4 exon5 exon6 gene1_sampleID gene1_sampleID gene1_sampleID