sr320 / ceabigr

Workshop on genomic data integration with an emphasis on epigenetic data (FHL 2022)

Reformat exon data for WGCNA analysis #91

Closed: sr320 closed 4 months ago

sr320 commented 4 months ago

x               exon1  exon2  exon3  exon4  exon5  exon6
gene1_sampleID
gene1_sampleID
gene1_sampleID

sr320 commented 4 months ago

I have reformatted the data into what I think is necessary for the new WGCNA, and maybe ASFC???

data is in https://github.com/sr320/ceabigr/tree/main/output/72-exon-data-rfmt

with each sex separate in the following format

"SampleID","GeneID","fold1","fold2","fold3","fold4","fold5","fold6","Sex"
"S12M","LOC111099029",0,1.02165124753198,1.75401914124521,0.200670695462151,0,0,"M"
"S12M","LOC111099033",0,0.461345566502621,0.0339015516756814,0.129211731480006,-1.57553636075842,0.18805223150294,"M"
"S12M","LOC111099035",0,-0.385662480811985,-0.510825623765991,-0.328504066972036,-0.0408219945202552,-0.328504066972036,"M"
"S12M","LOC111099036",0,0.167054084663166,1.00330210886378,0.374693449441411,2.32060359849672,0.82098055206983,"M"
"S12M","LOC111099040",0,-1.74894556364451,-1.17029734353976,-1.52697228747849,-2.30564164147659,-1.68935346644226,"M"
"S12M","LOC111099041",0,0.653926467406664,0.120627987788615,1.24171313230878,0.22825865198098,0.510825623765991,"M"
"S12M","LOC111099043",0,0.43208264910812,0.111985917972058,-0.710640338007463,0.0804872509126865,0.47072591168956,"M"
"S12M","LOC111099045",0,-0.693147180559945,0.318453731118535,-0.287682072451781,0.661398482245365,0.362905493689368,"M"
"S12M","LOC111099047",0,-0.177334015282916,0.750826292146623,0.300104592450338,0.388657989791783,0.421994410059375,"M"

@yaaminiv @AHuffmyer do let me know if I got it wrong or other formats should be generated.

sr320 commented 4 months ago

need to change format

x             gene01  gene02  gene03
sample_fold1
sample_fold2
x
x
sample2_fold

sr320 commented 4 months ago

reformatted in dir https://github.com/sr320/ceabigr/blob/main/output/72-exon-data-rfmt/

with file suffix tf
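For anyone following along, the reshape described above (one row per sample and gene, turned into one row per sample and fold with one column per gene) could look roughly like the sketch below. This is only an illustration using the Python standard library, not the code that produced the tf files: the function name `long_to_wide` and the `SampleID_foldN` row labels are assumptions, and the two input records are copied from the sample shown earlier in this thread.

```python
import csv
import io

# Two records copied from the sample earlier in the thread.
LONG_CSV = '''"SampleID","GeneID","fold1","fold2","fold3","fold4","fold5","fold6","Sex"
"S12M","LOC111099029",0,1.02165124753198,1.75401914124521,0.200670695462151,0,0,"M"
"S12M","LOC111099033",0,0.461345566502621,0.0339015516756814,0.129211731480006,-1.57553636075842,0.18805223150294,"M"
'''


def long_to_wide(text):
    """Pivot one-row-per-(sample, gene) into one-row-per-(sample, fold).

    Hypothetical helper: returns the gene order and a dict that maps a
    row label such as "S12M_fold2" to {gene: value}.
    """
    genes = []  # gene IDs in first-seen order (these become the columns)
    wide = {}   # row label -> {gene: value}
    for rec in csv.DictReader(io.StringIO(text)):
        gene = rec["GeneID"]
        if gene not in genes:
            genes.append(gene)
        for col, val in rec.items():
            if col.startswith("fold"):
                label = f'{rec["SampleID"]}_{col}'  # assumed label scheme
                wide.setdefault(label, {})[gene] = float(val)
    return genes, wide


genes, wide = long_to_wide(LONG_CSV)

# Print the header row, then one row per sample_fold label.
print(["x"] + genes)
for label, values in wide.items():
    print([label] + [values.get(g) for g in genes])
```

A gene missing for a given sample would show up as `None` in that row, which is one way NAs could end up in the wide table.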

sr320 commented 4 months ago

@AHuffmyer when you get a chance, do you want to throw this at WGCNA and see if it works?

AHuffmyer commented 4 months ago

Yes! I can do this tonight or tomorrow!

AHuffmyer commented 4 months ago

I took a quick look using WGCNA and am working on data QC. One problem was that the "fold1" rows were all 0, which caused an error when estimating variance. I removed those rows, but it is still having a problem, which may be due to the number of NAs in the data frame. I'll look at this again tomorrow and see if I can determine what is causing it.
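As a rough illustration of that QC step (not the actual WGCNA workflow, which runs in R), the sketch below drops rows with no variance and reports the fraction of missing cells in what remains. The toy matrix, its values, and the names `qc_filter` and `na_fraction` are all made up for the example; `None` stands in for NA.

```python
import statistics

# Hypothetical toy matrix: row label -> values across genes (None = NA).
rows = {
    "S12M_fold1": [0.0, 0.0, 0.0],     # all zeros, so variance is 0
    "S12M_fold2": [1.02, 0.46, None],  # one missing value
    "S12M_fold3": [1.75, 0.03, -0.51],
}


def qc_filter(rows):
    """Split rows into those with usable variance and those without."""
    kept, dropped = {}, []
    for label, values in rows.items():
        observed = [v for v in values if v is not None]
        # Variance needs at least two observed values and must be non-zero.
        if len(observed) < 2 or statistics.pvariance(observed) == 0:
            dropped.append(label)
        else:
            kept[label] = values
    return kept, dropped


def na_fraction(rows):
    """Fraction of cells that are missing across all rows."""
    cells = [v for values in rows.values() for v in values]
    return sum(v is None for v in cells) / len(cells) if cells else 0.0


kept, dropped = qc_filter(rows)
na_frac = na_fraction(kept)
print("dropped:", dropped)
print("NA fraction in kept rows:", na_frac)
```

If the NA fraction stays high after the zero-variance rows are gone, that would be consistent with the missing values being the remaining source of trouble.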

AHuffmyer commented 4 months ago

I gave it a try. It did show module patterns that make sense. But because of the way we formatted the data, we are going to have a hard time pulling out genes that changed modules between treatments. This is because WGCNA assigns each gene to a single module, whereas last time we had two separate columns per gene: one for the value in control and one for the value in treated. I'll have to think more about this. WGCNA definitely struggled with this data. I think moving to an ANOVA-type analysis will be more appropriate.

Post here: https://ahuffmyer.github.io/ASH_Putnam_Lab_Notebook/WGCNA-Attempt-2-for-ceabigr-project-exon-expression-data/

sr320 commented 4 months ago

seems non-viable