songw01 / MEGENA

Multiscale embedded gene co-expression network analysis
GNU General Public License v3.0

How to perform Cluster-Trait Association Analysis? #4

Closed zhaoliang0302 closed 3 years ago

zhaoliang0302 commented 4 years ago

Hello Mr. Song, my research focuses on identifying key genes associated with a clinical trait (continuous data) in tumors. I downloaded the expression data of tumor samples from the TCGA database, divided the whole cohort into high- and low-risk groups based on the clinical trait, and confirmed the prognostic value of this factor using survival analysis. I don't know whether I should cluster the samples in the high- and low-risk groups separately, or use the whole cohort and find the most correlated module by Pearson correlation (Cluster-Trait Association Analysis, CTA). I also don't know how to calculate the correlation between the clinical trait and a module. Can you give me some hints? Thanks a lot.

songw01 commented 4 years ago

Hi there,

I cannot supervise or direct your research; how you proceed with the survival analysis is entirely up to you. You could, for instance, derive differentially expressed genes between the two groups and test their enrichment in modules, or perform survival analysis on module-based metrics.

Having said this, I have updated the package to include the module eigengene calculation function, which will get you the principal components of the modules. You can use them as representative features of modules and do direct statistical analysis on them. Check ?ModulePrinComps. Follow the example there.
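To illustrate the idea of using a module's principal component as a representative feature, here is a minimal sketch in base R. It deliberately uses `prcomp` and `cor.test` rather than `ModulePrinComps`, whose interface is not shown in this thread. The data are simulated, and every name (`module_expr`, `trait`, `eigengene`) is hypothetical; with real data, `module_expr` would be the samples-by-genes expression matrix restricted to one module's genes.

```r
# Simulated stand-in data (assumption): 40 samples, 6 genes in one module.
set.seed(1)
n_samples <- 40
trait <- rnorm(n_samples)  # hypothetical continuous clinical trait

# Each gene follows the trait plus independent noise, so the module is co-expressed.
module_expr <- sapply(1:6, function(i) trait + rnorm(n_samples, sd = 0.5))
colnames(module_expr) <- paste0("gene", 1:6)

# Module eigengene: first principal component of the centered, scaled
# samples-by-genes matrix (one score per sample).
pca <- prcomp(module_expr, center = TRUE, scale. = TRUE)
eigengene <- pca$x[, 1]

# The sign of a principal component is arbitrary; orient it so that it
# increases with the module's mean expression.
if (cor(eigengene, rowMeans(module_expr)) < 0) {
  eigengene <- -eigengene
}

# Pearson correlation between the eigengene and the trait.
assoc <- cor.test(eigengene, trait, method = "pearson")
print(assoc$estimate)
print(assoc$p.value)
```

Repeating this for each module gives one correlation and one p-value per module; with many modules, the p-values would normally be adjusted for multiple testing, for example with `p.adjust`.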

zhaoliang0302 commented 4 years ago

@songw01 Thanks. I am still running "calculate PFN", and it gets slower and slower as time goes on. It has been running for two days on my server with 16 cores. I will check "ModulePrinComps" after this process finishes.

zhaoliang0302 commented 4 years ago

By the way, is there any trick to speed up this calculation process? The gene expression profile contains over 500 samples and 10000+ genes.

[1] "32690 out of 37704 has been included."
[1] "Performing parallel quality checks on 32000"
[1] "Qualified edges: 45"
[1] "32733 out of 37704 has been included."
[1] "Performing parallel quality checks on 32000"

Each update takes over 10 minutes.

zhaoliang0302 commented 4 years ago

I don't know what the "modules" argument means in the "ModulePrinComps" function. I tried to guess the meaning of this variable, but failed. Can you spare some time to update the vignettes?

zhaoliang0302 commented 4 years ago

This figure is pretty cool. Can you tell me how to make it? Thanks. https://onlinelibrary.wiley.com/doi/full/10.1002/ijc.32643

songw01 commented 3 years ago

This was generated by Cytoscape. I will close this now.