songw01 / MEGENA

Multiscale embedded gene co-expression network analysis
GNU General Public License v3.0
48 stars 16 forks source link

Multi-scale clustering control #13

Open nfancy opened 3 years ago

nfancy commented 3 years ago

Hi,

Thank you very much for this great package. Is there any control over the multi-scale clustering? For example, in my analysis, I have ended up with a couple of large clusters which I would like to split further. Is there any option within the function to control that?

Thanks in advance. Nurun

songw01 commented 3 years ago

There are several ways. Can you share MEGENA.output object via dropbox or something? I can have a look.

nfancy commented 3 years ago

Hi, Thanks for your quick response. I am attaching the link for the rds object here. My objective is to get smaller clusters because some of the clusters are really large. Thanks.

https://www.dropbox.com/s/l0cpvyzb64mjan0/MEGENA.output.rds?dl=0

nfancy commented 3 years ago

Hi,

Thanks very much for offering to look at the data. Did you manage to take a look?

songw01 commented 3 years ago

Hi nfancy,

I took a look, and it seems that the big modules are dictated by highly connected genes such as IL1RAPL1 (which has connectivity of 729). Presence of such highly connected hub, by its nature, holds the entire module so compact, thus the network itself does not have any meaningful substructure in it. And this makes sense that MEGENA is not detecting anymore substructures in them, because it should not when network connectivity is so skewed. While presence of such hub is certainly possible, I still suspect that there is a strong confounding factor that drives the overall correlation structure, and recommend for you to examine such confounding variables carefully. Below is the code I used to examine the data:

rm(list = ls())

library(MEGENA)

MEGENA.output = readRDS("MEGENA.output.rds")

Startegy #1. Look for a alpha value cutoff that cuts up the modules

alpha.val = sort(unique(MEGENA.output$module.output$module.alpha[-1])) res = data.frame() for (i in 1:length(alpha.val)) { mods = get.union.cut(module.output = MEGENA.output$module.output,alpha.cut = alpha.val[i],output.plot = F, plotfname = "validModules_alpha",module.pval = 0.05,remove.unsig = F) df = data.frame(alpha.val = rep(alpha.val[i],length(mods)),module.size = sapply(mods,length)) res = rbind.data.frame(res,df) rm(mods,df) }

library(ggplot2) library(ggbeeswarm)

pobj = ggplot(data = res,aes(x = alpha.val,y = module.size,group = alpha.val)) + geom_quasirandom() + geom_boxplot(alpha = 0.2) + scale_x_log10() + scale_y_log10() + theme_bw() + geom_vline(xintercept = 0.33,colour= "red") pobj

Strategy #2. Look for a particular module to cut up

get module hierarchy tree

summary.res = MEGENA.ModuleSummary(MEGENA.output, mod.pvalue = 0.05,hub.pvalue = 0.05, min.size = 10,max.size = 2500, annot.table = NULL,symbol.col = NULL,id.col = NULL, output.sig = TRUE)

head(summary.res$module.table)

module relation table

htbl= summary.res$module.table[c("module.parent","module.id")]

module hierarchy in igraph

hg = graph.data.frame(htbl,directed = TRUE)

quick visualization of overall module hierarchy

hobj.o = get_module_hierarchy_graph(htbl,max.depth = 5,anchor.mid = NULL,h.scale = NULL,is.circular = TRUE,layout = "dendrogram") print(hobj.o$pobj)

If you want to break up a certain module, e.g. c1_2

hobj.i = get_module_hierarchy_graph(htbl,max.depth = 5,anchor.mid = "c1_2",h.scale = NULL,is.circular = TRUE,layout = "dendrogram") print(hobj.i$pobj)

get modules under c1_13: it shows recurring presence of large modules due to highly connected genem IL1RAPL1

mids = names(ego(graph = hg,order = 1,"c1_13",mode = "out")[[1]]) summary.res$module.table[match(mids,summary.res$module.table$module.id),]

nfancy commented 3 years ago

Thank you so much for such a detailed explanattion. Do you have any recommendations on how to check the effect of any confound on the correlation?