smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/

How to end up having less modules and more specific ones #275

Closed pariaaliour closed 1 month ago

pariaaliour commented 1 month ago

Dear @smorabit, Thanks a lot for all your help with my questions regarding hdWGCNA.

I was wondering if you have any advice on how to get fewer, but more specific modules.

Many thanks, Paria

smorabit commented 1 month ago

Hi Paria,

Happy to provide some advice but first can you please clarify what you mean by more specific? Like specific to a cell type or cluster?

pariaaliour commented 1 month ago

When performing enrichment analysis, I would like to see more relevant GO terms. I understand that sometimes unrelated terms might appear.

Additionally, when I use PlotModuleTraitCorrelation and generate a heatmap showing the DME effect sizes, I notice that modules are highly correlated with uninteresting variables like nCount_RNA and nFeature_RNA. I also observe that the DME effect size is high for most modules and cell types. I need to see more specific correlation.

I hope this clarifies my concerns.

Thank you, Paria

pariaaliour commented 1 month ago

Hello again, Here is my code to give you a better idea of what I am doing:

seurat_obj <- subset(seurat_integrated, subset = region == ocu)

cluster="Mic1"

seurat_obj <- SetupForWGCNA(
  seurat_obj,
  gene_select = "fraction",
  fraction = 0.05,
  wgcna_name = cluster
)

seurat_obj <- MetacellsByGroups(
  seurat_obj = seurat_obj,
  group.by = c("cluster_id", "sample_id"),
  reduction = 'pca',
  k = 30,
  max_shared = 15,
  ident.group = "cluster_id",
  min_cells = 100
)
# not sure if pca is okay to put. I did not do harmony
seurat_obj <- NormalizeMetacells(seurat_obj)

print("Set up the expression matrix")
# Set up the expression matrix
seurat_obj <- SetDatExpr(
  seurat_obj,
  group_name = cluster,
  group.by= 'cluster_id',
  assay = 'RNA',
  slot = 'data',
  wgcna_name = cluster
)

seurat_obj <- TestSoftPowers(
  seurat_obj,
  networkType = "signed"
)

seurat_obj <- ConstructNetwork(
  seurat_obj,
  tom_name = cluster
)

seurat_obj <- ModuleEigengenes(seurat_obj, vars.to.regress=c("PMI", "batchlib", "sex"))

Thanks, Paria

smorabit commented 1 month ago

> When performing enrichment analysis, I would like to see more relevant GO terms. I understand that sometimes unrelated terms might appear.

I think it is very common to see "irrelevant" GO terms appear. I would suggest applying a stricter filter on the p-value or effect size of the GO term results.
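A minimal sketch of such a filter, assuming the enrichment results sit in a data frame with enrichR-style columns (`Adjusted.P.value`, `Combined.Score`, `module`, `Term`). The toy table, the helper name `filter_enrichment`, and the cutoffs are made up for illustration; with hdWGCNA the real table would presumably come from `GetEnrichrTable(seurat_obj)` after running `RunEnrichr`.

```r
# Toy enrichment table (made-up values, for illustration only).
enrich_df <- data.frame(
  module = c("M1", "M1", "M2", "M2"),
  Term = c("term_a", "term_b", "term_c", "term_d"),
  Adjusted.P.value = c(0.001, 0.20, 0.04, 0.0005),
  Combined.Score = c(150, 12, 8, 300),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only terms that pass both a significance cutoff
# and an effect-size cutoff, then sort by module and descending score.
filter_enrichment <- function(df, max_padj = 0.01, min_score = 50) {
  keep <- df$Adjusted.P.value <= max_padj & df$Combined.Score >= min_score
  out <- df[keep, , drop = FALSE]
  out[order(out$module, -out$Combined.Score), , drop = FALSE]
}

filtered <- filter_enrichment(enrich_df)
print(filtered)
```

The cutoffs are arbitrary starting points; tightening `max_padj` or raising `min_score` trades recall for fewer, stronger terms.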

> Additionally, when I use PlotModuleTraitCorrelation and generate a heatmap showing the DME effect sizes, I notice that modules are highly correlated with uninteresting variables like nCount_RNA and nFeature_RNA.

This is not surprising and is almost always the case with hdWGCNA. In general, we expect the eigengenes to be higher when a given cell has higher expression (nCount). I suggest performing module-trait correlation analysis only with traits that are biologically interesting to you, rather than with the cell-level QC stats.
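One way to do this, as a rough sketch: build the trait list explicitly and drop the QC columns before calling `ModuleTraitCorrelation`. The trait names `diagnosis` and `age` are hypothetical placeholders, not columns known to exist in this dataset, and the call is guarded so the snippet only touches hdWGCNA when the package and a `seurat_obj` are available.

```r
# Candidate metadata columns (placeholder names; replace with your own).
all_traits <- c("nCount_RNA", "nFeature_RNA", "diagnosis", "age")

# Cell-level QC stats that tend to track eigengene magnitude.
qc_traits <- c("nCount_RNA", "nFeature_RNA")

# Keep only the traits of biological interest.
bio_traits <- setdiff(all_traits, qc_traits)
print(bio_traits)

# Traits passed here should be numeric or factor columns in the metadata.
if (requireNamespace("hdWGCNA", quietly = TRUE) && exists("seurat_obj")) {
  seurat_obj <- hdWGCNA::ModuleTraitCorrelation(
    seurat_obj,
    traits = bio_traits,
    group.by = "cluster_id"
  )
}
```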

> I also observe that the DME effect size is high for most modules and cell types.

I am not sure why this is the case, as it would be dataset-dependent and I don't know about your specific data.

> I was wondering if you have any advice on how to get fewer, but more specific modules.

In general, I think these two goals directly oppose one another. If you want fewer modules, you will most likely get them by grouping more genes together in the same module, so the modules will be less specific and more general. The tree-cut parameters, specifically detectCutHeight and mergeCutHeight, influence how genes are grouped into modules. Changing these parameters will give you either fewer modules with more genes per module, or more modules with fewer genes per module. In the latter case I would call the modules more specific, but that doesn't match what you want because there are more of them.
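As a hedged sketch of how these parameters might be varied: the two settings below differ only in `mergeCutHeight`, since a higher value merges modules with more dissimilar eigengenes and so tends to yield fewer, larger modules. The numeric values, the `tom_name`, and the use of `overwrite_tom = TRUE` to rebuild an existing TOM are assumptions for illustration, not recommendations from this thread.

```r
# Two illustrative tree-cut settings (arbitrary values, not tuned).
coarse <- list(detectCutHeight = 0.995, mergeCutHeight = 0.4)  # more merging, fewer modules
fine   <- list(detectCutHeight = 0.995, mergeCutHeight = 0.1)  # less merging, more modules

# Pick one setting to try.
params <- fine

# Guarded call: runs only if hdWGCNA is installed and seurat_obj exists.
if (requireNamespace("hdWGCNA", quietly = TRUE) && exists("seurat_obj")) {
  seurat_obj <- hdWGCNA::ConstructNetwork(
    seurat_obj,
    tom_name = "Mic1_fine",          # hypothetical name for the TOM file
    overwrite_tom = TRUE,            # assumed needed when re-running
    detectCutHeight = params$detectCutHeight,
    mergeCutHeight = params$mergeCutHeight
  )
}
```

Comparing the module counts from a few such runs is probably the most direct way to see how sensitive a given dataset is to the merge threshold.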

If you want to identify fewer modules that are more specific, maybe try using fewer genes to start with? When you run this code, genes are selected if they are expressed in at least 5% of your cells.

seurat_obj <- SetupForWGCNA(
  seurat_obj,
  gene_select = "fraction",
  fraction = 0.05,
  wgcna_name = cluster
)

You can use the argument features to pass a custom list of genes. For example, maybe you're only interested in marker genes, or only interested in genes involved in certain pathways.
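A minimal sketch of the marker-gene route, assuming a marker table shaped like Seurat's `FindAllMarkers` output (`gene`, `cluster`, `avg_log2FC`, `p_val_adj`). The toy table, the helper `select_marker_genes`, the thresholds, and the `wgcna_name` are all hypothetical; the hdWGCNA call is guarded so it only runs when the package and a `seurat_obj` are present.

```r
# Toy marker table (made-up genes and values, for illustration only).
markers <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD", "geneA"),
  cluster = c("Mic1", "Mic1", "Mic1", "Ast1", "Ast1"),
  avg_log2FC = c(1.8, 0.2, 1.1, 2.0, 0.3),
  p_val_adj = c(1e-10, 1e-3, 1e-4, 1e-8, 1e-4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unique marker genes for one cluster that pass
# a fold-change cutoff and an adjusted p-value cutoff.
select_marker_genes <- function(df, cluster, min_lfc = 0.5, max_padj = 0.05) {
  keep <- df$cluster == cluster &
    df$avg_log2FC >= min_lfc &
    df$p_val_adj <= max_padj
  unique(df$gene[keep])
}

custom_genes <- select_marker_genes(markers, "Mic1")
print(custom_genes)

# Pass the custom gene list through the features argument.
if (requireNamespace("hdWGCNA", quietly = TRUE) && exists("seurat_obj")) {
  seurat_obj <- hdWGCNA::SetupForWGCNA(
    seurat_obj,
    features = custom_genes,
    wgcna_name = "Mic1_markers"
  )
}
```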

At this time I do not have a clear answer to your question aside from these recommendations. Since hdWGCNA is a data-driven method, this is going to be very dependent on your specific dataset.

pariaaliour commented 1 month ago

Perfect, thanks for the advice. I think I want more specific modules, not necessarily fewer modules. Sorry if my question was contradictory. Paria