smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/
Other
346 stars 34 forks source link

How to build WGCNA for different regions of the brain across status #265

Open pariaaliour opened 4 months ago

pariaaliour commented 4 months ago

Dear smorabit team, First, thank you for the incredibly helpful package.

In my single-cell analysis, I am exploring the effect of disease in different regions of the brain and spinal cord (I have four different regions). In my DESeq analysis, I measured the effect of disease in each region and how the effects in regions 2, 3, and 4 differ from region 1.

Using hdWGCNA, I would like to perform DME analysis, enrichment analysis, and module-trait correlation for each region and cell type separately. However, I am a bit lost on how to proceed with the analysis. When I perform consensus analysis, I cannot generate the downstream plots for each region.

Additionally, I am wondering if, to store the WGCNA results in one Seurat object, I should perform the consensus network analysis for each cell type separately before doing the downstream analysis and visualization.

Do you have any recommendations for a better approach to the analyses I mentioned?

Many thanks, Paria

smorabit commented 4 months ago

Hi, thanks for your interest in hdWGCNA.

When I perform consensus analysis, I cannot generate the downstream plots for each region.

It sounds like you wanted to perform network analysis separately for each region, so in this case I would not recommend consensus network analysis. Maybe try performing the analysis separately for each region.

Additionally, I am wondering if, to store the WGCNA results in one Seurat object, I should perform the consensus network analysis for each cell type separately before doing the downstream analysis and visualization.

You can either store everything in one Seurat object or you can subset each cell type before starting hdWGCNA.

pariaaliour commented 4 months ago

Thanks for your reply. Yes, I would like to do the mentioned plot for each region. but at the same time I would like to be able to compare the modules between regions. How it would be possible? Also, when I separate the number of cells might not be enough to continue the analysis. Thanks, Paria

smorabit commented 4 months ago

Yes, I would like to do the mentioned plot for each region. but at the same time I would like to be able to compare the modules between regions. How it would be possible

If I understand correctly, you want to create a co-expression network and identify modules separately for each of your brain regions and each of your cell types, and then you want to perform some comparisons of these modules. In the basic hdWGCNA tutorial, we performed a network analysis on one cell type (inhibitory neurons), but as you can see the module eigengenes (expression level) are actually computed in all the cell types, not just the inhibitory neurons. This is possible due to the SetDatExpr function, which defines the expression matrix that will be used for network analysis, as you can see here:

seurat_obj <- SetDatExpr(
  seurat_obj,
  group_name = "INH", 
  group.by='cell_type'
)

To make this work for your case, when you run SetDatExpr you will want to select a region + cell type combination in SetDatExpr, and it sounds like you will either need to run a loop to go over all the different combinations, or you will need to submit some batch jobs if you have access to a compute cluster.

However, if you only want to compare regions to each other, and you don't want to compare cell types to each other, then in that case I would suggest subsetting your object to just contain all the cells for a particular cell type before you run any hdWGCNA functions.

Overall, with this approach, you will be able to perform the downstream comparisons that you want, including DMEs, module-trait correlation, etc.

Also, when I separate the number of cells might not be enough to continue the analysis.

This can be a problem. I don't know the exact minimum number of cells that is suitable to use hdWGCNA, but with fewer cells your results may not be rigorous or reproducible. I suggest not to run hdWGCNA on a cell type with fewer than 1,000, but of course ultimately it is up to you.

pariaaliour commented 4 months ago

Thank you so much for clarification!

we performed a network analysis on one cell type (inhibitory neurons), but as you can see the module eigengenes (expression level) are actually computed in all the cell types, not just the inhibitory neurons

I would like to know if I do the same steps (performing a network analysis on one cell type but module eigengenes in all the cell types) for different cell types and wanted to choose one with as many unique module as possible how to find out which cell types has more unique modules.

Moreover, I was wondering if I don't have harmony reduced dimension which alternative is better to use, map or pca?

# construct meta cells
seurat_obj <- MetacellsByGroups( seurat_obj = seurat_obj, group.by = c("cell_type", "Sample"), 
k = 25, max_shared = 10, ident.group = 'cell_type')

Regarding ModuleEigengenes function, should I set group.by.vars="sample_id" if I have already subsetted the dataset and have not performed integration in subsetted dataset? Should I regress out the same variables that I used in the DEG analysis with the vars.to.regress argument in ModuleEigengenes (not ScaleData)? Does the below order make sense?

seurat_obj <- subset(seurat_integrated, subset = region == "med")

seurat_obj <- SetupForWGCNA(
  seurat_obj,
  gene_select = "fraction",
  fraction = 0.05,
  wgcna_name = cluster
)

seurat_obj <- MetacellsByGroups(
  seurat_obj = seurat_obj,
  group.by = c("cluster_id", "sample_id"),
  reduction = 'pca',
  k = 30,
  max_shared = 15,
  ident.group = "cluster_id",
  min_cells = 95
)

print("normalize metacell expression matrix")
seurat_obj <- NormalizeMetacells(seurat_obj)

seurat_obj <- SetDatExpr(
  seurat_obj,
  group_name = 'OPC1',
  group.by= 'cluster_id',
  assay = 'RNA',
  slot = 'data',
  wgcna_name = cluster
)

print("Select soft-power threshold")
# Test different soft powers:
seurat_obj <- TestSoftPowers(
  seurat_obj,
  networkType = "signed"
)

seurat_obj <- ConstructNetwork(
  seurat_obj,
  tom_name = 'OPC1',
  overwrite_tom = TRUE
)

seurat_obj <- FindVariableFeatures(seurat_obj)
seurat_obj <- ScaleData(seurat_obj, features=VariableFeatures(seurat_obj))

print("ModuleEigengenes")
# compute all MEs in the full single-cell dataset (med)
seurat_obj <- ModuleEigengenes(seurat_obj, group.by.vars="sample_id", vars.to.regress=c("PMI", "batchlib", "sex"))

Your input would be invaluable to my analysis! Paria

smorabit commented 4 months ago

I would like to know if I do the same steps (performing a network analysis on one cell type but module eigengenes in all the cell types) for different cell types and wanted to choose one with as many unique module as possible how to find out which cell types has more unique modules.

You can find out the number of modules like this:

modules <- GetModules(seurat_obj, wgcna_name = cluster) 
n_modules <- length(unique(modules$module)) - 1 # subtract 1 because we don't care about the grey module

Regarding ModuleEigengenes function, should I set group.by.vars="sample_id" if I have already subsetted the dataset and have not performed integration in subsetted dataset? Should I regress out the same variables that I used in the DEG analysis with the vars.to.regress argument in ModuleEigengenes (not ScaleData)?

Both of these are up to your preference and will change the resulting ME values. using group.by.vars is using harmony to correct the ME values after they are computed, whereas vars.to.regress is regressing those variables from the scaled expression matrix before MEs are computed. If you did not use any integration then you likely won't need these options.

Does the below order make sense?

Looks good to me.

pariaaliour commented 4 months ago

Thanks a ton! If it seems okay to you, why do I need to do the below code if it doe snot use HVGs as you mentioned #269?

seurat_obj <- FindVariableFeatures(seurat_obj) seurat_obj <- ScaleData(seurat_obj, features=VariableFeatures(seurat_obj))

and if it uses counts data I would like to regress out the uninteresting variable as I did in my DEG analysis, should I use vars.to.regress? Could it keep consistency between results?

considering that count data (uncorrected) is used (did not use harmony before) do you recommend using group.by.vars. = "sample_id" to keep the same consistency with DEG results?

Thanks for your patient! Paria

smorabit commented 4 months ago

If it seems okay to you, why do I need to do the below code if it doe snot use HVGs as you mentioned https://github.com/smorabit/hdWGCNA/issues/269?

I think I need some more context to understand what's going on, can you tell me what happens when you don't run these? In the tutorial we mentioned that Harmony throws an error if the scaled data is missing.

and if it uses counts data I would like to regress out the uninteresting variable as I did in my DEG analysis, should I use vars.to.regress? Could it keep consistency between results?

considering that count data (uncorrected) is used (did not use harmony before) do you recommend using group.by.vars. = "sample_id" to keep the same consistency with DEG results?

For both of these points, this is up to your preference, since you know your own data better than I do I think it's best to make the decision yourself since I would just be speculating.

pariaaliour commented 4 months ago

Sorry if I was not clear. I asked if it uses HVGs, and it is important to me if it does so. The reason is that I subsetted my dataset and if I recalculate HVGs it is not corrected for uninteresting variables. And when I asked should I do regress out those variables I meant if it helps to make it more consistent with my DEG results (as I regressed out PMI, batch, sex there). Thank you! Paria

smorabit commented 4 months ago

I asked if it uses HVGs, and it is important to me if it does so.

HVGs are not used for module eigengenes.

And when I asked should I do regress out those variables I meant if it helps to make it more consistent with my DEG results (as I regressed out PMI, batch, sex there).

If you prefer to be consistent with your other analysis then I think it makes sense but ultimately it's up to you since you know your data better than I do.