Module Preservation Analysis

Hi @smorabit, first, I would like to thank you very much for this extremely useful tool. I greatly appreciate the effort that you and your team invested in its development, and I hope it will have a profound impact on co-expression network analysis in single-cell data.

I have a question regarding the module preservation analysis in this tutorial: https://smorabit.github.io/hdWGCNA/articles/module_preservation.html

Before running the ModulePreservation function, you extracted both the reference and query expression data sets. However, I noticed that you extract the metacell expression matrix for the reference and the single-cell expression matrix for the query. Why is this the case, and is it possible to run the analysis using the reference single-cell expression matrix as well?

I have another question, concerning your approach to studying changes in gene co-expression networks between two or more biological conditions (e.g. control, treatment A, treatment B, or control, disease A, disease B). Would you recommend constructing separate co-expression networks for each of the conditions and comparing these networks in terms of their topological characteristics (such as intramodular gene connectivity or the degree of overlap between modules from different networks), or would you construct just one reference network with a fixed set of assigned modules and then carry out module preservation analysis to detect preserved/non-preserved modules under different query conditions?

I know there are many different ways in which analysts address this question, and I just wanted to hear your general thoughts on it.

Thank you very much once again!

Hi,

First of all, thank you for the kind words; I appreciate your interest in hdWGCNA. I will go through your questions and try my best to answer them.

Before running the ModulePreservation function, you extracted both the reference and query expression data sets. However, I notice that you extract the metacell expression matrix for the reference and the single-cell expression matrix for the query. Why is this the case?

and is it possible to run the analysis using the reference single-cell expression matrix as well?

For the reference I recommend using the same expression matrix that was used for the network construction, which is most likely the metacell matrix.

Would you recommend constructing separate co-expression networks for each of the conditions and comparing these networks in terms of their topological characteristics (such as intramodular gene connectivity or the degree of overlap between modules from different networks), or would you construct just one reference network with a fixed set of assigned modules and then carry out module preservation analysis to detect preserved/non-preserved modules under different query conditions?

You can definitely construct separate networks for your conditions and then compare the networks. To complement that kind of analysis, you can also perform consensus network analysis using your different groups to see which co-expression modules and relationships are retained in both conditions. Unfortunately, I don't have any examples or tutorials doing this kind of comparison, so you will likely have to write some custom code, but this is something I might add at some point in the future. You could also construct one network with both conditions and then run Differential Module Eigengene (DME) analysis, which is essentially like differential gene expression analysis for comparing two conditions. At the end of the day, there are many ways to use co-expression network analysis to interrogate your data, and it's up to you and your colleagues to figure out what makes the most sense for your study.
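To make the DME idea a bit more concrete, here is a rough sketch of the kind of comparison involved: a rank-sum (Wilcoxon / Mann-Whitney) test on one module's eigengene values between two conditions. It is plain Python rather than hdWGCNA code, the function name `rank_sum_test` and the example values are hypothetical, and it uses a normal approximation without a tie correction, so treat it as an illustration only.

```python
import math

def rank_sum_test(x, y):
    """Mann-Whitney U for two groups (normal approximation, no tie correction).

    Returns (U for group x, z-score, two-sided p-value).
    """
    # Pool the values, remembering which group each one came from (0 = x, 1 = y).
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n = len(pooled)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return u, z, p

# Hypothetical eigengene values of one module in cells from two conditions.
me_control = [0.1, 0.2, 0.3]
me_treated = [1.0, 1.1, 1.2]
u, z, p = rank_sum_test(me_control, me_treated)
print(u, z, p)
```

A test like this treats every cell as an independent observation, which is one reason to consider sample-aware alternatives when cells come from several patients.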

Thank you very much, @smorabit, for your kind and thorough reply. I appreciate the interesting ideas you suggested for the analysis. In fact, I tried consensus network analysis to integrate multiple patient-specific networks into a single network for one particular cell type for each biological condition. It worked well with my data, especially for large cell clusters. I also tried constructing a single network for one cell type with samples from all biological conditions and performing differential module eigengene analysis, as you suggested. However, rather than applying a Wilcoxon test, I tried using linear mixed-effects models to compare module eigengenes in order to account for inter-sample (patient) variability and variability from different cellular detection rates (fractions of genes expressed in each cell). I like the versatility of the functions hdWGCNA offers and its integration with standard Seurat functions, which makes the analysis straightforward and easier to perform.

I have two questions. First, in the single-cell tutorial, you compute module eigengenes across all cells of all types in the entire dataset. However, when you compute the eigengene-based connectivity (kME) for each gene, you correlate the expression of each gene in cells of only one type (inhibitory neurons in this case) with the module eigengene computed on all cells of all types. Wouldn't it be better to correlate gene expression in one cell type with module eigengenes computed on that particular cell type (not all cells), especially if one is interested in cell-type-specific networks?

Second, the way the current analysis is formulated does not take into account the fact that single cells come from different samples (patients), at least in terms of calculating correlations and computing TOMs. When it comes to network construction, all created metacells are treated as if they were independent samples when in fact they are not. Could one calculate gene-gene correlation coefficients for each sample individually and then weight these correlations by the cell count of each sample, so that ultimately one gets a TOM that has been weighted in a way that captures the influence each sample has (based on its size) on the correlation and similarity metrics?

Thank you very much and I hope my comments are not too long to go through.

Cheers, Ismail

To answer your questions:

Wouldn't it be better to correlate gene expression in one cell type with module eigengenes computed on that particular cell type (not all cells), especially if one is interested in cell-type-specific networks?

This is definitely possible to do with hdWGCNA; it's up to you how you wish to analyze your dataset. For example, this is what we did in Figure 5 of the hdWGCNA paper. We designed this package to be modular and flexible, so it's not realistic for me to write one tutorial encompassing all of the possible ways of analyzing a dataset.
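As a rough illustration of the idea in the question, kME for a gene can be thought of as the correlation between that gene's expression and a module eigengene, here restricted to the cells of one type. The sketch below is plain Python, not hdWGCNA code: the function names, cell labels, and values are hypothetical, and the eigengene is taken as a given vector (for a fully cell-type-specific analysis it would itself be computed on that cell type only).

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def kme_within_group(gene_expr, eigengene, labels, group):
    """Correlate a gene with a module eigengene using only cells labeled `group`."""
    idx = [i for i, lab in enumerate(labels) if lab == group]
    return pearson([gene_expr[i] for i in idx], [eigengene[i] for i in idx])

# Hypothetical data: 8 cells, two cell types, one gene, one module eigengene.
labels = ["INH", "INH", "INH", "INH", "EX", "EX", "EX", "EX"]
gene = [1.0, 2.0, 3.0, 4.0, 9.0, 7.0, 8.0, 6.0]
eigengene = [0.1, 0.2, 0.3, 0.4, 0.1, 0.3, 0.2, 0.4]

kme_inh = kme_within_group(gene, eigengene, labels, "INH")
kme_ex = kme_within_group(gene, eigengene, labels, "EX")
kme_all = pearson(gene, eigengene)
print(kme_inh, kme_ex, kme_all)
```

In this toy example the gene tracks the eigengene perfectly within one cell type and inversely within the other, which is the kind of cell-type-specific behavior that a correlation over all cells would blur.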

Could one calculate gene-gene correlation coefficients for each sample individually and then weight these correlations by the cell count of each sample, so that ultimately one gets a TOM that has been weighted in a way that captures the influence each sample has (based on its size) on the correlation and similarity metrics?

I am not sure I understand your question entirely, but I believe this is outside the scope of what hdWGCNA does at this time. You will likely have to write your own functions to perform this kind of analysis.
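For anyone who wants to try this, one possible starting point is sketched below, under one reading of the question: compute the correlation for a gene pair within each sample separately, then average those correlations with weights proportional to each sample's cell count. This is plain Python, not part of hdWGCNA; the function names and example values are hypothetical, and a simple weighted mean is only one of several reasonable ways to combine the per-sample correlations.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def sample_weighted_correlation(per_sample):
    """Combine per-sample correlations of one gene pair, weighted by cell count.

    `per_sample` maps a sample ID to (gene_a_values, gene_b_values), where each
    list holds one value per cell (or metacell) from that sample.
    """
    weighted_sum = 0.0
    total_cells = 0
    for gene_a, gene_b in per_sample.values():
        n_cells = len(gene_a)
        weighted_sum += n_cells * pearson(gene_a, gene_b)
        total_cells += n_cells
    return weighted_sum / total_cells

# Hypothetical data: two patients with different numbers of cells.
per_sample = {
    "P1": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),  # r = +1, 4 cells
    "P2": ([1.0, 2.0], [2.0, 1.0]),                      # r = -1, 2 cells
}
r_weighted = sample_weighted_correlation(per_sample)
print(r_weighted)
```

Repeating this for every gene pair would give a sample-weighted correlation matrix, which would then still need to go through the adjacency and TOM steps.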

Also, it's best to keep GitHub issues restricted to one topic, so in the future please open separate issues for unrelated questions.

smorabit / hdWGCNA

Module Preservation Analysis #117