Unable to replicate modules across different datasets

harrsha4 commented 2 years ago

Hello, I am trying to replicate MEGENA modules across different datasets from the same tissue and condition combination. Here is some information about the datasets and the MEGENA results.

1) The sample size is 128 for dataset 1 and 66 for dataset 2.
2) Dataset 1 is microarray-based, while dataset 2 is RNA-seq-based.
3) I ran modulePreservation to test for preservation of the modules generated by MEGENA for the datasets. About 79% of modules were preserved.

While the preservation results are promising, the modules don't overlap well (FET q < 0.05), and the number of modules enriched for DEGs of our condition of interest is drastically different between datasets (112/208 for dataset 1 and 61/218 for dataset 2, using a consensus DEG list). I have a couple of questions as a result.

1) Is there a minimum sample size for MEGENA, and would datasets of differing size be expected to generate different module hierarchies?
2) Is there an issue with comparing modules created from RNA-seq and microarray data?
3) What criteria would you use to consider MEGENA modules preserved using the modulePreservation function? The documentation describes Z.summary > 10 as strongly preserved, yet very few of our modules meet that criterion (15 out of 208 in one dataset).

Thanks

harrsha4 commented 2 years ago

Just to clarify, the modulePreservation function is from the WGCNA package. We did not parallelize the function.

songw01 commented 2 years ago

Hi there,

Let me break down the answer for you:

Module preservation analysis in WGCNA:

You first need to understand module preservation as defined in the WGCNA package. It evaluates whether some gene pairs in a module have significant correlations in another data set. This means that even if only a subset of gene pairs is significantly correlated, the module will show as preserved in the pipeline. In this sense, the word 'preservation' overstates the result (I think): in reality, you are testing whether any of your gene pairs still show significant correlations in another data set. This is indeed the pattern I observed in another project, which will be submitted for publication this year.

Having said that, you should expect small overlaps even between modules that the WGCNA module preservation workflow calls preserved, although the workflow has stuck in the bioinformatics community.
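To make the overlap side of this comparison concrete, here is a minimal sketch of a one-sided Fisher's exact (hypergeometric) test for the overlap between one module from each data set, followed by a Benjamini-Hochberg correction. It is plain Python and not part of MEGENA or WGCNA. The function names, the toy gene IDs, and the choice of background (genes measured on both platforms) are assumptions made for illustration.

```python
from math import comb

def overlap_pvalue(module_a, module_b, background):
    """One-sided hypergeometric p-value for the overlap of two gene sets.

    Assumption: `background` is the set of genes measured in BOTH data sets,
    and both modules are restricted to it before testing.
    """
    background = set(background)
    a = set(module_a) & background
    b = set(module_b) & background
    n, k = len(background), len(a & b)
    upper = min(len(a), len(b))
    # P(overlap >= k) when len(b) genes are drawn from n, len(a) of them in module A
    tail = sum(comb(len(a), i) * comb(n - len(a), len(b) - i)
               for i in range(k, upper + 1))
    return tail / comb(n, len(b))

def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues

# Toy example with hypothetical gene IDs
background = {f"g{i}" for i in range(100)}
module_1 = {f"g{i}" for i in range(10)}                                # data set 1
module_2 = {f"g{i}" for i in range(5)} | {f"g{i}" for i in range(50, 55)}  # shares 5 genes
module_3 = {f"g{i}" for i in range(60, 70)}                            # shares none

p_shared = overlap_pvalue(module_1, module_2, background)
p_disjoint = overlap_pvalue(module_1, module_3, background)
q_shared, q_disjoint = benjamini_hochberg([p_shared, p_disjoint])
print(p_shared, p_disjoint, q_shared, q_disjoint)
```

In a real comparison, every module pair across the two data sets would be tested and the q-values computed over all pairs. A module can pass a correlation-based preservation test and still show no significant overlap here, since the two analyses ask different questions.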

Sample size and platform (microarray vs RNA-seq): These will definitely affect the network inference. Microarray data is noisier, so you would expect gene-gene pairs to be less stable (or less statistically confident) for microarray.
Number of enriched modules for DEGs: Again, the number of modules in MEGENA can be deceptive, since many modules can belong to a few parent modules. The sunburst plot is very useful for visually inspecting the enrichment patterns in the module hierarchy. Also, compare the number of genes covered by the DEG-enriched modules in both sets rather than the number of enriched modules. This makes more sense.
Z score threshold for the module preservation call: Yes, I would recommend using Z.summary > 10. Again, microarray is noisier, and the cohorts differ between the RNA-seq and microarray data sets. It makes more biological sense to focus on truly preserved modules with a stringent threshold, given that module preservation analysis is prone to many pitfalls.
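The gene-coverage comparison suggested above can be sketched as follows. This is an illustration in plain Python, not MEGENA output handling. The module IDs, gene IDs, and the `deg_coverage` helper are hypothetical, and the sketch assumes the modules are available as a mapping from module ID to member genes.

```python
def deg_coverage(modules, enriched_ids, degs):
    """Summarize DEG-enriched modules by the genes they cover, not by their count.

    modules      : dict of module ID -> list of member genes (assumed input shape)
    enriched_ids : IDs of modules called DEG-enriched
    degs         : set of DEGs (for example, a consensus DEG list)
    """
    covered = set()
    for module_id in enriched_ids:
        covered |= set(modules[module_id])  # nested child modules add no new genes
    all_genes = set().union(*modules.values())
    return {
        "n_enriched_modules": len(enriched_ids),
        "n_genes_covered": len(covered),
        "fraction_of_module_genes": len(covered) / len(all_genes),
        "fraction_of_degs_covered": len(covered & set(degs)) / len(degs),
    }

# Toy hierarchy: c1_5 and c1_6 are children of c1_2, so three enriched
# modules cover only the six genes of the parent module.
modules = {
    "c1_2": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "c1_5": ["g1", "g2", "g3"],
    "c1_6": ["g4", "g5", "g6"],
    "c1_9": ["g7", "g8"],
}
summary = deg_coverage(modules, ["c1_2", "c1_5", "c1_6"], {"g1", "g2", "g4", "g8"})
print(summary)
```

Running the same summary on both data sets and comparing `n_genes_covered` (or the two fractions) avoids counting a parent module and its children as separate hits.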

songw01 commented 2 years ago

I will close this thread, as this is more about module preservation analysis in WGCNA.

harrsha4 commented 2 years ago

Thank you so much for the advice. Just to clarify, is there a minimum sample size for running MEGENA?

songw01 commented 2 years ago

We recommend a minimum of 20 samples.

Regards,

-- Won-Min Song
