smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/

Using module_trait_correlation to identify gene modules associated with specific disease states. #242

Closed Li-mengjie closed 3 weeks ago

Li-mengjie commented 1 month ago

Hello, I have been learning about module_trait_correlation (https://smorabit.github.io/hdWGCNA/articles/module_trait_correlation.html), which includes two aspects of significance in the correlation heatmap. It conducts a correlation analysis between gene modules and traits based on cell types. However, I am currently interested in performing a correlation analysis between gene modules and traits based on disease stages, aiming to obtain gene modules specific to certain disease stages. Although I have a general understanding of the analysis process of hdWGCNA, I am still unable to implement my idea mentioned above.

Additionally, I have noticed that PlotModuleTraitCorrelation seems to require two or more traits, possibly because of how the subgroup analysis works. What I actually want is a heatmap showing the correlation between multiple disease stages and gene modules, optionally split by cell type. However, because there is only one trait (cell types) in this scenario, I cannot plot the heatmap. I am not interested in other traits for this analysis.

I look forward to your response.

Li-mengjie commented 1 month ago

Although I ran the correlation analysis between disease states and gene modules by setting group.by = 'orig.ident' (orig.ident encodes the disease state: 0_Day, 1_Day, 30_Day), the results contain many NaN and Inf values. I'm not sure what went wrong, and I would greatly appreciate your help with this issue.

    seurat_obj <- ModuleTraitCorrelation(
      seurat_obj,
      traits = cur_traits,
      group.by = 'orig.ident'
    )

[image]

smorabit commented 1 month ago

Could you please share the code that you ran?

Li-mengjie commented 1 month ago

Could you please share the code that you ran?

Of course! Here is the code that I ran for setting the data expression in a Seurat object and grouping cells by 'orig.ident' using SCT assay with metacells enabled. For ease of viewing, I have chosen R script files. Please let me know if there are any other requirements. ran code.zip

Li-mengjie commented 1 month ago

When I plot the correlation heatmap with PlotModuleTraitCorrelation, the result is as shown in the figures. Despite grouping by orig.ident (left panel), I still need to set multiple subgroups (traits) to plot correctly (right panel); otherwise, I get the error shown. Additionally, as mentioned earlier, the results contain many NaN and Inf values. [image] [image] [image]

Li-mengjie commented 1 month ago

This is my single-cell data structure, which I hope will be helpful in understanding my results. [image]

smorabit commented 1 month ago

Thanks for sharing these details. One small request: instead of sharing a zip file, can you please include the code directly in this GitHub issue?

I can tell that you have at least one problem. It does not make sense to use celltype or orig.ident as your traits, since those are categorical variables. Think about traits like age, sex, genotype, disease status. In your case you should use the disease status as one of the traits.
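
As a rough illustration of the suggestion above, the sketch below derives a disease-status trait from orig.ident on a toy metadata table. It is base R only and does not touch a Seurat object; the column names disease_stage and is_<stage> are hypothetical, and the assumption that traits should be numeric or factor columns should be checked against the hdWGCNA documentation.

```r
# Toy stand-in for seurat_obj@meta.data. The stage labels mirror the thread;
# the rows themselves are made up for illustration.
meta <- data.frame(
  orig.ident = c("0_Day", "1_Day", "30_Day", "1_Day", "0_Day", "30_Day"),
  stringsAsFactors = FALSE
)

# Factor with explicit levels, so the underlying integer coding follows time
# order (0_Day = 1, 1_Day = 2, 30_Day = 3) instead of alphabetical order.
meta$disease_stage <- factor(
  meta$orig.ident,
  levels = c("0_Day", "1_Day", "30_Day")
)

# One 0/1 indicator column per stage (hypothetical names: is_0_Day, ...).
# Each indicator contrasts a single stage against all other cells.
for (stage in levels(meta$disease_stage)) {
  meta[[paste0("is_", stage)]] <- as.numeric(meta$disease_stage == stage)
}

print(meta)
```

The single ordered factor treats the stages as a progression, while the indicator columns ask a different question (one stage versus everything else). Which of the two fits depends on the biology, so neither is presented here as the recommended encoding.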

Li-mengjie commented 1 month ago

age, sex, genotype, disease status

Of course, I'm more than willing to share my code directly, but because it is long, I skipped the preprocessing steps and only show the code for this step. I agree with your point that using 'celltype' and 'orig.ident' as traits is meaningless. However, as shown in the previous images, my Seurat object does not contain information about age, sex, genotype, etc. The 'disease status' is identical to 'orig.ident', so there is no additional column for disease status. What I want to obtain are gene modules related to the disease status (i.e., 'orig.ident'). Do you mean that I need to add an additional column for disease status? In that case, I would just assign 'orig.ident' to it. However, after trying both approaches, I didn't see any difference in the results.

[image] [image] [image]

smorabit commented 1 month ago

my Seurat object does not contain information about age, sex, genotype, etc.

To clarify, these are just examples; you do not have to use those specific traits for this analysis.

The 'disease status' is consistent with 'orig.ident,' so there is no additional column for disease status. Now, what I want to obtain are gene modules related to the disease status (i.e., 'orig.ident').

When running ModuleTraitCorrelation, set group.by to something like cluster or cell type, and set traits to your traits of interest.
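
One possible reason the earlier call returned undefined values, assuming the traits were derived from orig.ident: when the grouping variable and the trait are the same variable, the trait is constant inside every group, and a correlation with a constant is undefined. The base R sketch below reproduces this on simulated data. The data frame, the column names, and the single simulated module eigengene are all made up, and base R reports the undefined case as NA, which may differ from how hdWGCNA reports it.

```r
set.seed(1)

# Toy data: 60 "cells", each with a stage, a cell type, and one simulated
# module eigengene (ME). Nothing here comes from the original dataset.
n <- 60
toy <- data.frame(
  stage     = rep(c("0_Day", "1_Day", "30_Day"), each = 20),
  cell_type = rep(c("A", "B"), times = 30),
  ME        = rnorm(n)
)

# Hypothetical stage-derived trait: 1 for cells from 1_Day, 0 otherwise.
toy$is_1_Day <- as.numeric(toy$stage == "1_Day")

# Grouping by stage: the trait has zero variance inside every group,
# so each within-group correlation is undefined (NA in base R).
by_stage <- sapply(split(toy, toy$stage), function(d) {
  suppressWarnings(cor(d$ME, d$is_1_Day))
})

# Grouping by cell type: each group mixes cells from all stages,
# so the trait varies and the correlation is defined.
by_type <- sapply(split(toy, toy$cell_type), function(d) {
  cor(d$ME, d$is_1_Day)
})

print(by_stage)
print(by_type)
```

In hdWGCNA terms, the second case corresponds to passing a cluster or cell-type column as group.by and the stage-derived columns as traits.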

Li-mengjie commented 1 month ago

When running ModuleTraitCorrelation, set group.by to something like cluster or cell type, and set traits to your traits of interest.

As mentioned above, when I set group.by to something like cluster or cell type and set traits to my traits of interest, I cannot obtain gene modules related to each individual time point. Instead, I get gene modules related to time as a whole (as shown in the figure below), which is not what I need. This is because the effects of these time points on cells are bidirectional rather than progressively worsening. [image]

smorabit commented 1 month ago

It seems that module trait correlation might not be what you are looking for to answer your particular question. I recommend trying differential module eigengene analysis instead.
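
To make the idea concrete, here is a minimal base R sketch of a differential module eigengene comparison: for each stage, each module's eigengene values in that stage are compared against all other cells with a Wilcoxon rank-sum test. This is an illustration of the concept on simulated data, not the hdWGCNA implementation; the helper compare_stage and the module names M1 and M2 are hypothetical. hdWGCNA provides a FindDMEs function for this kind of analysis, and its actual arguments should be taken from the package documentation.

```r
set.seed(42)

# Simulated module eigengenes for 90 "cells" across three stages.
# M1 is shifted upward only at 1_Day; M2 carries no stage effect.
stage <- rep(c("0_Day", "1_Day", "30_Day"), each = 30)
MEs <- data.frame(
  M1 = rnorm(90) + ifelse(stage == "1_Day", 1.5, 0),
  M2 = rnorm(90)
)

# Hypothetical helper: compare one stage against all remaining cells,
# one Wilcoxon rank-sum test per module.
compare_stage <- function(MEs, stage, target) {
  in_target <- stage == target
  rows <- lapply(names(MEs), function(m) {
    x <- MEs[[m]][in_target]
    y <- MEs[[m]][!in_target]
    data.frame(
      module    = m,
      stage     = target,
      mean_diff = mean(x) - mean(y),
      p_val     = wilcox.test(x, y)$p.value
    )
  })
  do.call(rbind, rows)
}

# Run the comparison for every stage, then adjust p-values across all tests.
results <- do.call(
  rbind,
  lapply(unique(stage), function(s) compare_stage(MEs, stage, s))
)
results$p_adj <- p.adjust(results$p_val, method = "BH")

print(results)
```

Because each stage is tested separately, a module that rises at one time point and falls at another can show up in both directions, which a single correlation with time would average away.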