Does it make sense to compare two levels of a variable (with 4 level) in one level of another variable(with two levels) using DME

pariaaliour commented 2 months ago

Dear @smorabit, Thanks for your helpful recommendation! I am applying. hdWGCNAto my single cell dataset. I built a network for one cell type (like Microglia) while I have different variables like group_id (two levels: case & control) and region (four levels: 1, 2, 3, & 4). To compare case vs control I did the below analysis:

group1 <- seurat_obj@meta.data %>% subset(group_id == "als") %>% rownames
group2 <- seurat_obj@meta.data %>% subset(group_id != "als") %>% rownames

cur_DMEs <- FindDMEs(
   seurat_obj,
  barcodes1 = group1,
  barcodes2 = group2,
  test.use='wilcox',
  pseudocount.use=0.01, 
  wgcna_name = data_dir)

Then to compare different regions in case I was thinking of two different approach

First one is to compare 2 or 3 or 4 vs 1 in case as below:

  group1 <- seurat_obj@meta.data %>% subset(region == "2" & group_id == "als") %>% rownames
  group2 <- seurat_obj@meta.data %>% subset(region == "1" & group_id == "als") %>% rownames 

  # run the DME test
  cur_DMEs <- FindDMEs(
   seurat_obj,
  barcodes1 = group1,
  barcodes2 = group2,
  test.use='wilcox',
  pseudocount.use=0.01, 
  wgcna_name = data_dir)

Second, I make the region as a two-level variable as resistant (1: R and 2, 3, and 4: V) and then the DME analysis:

group1 <- seurat_obj@meta.data %>% subset(resistant == "V" & group_id == "als") %>% rownames
  group2 <- seurat_obj@meta.data %>% subset(resistant == "R" & group_id == "als") %>% rownames 

  # run the DME test
  cur_DMEs <- FindDMEs(
   seurat_obj,
  barcodes1 = group1,
  barcodes2 = group2,
  test.use='wilcox',
  pseudocount.use=0.01, 
  wgcna_name = data_dir)

First, I would like to know which of these two approach makes more sense if I want to compare region 1 to other regions all together or one by one comparison. Second, when I do the DME analysis for all sub-clusters (like Mic1 and Mic2 for microglia) The heat map plot shows the same effect size for all sub cluster, not sure where I am doing it wrong. It happens for all the cell types I am running the DME.

DMEs <- data.frame()

  # loop through the clusters
  for(cur_cluster in clusters){
  # identify barcodes for group1 and group2 in each cluster
  # with the below code the als as group1 and con as group2 the con is considered the reference.
    group1 <- seurat_obj@meta.data %>% subset(resistant == "V" & group_id == "als") %>% rownames
    group2 <- seurat_obj@meta.data %>% subset(resistant == "R" & group_id == "als") %>% rownames 

  # run the DME test
  cur_DMEs <- FindDMEs(
   seurat_obj,
  barcodes1 = group1,
  barcodes2 = group2,
  test.use='wilcox',
  pseudocount.use=0.01, 
  wgcna_name = data_dir)

  # add the cluster info to the table
  cur_DMEs$cluster <- cur_cluster

  # append the table
  DMEs <- rbind(DMEs, cur_DMEs)

  }

  # get the modules table:
  modules <- GetModules(seurat_obj)
  mods <- levels(modules$module) 
  mods <- mods[mods != 'grey']

  # make a copy of the DME table for plotting
  plot_df <- DMEs

  # set the factor level for the modules so they plot in the right order:
  plot_df$module <- factor(as.character(plot_df$module), levels=mods)

  # set a min/max threshold for plotting
  maxval <- 3; minval <- -3
  plot_df$avg_log2FC <- ifelse(plot_df$avg_log2FC > maxval, maxval, plot_df$avg_log2FC)
  plot_df$avg_log2FC <- ifelse(plot_df$avg_log2FC < minval, minval, plot_df$avg_log2FC)

  # add significance levels
  plot_df$Significance <- gtools::stars.pval(plot_df$p_val_adj)

  # change the text color to make it easier to see 
  plot_df$textcolor <- ifelse(plot_df$avg_log2FC > 0.2, 'white', 'black')

  # make the heatmap with geom_tile
  plot10 <- plot_df %>% 
  ggplot(aes(y=cluster, x=module, fill=avg_log2FC)) +
  geom_tile() 

  # add the significance levels
  plot10 <- plot10 + geom_text(label=plot_df$Significance, color=plot_df$textcolor) 

  # customize the color and theme of the plot
  plot10 <- plot10 + 
  scale_fill_gradient2(low='white', mid='lightblue', high='midnightblue') +
  RotatedAxis() +
  theme(
    panel.border = element_rect(fill=NA, color='black', size=1),
    axis.line.x = element_blank(),
    axis.line.y = element_blank(),
    axis.text.x = element_text(size = 8),    # Customize x-axis text size
    axis.text.y = element_text(size = 8),    # Customize y-axis text size
     legend.text = element_text(size = 8),    # Customize legend text size
    legend.title = element_text(size = 8),
    plot.margin=margin(0,0,0,0)
  ) + xlab('') + ylab('') +
  coord_equal()

  pdf(file.path(output_dir, "10_DME_effect_sizes_heatmap_VvsR_ALS.pdf"), height=5, width=10)
  print(plot10)
  dev.off()

I really appreciate your input on this. Thanks, Paria

smorabit commented 2 months ago

Hi Paria,

If I understand correctly, you have 2 conditions (disease and control) and you have 4 brain regions. One DME analysis you want to do is disease vs control within each region, and then you want to compare the regions to each other. Do I understand correctly?

First, I would like to know which of these two approach makes more sense if I want to compare region 1 to other regions all together or one by one comparison.

You are asking to do one vs. rest comparisons, or to do pairwise comparisons. These will give you different results since they are testing different things. The one vs. rest comparisons are similar to cell type marker tests that we typically do for single-cell analysis, so this is good for identifying modules that are more specifically expressed in one region. On the other hand the pairwise test would tell you which modules are expressed higher or lower in one region relative to another. Both of these approaches make sense in their own way, but you should think about the biological question you want to ask.

Second, when I do the DME analysis for all sub-clusters (like Mic1 and Mic2 for microglia) The heat map plot shows the same effect size for all sub cluster, not sure where I am doing it wrong. It happens for all the cell types I am running the DME.

This doesn't sound right, there must be something incorrect in the logic of your code. Just by looking at it I think this is the problem, when you define group1 and group2 you don't split it up by cluster. Change it to something like this:

 group1 <- seurat_obj@meta.data %>% subset(resistant == "V" & group_id == "als" & cluster == cur_cluster) %>% rownames
 group2 <- seurat_obj@meta.data %>% subset(resistant == "R" & group_id == "als" & cluster == cur_cluster) %>% rownames

pariaaliour commented 2 months ago

Thanks for your reply!

If I understand correctly, you have 2 conditions (disease and control) and you have 4 brain regions. One DME analysis you want to do is disease vs control within each region, and then you want to compare the regions to each other. Do I understand correctly?

You are correct with some modifications. I have two variables, one with 2 levels and the other with 4 levels. I would like to compare cases vs. controls across all regions to have a larger sample size, rather than doing this for each region individually. Additionally, I want to compare regions to each other within the cases, not the controls. I understand that each comparison needs to be interpreted differently. I just wanted to confirm if it makes sense to compare different levels of the region variable within one level of the group_id variable.

Regarding my second question, I got where I was wrong. Thanks a ton! Paria

smorabit commented 1 month ago

I just wanted to confirm if it makes sense to compare different levels of the region variable within one level of the group_id variable.

Makes sense to me! I have done something similar, comparing female vs male donors within my disease group. Good luck with your analysis 😄

smorabit / hdWGCNA

Does it make sense to compare two levels of a variable (with 4 level) in one level of another variable(with two levels) using DME #288