Manhattan subplots with known and novel color coding for both plots

hlnicholls commented 9 months ago

Hi, thank you for developing this package, it's by far the best I've found.

I am trying to make two manhattan plots in one. Similar to the final plot in your manhattan vignette (https://totajuliusd.github.io/topr/articles/manhattan.html):

However, I am trying to make these 2 plots be color-coded by known and novel loci. So I would color code by known and novel for the 2 gwas datasets in the 2 manhattan plots then add titles per each plot to identify which plot is for which trait.

I know how to do this for one manhattan plot, is there a way to do this for 2 plots in one go in the same way with the second manhattan plot upside down in comparison to the first/top one? No worries if this is maybe going out-of-scope for the package - it is already very comprehensive!

Here is exactly what I am trying:

     # Load and prepare GWAS data for phenotype1
     file_path1 <- paste0('/Input/', phenotype1, '_assoc_regenie_allchr.txt')
     gwas1 <- fread(file_path1, select = c('CHROM', 'GENPOS', 'ALLELE0', 'ALLELE1', 'p', 'A1FREQ', 'BETA', 'SE'))
     colnames(gwas1)[c(2, 3, 4, 5, 6)] <- c('POS', 'REF', 'ALT', 'P', 'AF')
     gwas_filtered1 <- gwas1[gwas1$P < 5e-8, ]
     gwas_annotated1 <- annotate_with_nearest_gene(gwas_filtered1)
     gwas1$Gene_Symbol[gwas1$P < 5e-8] <- gwas_annotated1$Gene_Symbol
     known1 <- gwas1 %>% filter(Gene_Symbol %in% known_loci)
     novel1 <- gwas1 %>% filter(!Gene_Symbol %in% known_loci)

     # Load and prepare GWAS data for phenotype2
     file_path2 <- paste0('/Input/', phenotype2, '_assoc_regenie_allchr.txt')
     gwas2 <- fread(file_path2, select = c('CHROM', 'GENPOS', 'ALLELE0', 'ALLELE1', 'p', 'A1FREQ', 'BETA', 'SE'))
     colnames(gwas2)[c(2, 3, 4, 5, 6)] <- c('POS', 'REF', 'ALT', 'P', 'AF')
     gwas_filtered2 <- gwas2[gwas2$P < 5e-8, ]
     gwas_annotated2 <- annotate_with_nearest_gene(gwas_filtered2)
     gwas2$Gene_Symbol[gwas2$P < 5e-8] <- gwas_annotated2$Gene_Symbol
     known2 <- gwas2 %>% filter(Gene_Symbol %in% known_loci)
     novel2 <- gwas2 %>% filter(!Gene_Symbol %in% known_loci)

     # Combine data for plotting
     dat <- list(c(gwas1, known1, novel1), c(gwas2, known2, novel2))
     print('plotting...')
     png_filename <- paste0("/Plots/Manhattan/known_novel_shared_manhattan_", phenotype1,"_",phenotype2, ".png")
     png(png_filename, width = 3500, height = 2500, res = 300)
     # Plotting the Manhattan plot for the current pair
     plot(manhattan(list(c(gwas1, known1, novel1), c(gwas2, known2, novel2)), color=c("darkgrey","blue","red"), annotate = c(5e-08), region_size=100000000, ntop=1, 
                    highlight_genes = genes, highlight_genes_ypos = -0.5, angle=90, ymax=40, ymin=-30, nudge_y = 2))
     dev.off()

# or something like:

  dat <- list(gwas1, known1, novel1, gwas2, known2, novel2)
  print('plotting...')
  png_filename <- paste0("/Plots/Manhattan/mtag_single_manhattan_", phenotype, ".png")
  png(png_filename, width = 3500, height = 2500, res = 300)
  # Plotting the Manhattan plot for the current pair

  plot(manhattan(dat, color=c("darkgrey","blue","red", "darkgrey","blue","red"), annotate = c(1e-100, 5e-08, 5e-08, 1e-100, 5e-08, 5e-08), 
            even_no_chr_lightness = c(0.8,0.5,0.5), 
            legend_labels = c(paste0('Single-trait_', phenotype),  'Known CMR Loci', 'Novel', 
                              paste0('Multi-trait_', phenotype),  'Known CMR Loci', 'Novel'),
            label_color = "black", region_size=100000000, ntop=3,
            highlight_genes = genes, highlight_genes_ypos = -0.5, angle=90, ymax=40, ymin=-30, nudge_y = 2))

totajuliusd commented 9 months ago

Hi, is this what you had i mind:

plot3

If so, you can achieve this with topr's inbuilt datasets (CD_UKBB and CD_FINNGEN) by doing:

CD_UKBB_annotated <- CD_UKBB %>% filter(P<5e-08) %>%annotate_with_nearest_gene()
known_UKB <- CD_UKBB_annotated %>% filter(Gene_Symbol %in% c("C1orf141","IL23R","NOD2","NKD1","CYLD","IKZF1"))
novel_UKB <- CD_UKBB_annotated %>% filter(Gene_Symbol %in% c("JAK2","TTC33","ATG16L1"))

CD_FINNGEN_annotated <- CD_FINNGEN %>% filter(P<5e-08) %>% annotate_with_nearest_gene()
known_FG <- CD_FINNGEN_annotated %>% filter(Gene_Symbol %in% c("TNRC18","FBXL18","RNF216","WIPI2", "RNU6-215P", "IL23R","NOD2"))
novel_FG <- CD_FINNGEN_annotated %>% filter(Gene_Symbol %in% c("ADO","TTC33","NKX2-3"))

png_filename <- "plot_theme_grey.png"
png(png_filename, width = 3500, height = 2500, res = 300)

manhattan(list(CD_UKBB, known_UKB, novel_UKB, CD_FINNGEN, known_FG,novel_FG), ntop=3,
          color=c("#A0A0A0","blue","red","#A5A5A5","blue","red"), 
          legend_labels=c("Phenotype 1","known","novel","Phenotype 2", "known","novel"), 
          highlight_genes = c("IL23R","TTC33","NOD2"), 
          highlight_genes_ypos = -0.5,
          highlight_genes_color = "green", 
          annotate = c(1e-50,5e-08,5e-08,1e-50,5e-08,5e-08), 
          angle=90,nudge_y=7, ymax=33, ymin=-43,
          theme_grey = T, title="Phenotype 1 and Phenotype 2")
dev.off()

It is very close to what you already had.

totajuliusd commented 9 months ago

Or without the the theme_grey:

manhattan(list(CD_UKBB, known_UKB, novel_UKB, CD_FINNGEN %>% filter(P>1e-35), known_FG,novel_FG), ntop=3,
          color=c("#A0A0A0","blue","red","#A5A5A5","blue","red"), 
          legend_labels=c("Phenotype 1","known","novel","Phenotype 2", "known","novel"), 
          highlight_genes = c("IL23R","TTC33","NOD2"), highlight_genes_ypos = -0.5,
          highlight_genes_color = "green", 
          annotate = c(1e-50,5e-08,5e-08,1e-50,5e-08,5e-08), 
          angle=90,nudge_y=7, ymax=33, ymin=-43,
          even_no_chr_lightness = c(0.8,0.5,0.5,0.8,0.5,0.5),
          title="Phenotype 1 and Phenotype 2")

plot4

totajuliusd commented 9 months ago

Note that you have to use two slightly different colors of grey for Phenotype 1 and Phenotype 2 so that they will get labelled separately below the plot (If you use the same color they will share the label, like the known and novel loci do).

hlnicholls commented 9 months ago

Thank you for this! It's exactly what I want.

The only part I'm having trouble with still is that my legend label is only showing the novel one.

  manhattan(list(gwas1, known, novel, gwas2, known2, novel2), ntop=3,
            color=c("#A0A0A0","blue","red","#A5A5A5","blue","red"), 
            legend_labels=c(paste0('Single-trait_', phenotype),"Known Loci","Novel",
                            paste0('Multi-trait_', phenotype), "Known Loci","Novel"), 
            highlight_genes = genes, highlight_genes_ypos = -0.5,
            highlight_genes_color = "green", 
            annotate = c(1e-50,5e-08,5e-08,1e-50,5e-08,5e-08), 
            angle=90,nudge_y=7, ymax=33, ymin=-43,
            even_no_chr_lightness = c(0.8,0.5,0.5,0.8,0.5,0.5),
            title=paste0('Single-trait and Multi-trait ', phenotype))

Is there any reason why the other labels might not get included in the legend?

This is what I get (example just filtered to chr2): example_chr2

totajuliusd commented 9 months ago

Hmmm... that is odd. When I copy your code and replace your data with test data, I get all the labels:

manhattan(list(CD_UKBB, known_UKB, novel_UKB, CD_FINNGEN, known_FG,novel_FG), ntop=3,
          color=c("#A0A0A0","blue","red","#A5A5A5","blue","red"), 
          legend_labels=c(paste0('Single-trait_'),"Known Loci","Novel",
                          paste0('Multi-trait_'), "Known Loci","Novel"), 
          highlight_genes = c("FTO","THADA"), highlight_genes_ypos = -0.5,
          highlight_genes_color = "green", 
          annotate = c(1e-50,5e-08,5e-08,1e-50,5e-08,5e-08), 
          angle=90,nudge_y=7, ymax=33, ymin=-43,
          even_no_chr_lightness = c(0.8,0.5,0.5,0.8,0.5,0.5),
          title=paste0('Single-trait and Multi-trait '))

plot4

Does this also happen to you when you use the inbuilt test data (CD_UKBB and CD_FINNGEN)?

hlnicholls commented 9 months ago

No I get your output with everything correct when I run your code with your data. I'll investigate my data further to see if I can solve it (although in terms of its format it has all the same columns with the names as your data, just a different number of rows, and I'm able to plot the data as I want using the manhattan function for different styles/formats). At any rate you have solved my main question. Hopefully I can sort the legend separately. Thank you for your help!

totajuliusd commented 9 months ago

No problem, Im glad I could help with the main question. To test your data, I would just start with 2 datasets and basic labels and distinct colors- and then add the other datasets one by one, e.g.:

manhattan(list(gwas1, known), color=c("green","blue), legend_labels=c("L1","L2"))

Please let me know if the problem persists and you cant find anything wrong in your data.

hlnicholls commented 9 months ago

I've found the source of my problem was filtering to chromosome 2 only (it takes me 30-45mins to generate one full plot, so I was using chr2 as it should've gave me everything as it has known and novel loci).

I ran on my full gwas data and the output has all labels. My filtering must've been off/mismatched in some way to impact the labelling I expect.

Thank you again for your help!

totajuliusd / topr

Manhattan subplots with known and novel color coding for both plots #50