myles-lewis / locuszoomr

A pure R implementation of locuszoom for plotting genetic data at genomic loci accompanied by gene annotations.
GNU General Public License v3.0
18 stars 5 forks source link

[Feature Request] Include Recombination Rate on secondary axis #9

Closed nickhir closed 10 months ago

nickhir commented 10 months ago

Dear Myles, I really fell in love with your package during the last weeks, due to the great usability, documentation and also your support! There is just one small thing, which would make this package stand out even more. And that would be the inclusion of the recombination rate in the locus zoom plot as a secondary axis. I am imaging something similar to the code from gasscoplot2 (https://github.com/jrs95/gassocplot2/blob/master/R/figures.R).

Are there any plans to implement such a functionality in the future?

Cheers!

myles-lewis commented 10 months ago

Dear nickhir,

This is a good request and one I've thought about myself. The issue is where to source the recombination data for different builds. Once I have access to a reliable data source or API there should be a way to integrate it. I think gasscoplot2 has a copy of the data downloaded. I looked for it on UCSC and found a file here. But it's 30MB, so I can't embed it with the app as CRAN limits the app to 5 MB in size. But there might be a way to instruct users to download the file then load it into R via some kind of loader and then add its data to the plots. I'll look into it.

Bw, Myles

myles-lewis commented 10 months ago

I made a new branch called 'recombination' which has a working version for base graphics only. I current cannot get a working version for ggplot2. You could try installing the recombination branch version and trying the following:

data(SLE_gwas_sub)
loc <- locus(SLE_gwas_sub, gene = 'STAT4', flank = 1e5, LD = "r2",
             ens_db = "EnsDb.Hsapiens.v75")
loc <- link_recomb(loc, genome = "hg19")
locus_plot(loc)

Let me know if that's useful. I will continue to work on the ggplot2 version, but ggplot2 is poorly designed for dual y axis plots especially when mixing a scatter plot with different colours with a line plot.

myles-lewis commented 10 months ago

Initial ggplot2 version working now. One issue is that it does not seem possible to place the recombination line underneath the scatter plot since whichever data is plotted first becomes the left y axis and the 2nd dataset plotted on top (currently the recombination line) becomes the right y axis.

myles-lewis commented 10 months ago

Completed addition of feature. Merged into main branch.

nickhir commented 10 months ago

Thank you so much. Its really amazing how active and responsive you are! Plotting the recombination rate works very well for the majority of my loci, however for some I am running into the following error message:

Error in `$<-.data.frame`(`*tmp*`, "recomb", value = c(0.0772111, 0.0502790399176955,  : 
  replacement has 1292 rows, data has 1298

Here is the code to reproduce the issue.

example <- fread("example_data.tsv", data.table = FALSE)
loc_test <- locus(
    data = as.data.frame(example),
    chrom = "Chr",
    pos = "bp",
    p = "pC",
    labs = "SNP",
    seqname = chr,
    flank = 0,
    xrange = c(start, stop),
    ens_db = "EnsDb.Hsapiens.v75",
    LD = "1_247603584_C_T"
)
loc_test <- link_recomb(loc_test,
    genome = "hg19"
)
gg_scatter(loc_test)

You can download the data here: https://wetransfer.com/downloads/483e776884adec579e301cc84ee9400b20231213171653/0d379c

I am using the latest locuszoomR version.

Any help is as always much appreciated!

myles-lewis commented 10 months ago

Thanks for finding this. It was a bug in the merging of the recombination data with the SNP data which is only necessary in gg_scatter. Now fixed - should work now. Let me know how you get on.

nickhir commented 10 months ago

Great, it works now! I just noticed one tiny inconsistency. After running link_recomb, the "genome wide significance" dashed line disappears. Again, this only happens for gg_scatter, scatter_plot still shows the dashed line.

EDIT: I have opened a small pull request to fix this (#11). I think you might have just forgotten to copy a few lines of code?