myles-lewis / locuszoomr

A pure R implementation of locuszoom for plotting genetic data at genomic loci accompanied by gene annotations.
GNU General Public License v3.0

Add "highlight" to `scatter_plot()` #19

Closed: lcpilling closed this 6 months ago

lcpilling commented 6 months ago

Hi Myles,

I really like the package and use it often. I made a small change for a project where I wanted to "highlight" specific variants but not label them. This PR adds one option, `highlight`, to `scatter_plot()`: the listed variants inherit the shape and colour of the index variant. This is useful for picking out specific variants when one does not want labels.

To make it work I had to slightly change how the bg colour is provided but overall this is a very small change.

It would be used like:

library(locuszoomr)
data(SLE_gwas_sub)

library(EnsDb.Hsapiens.v75)
loc <- locus(gene = 'UBE2L3', SLE_gwas_sub, flank = 1e5, ens_db = "EnsDb.Hsapiens.v75")

# variants to highlight without labelling them
highlight_rsids <- c("rs3747093", "rs4820091", "rs5754508", "rs112504638", "rs1647705", "rs34043275", "rs762349")

# default colours
locus_plot(loc, highlight = highlight_rsids)

# personal preference scheme (defaults not ideal for highlighting as the purple gets lost in the blue)
locus_plot(loc, highlight = highlight_rsids, scheme = c("grey50", "grey90", "red"))

Default colours: image

Custom scheme: image

Feel free to disregard this if you don't want to implement it, but I thought I would share it in case it is useful.

All the best, Luke

myles-lewis commented 6 months ago

Hi Luke,

This is an interesting suggestion. What I might do instead is allow `index_snp` to be a vector, not just a single value. That would replace the `highlight` argument. It's easy to code and requires very little modification. Good point about the default colours when there's no LD.

Bw, Myles

lcpilling commented 6 months ago

That would be great. Probably a better solution.

Re the colours: my intention was to highlight independent 'lead' SNPs based on LD, having identified these through another source. Many loci now have several conditionally independent variants, and highlighting these is desirable, but it makes conveying the LD information through the colour scheme a challenge!

All the best, Luke

myles-lewis commented 6 months ago

You might have noticed it buried in the code, but users can already manually override the colour, symbol and size of every single point by directly manipulating the data frame `loc$data` and adding the columns `bg`, `pch`, `cex` and `col`. `scatter_plot()` uses any of these columns that are present, and they override the default plotting scheme. I hadn't really advertised this in the package, as it was baked in for possible future use. But what do you think? Is it useful? It would need extra documentation in `scatter_plot()`/`locus_plot()` and the vignette to explain it.
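
The manual override described above can be sketched in base R. This is a minimal sketch, not code from the package: `add_highlight_columns()` is a hypothetical helper, the toy data frame is a made-up stand-in for `loc$data`, and the column name `rsid` is an assumption about how variant IDs are stored. It only shows how the `bg`, `pch` and `cex` columns mentioned in the comment could be filled in for a chosen set of variants; a `col` column could be set the same way.

```r
# Hypothetical helper (not part of locuszoomr): adds per-point plotting
# columns to a data frame such as loc$data.
# Assumption: variant IDs are stored in a column named `rsid`.
add_highlight_columns <- function(data, rsids,
                                  base_bg = "grey70", hi_bg = "red",
                                  base_pch = 21, hi_pch = 23,
                                  base_cex = 1, hi_cex = 1.5) {
  hit <- data$rsid %in% rsids                 # TRUE for variants to highlight
  data$bg  <- ifelse(hit, hi_bg, base_bg)     # fill colour of each point
  data$pch <- ifelse(hit, hi_pch, base_pch)   # 21 = filled circle, 23 = filled diamond
  data$cex <- ifelse(hit, hi_cex, base_cex)   # point size
  data
}

# Toy stand-in for loc$data so the sketch runs without locuszoomr.
# "rs0000001", the positions and the p-values are made up for illustration.
toy <- data.frame(
  rsid = c("rs3747093", "rs4820091", "rs0000001"),
  pos  = c(21920000, 21930000, 21940000),
  p    = c(1e-8, 1e-5, 0.2)
)
toy <- add_highlight_columns(toy, c("rs3747093", "rs4820091"))
print(toy)
```

With a real locus object the same idea would be `loc$data <- add_highlight_columns(loc$data, highlight_rsids)` followed by `locus_plot(loc)`. Whether every column is honoured may depend on the installed version, since the feature is described here as undocumented.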

lcpilling commented 6 months ago

Thanks for pointing me towards that. I had noticed that parts of the code referred to these columns, which mostly did not exist, but had not really thought about the implications. That's a great feature to have already built in.

I'm going to close this PR because I am continuing to fiddle with my fork to make it work for the plot I need right now, but it sounds like you will take the main version in another direction. I look forward to using it :)