kevinblighe / EnhancedVolcano

Publication-ready volcano plots with enhanced colouring and labeling
399 stars, 81 forks

Using another continuous value (not logFC or p) to scale point size/gradient colour points #51

Closed laylagerami closed 4 years ago

laylagerami commented 4 years ago

Hello,

Really loving this package and it is helping me to make some great volcano plots!

I have a question, and I wonder whether this is possible.

I am producing a volcano plot where I am trying to highlight which DEGs are present in an OpenTargets disease association list. Right now I have just made it so that pointSize = 3 if the DEG is present in the list of disease-associated DEGs and 1 if not. This looks great but I want to add another layer of complexity.

As I have quite a few disease-associated DEGs, I would like to scale the points by each gene's association score (0-1), which is stored in another dataframe. I could do this with size, where a higher association score means a larger point (with every disease-associated DEG still larger than any non-disease-associated one). What might look better is a colour gradient: all disease-associated DEGs are larger than the non-disease-associated ones, and they also scale from, for example, red for a low association score to black for a high one. I would then have another legend labelled e.g. "disease association score" running from 0 to 1 (red to black), so one can see at a glance (a) which DEGs are disease-associated and (b) which ones have a stronger association. Does this make sense?
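To make the size-based variant concrete, here is a minimal base R sketch. The gene names, the scores, and the `2 + 3 * score` scaling are made-up assumptions for illustration; the only thing taken from the post is that a per-point size vector (3 or 1) was already being supplied via pointSize.

```r
# Hypothetical DEG list and association table (all values are made up).
deg_genes <- c("A", "B", "C", "D", "E")
assoc <- data.frame(gene = c("B", "D"), score = c(0.2, 0.9))

# Look up each DEG's score; genes absent from the association table get NA.
score <- assoc$score[match(deg_genes, assoc$gene)]

# Non-associated genes get size 1. Associated genes get a size between 2 and 5,
# so every associated point is larger than every non-associated one.
point_size <- ifelse(is.na(score), 1, 2 + 3 * score)
```

The resulting vector has one entry per gene, in the same order as the DEG list, so it could take the place of the 3-or-1 vector described above.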

I saw that there is the option colGradient, but it looks like it only works for the p-value, and the custom colours example is for discrete sets of genes rather than a continuous value. I can try to hack the custom colours functionality to get it to do what I need, but I was wondering if there is already a way to do this that I'm missing.

Thanks! I hope my question made sense :D

EDIT: so far I have created a "keyvals" vector with the colours I want (a red-to-black gradient for disease-associated DEGs based on the association score, blue for p-value, green for fold change and grey for non-significant). Now all I need is a legend in which grey, green and blue are labelled as in the default legend, plus a separate gradient bar for the disease association score. I will update with the required code and a minimal reproducible example if I manage it (the actual data is confidential).
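A colour vector of this kind could be built roughly as follows. This is a sketch under stated assumptions, not the code actually used: the toy data, the cutoffs (|log2FC| > 1, p < 0.05), the exact colour names, and the orange colour for DEGs without a score are all invented for illustration. It only computes the colours; how they are handed to EnhancedVolcano and how the legend is drawn are left out.

```r
# Toy data standing in for a real toptable (all values are made up).
set.seed(1)
n <- 200
toptable <- data.frame(
  gene = paste0("gene", seq_len(n)),
  log2FoldChange = rnorm(n, sd = 2),
  pvalue = runif(n)^3,
  stringsAsFactors = FALSE
)

# Hypothetical association scores (0-1) held in a separate data frame.
scores <- data.frame(gene = sample(toptable$gene, 40), score = runif(40))
toptable$score <- scores$score[match(toptable$gene, scores$gene)]

# Assumed significance cutoffs.
sig_fc <- abs(toptable$log2FoldChange) > 1
sig_p <- toptable$pvalue < 0.05

# Discrete colours for genes without a disease association.
keyvals <- rep("grey30", n)                 # not significant
keyvals[sig_fc & !sig_p] <- "forestgreen"   # fold change only
keyvals[sig_p & !sig_fc] <- "royalblue"     # p-value only
keyvals[sig_fc & sig_p] <- "orange"         # DEG without a score (assumption)

# Continuous red-to-black gradient for disease-associated DEGs.
ramp <- grDevices::colorRamp(c("red", "black"))
is_assoc <- sig_fc & sig_p & !is.na(toptable$score)
keyvals[is_assoc] <- grDevices::rgb(ramp(toptable$score[is_assoc]),
                                    maxColorValue = 255)
```

Because every disease-associated gene ends up with its own hex colour, a discrete legend built from this vector would get one entry per shade, which is why a separate gradient bar is needed.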

laylagerami commented 4 years ago

In the end I modified the code from EnhancedVolcano and added functionality from https://github.com/eliocamp/ggnewscale to have two different colour scales.

My code is very ugly and needs tidying up, so I won't subject you to it! But I am happy to share it if anyone is curious. Basically, I just split my toptable into two, added a column for the association score, and called geom_point() twice with a new_scale_color() in between.
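For anyone curious, the approach described above can be sketched with plain ggplot2 and ggnewscale. This is an illustration under assumptions, not the modified EnhancedVolcano code: the toy data, the cutoffs, the category labels, the colours and the point sizes are all invented here.

```r
library(ggplot2)
library(ggnewscale)

# Toy data standing in for a real toptable (all values are made up).
set.seed(1)
n <- 300
toptable <- data.frame(
  log2FoldChange = rnorm(n, sd = 2),
  pvalue = runif(n)^3
)
# Hypothetical association score: NA for genes not in the disease list.
toptable$score <- ifelse(runif(n) < 0.2, runif(n), NA)

# Discrete significance category, using assumed cutoffs.
sig_fc <- abs(toptable$log2FoldChange) > 1
sig_p <- toptable$pvalue < 0.05
toptable$sig <- "NS"
toptable$sig[sig_fc] <- "Log2 FC"
toptable$sig[sig_p] <- "p-value"
toptable$sig[sig_fc & sig_p] <- "p-value and log2 FC"

# Split the table in two: disease-associated DEGs and everything else.
is_assoc <- sig_fc & sig_p & !is.na(toptable$score)
assoc <- toptable[is_assoc, ]
other <- toptable[!is_assoc, ]

p <- ggplot(mapping = aes(x = log2FoldChange, y = -log10(pvalue))) +
  # First colour scale: discrete categories, small points.
  geom_point(data = other, aes(colour = sig), size = 1, alpha = 0.6) +
  scale_colour_manual(
    name = NULL,
    values = c("NS" = "grey30", "Log2 FC" = "forestgreen",
               "p-value" = "royalblue", "p-value and log2 FC" = "orange")
  ) +
  # Start a second, independent colour scale for the next layer.
  new_scale_color() +
  geom_point(data = assoc, aes(colour = score), size = 3) +
  scale_colour_gradient(
    name = "Disease association score",
    low = "red", high = "black", limits = c(0, 1)
  ) +
  theme_bw()

print(p)
```

The call to new_scale_color() is what lets the second geom_point() layer have its own continuous colour bar alongside the discrete legend from the first layer.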

If I get some time I will integrate it properly and do a pull request.

Here is a picture. You can see that the large black dots are the most interesting (they were labelled but I removed the labels for the purpose of posting it here) as they are DEGs most highly associated with the disease :thumbsup:

[Image: Screenshot 2020-07-09 at 14 29 52]