stephenturner / qqman

An R package for creating Q-Q and manhattan plots from GWAS results
http://cran.r-project.org/web/packages/qqman/
GNU General Public License v3.0
156 stars 91 forks source link

Highlight multiple different vectors and colour them differently? #38

Closed evad80 closed 7 years ago

evad80 commented 8 years ago

Hi, I'm not sure if this is the right place to post this type of question.

I am a biologist. I ran an association analysis, and I got back a set of SNPs of interest (called sig_snps). However, some of these SNPs are associated with body size (called sig_snps_but_assoc_with_body_size), and I want to show the difference between the set of SNPs that are significant, and the set of SNPs that are also significant with body size (if they are also associated with body size, they are less interesting).

On my manhattan plot zoomed in one on chromosome, I want to have three sets of coloured points: the non-associated SNPs, the SNPs associated with the trait, and the SNPs associated with the trait, but also associated with body size.

I wrote this code:

library("qqman")
table <-read.table("Original.linear",header=T)
data <-as.data.frame(table)
sig_snps <-scan("SigSNPsAfterBodySize.snps","")
sig_snps_but_assoc_with_body_size <-scan("SigSNPsButAssocWithBodySize","")
pdf("Test.pdf")
manhattan(subset(data,CHR == 1),highlight = c(sig_snps_after_body_size,sig_snps_but_assoc_with_body_size), main = "Chromosome 1", cex= 2.5, ylab = expression(paste("Association with Cancer Mortality (-",log[10],"[P])")),cex.lab = 1.2, cex.axis=1.5, cex.main = 1.2,suggestiveline=F,genomewideline=F)

As you can see, to achieve this, I tried to make a vector for the two sets of SNPs that I wanted coloured (sig SNPs, and sig SNPs but associated with body size). As you can see from the output attached: Test.pdf

The command ran without error, but didn't work. So I have two questions:

  1. Is what I'm doing possible,if the SNP is in the "sig_snps" object colour it red. If the SNP is in the "sig_snps_but_assoc_with_body_size", colour it blue. Else, colour it black.
  2. I saw that I can change the source code colours here: "If you'd like to change the color of the highlight or the suggestive/genomewide lines, you'll need to modify the source code. Search for col="blue", col="red", or col="green3" to modify the suggestive line, genomewide line, and highlight colors, respectively."

My problem is: I'm not a programmer, and also I'm not sure how I would change them to include two different colours for my above problem.

If someone could help with this code I would appreciate it!

Thanks Eva

stephenturner commented 8 years ago

This is the place, and thanks for posting. I'll keep this issue open for now, but in all honesty, it won't be a feature for a while. The general approach would be to pass in two lists: one list of SNP rsid vectors, and one list of colors. The SNP vectors in the elements of the first list would be highlighted by the list of colors in list 2. That's the general strategy if you or anyone else wants to take a shot at implementing.

evad80 commented 8 years ago

Thanks for the quick reply! I would love to but unfortunately it would be over my head. But I really appreciate the response; at least now I know it's not something that I'm just not doing properly with the existing code.

evermane commented 8 years ago

Sorry if this is too late to be useful, but I wanted to do a similar thing, I think. I modified the source code so that I could highlight two sets of snps in the same manhattan plot and changed the color so that one set appeared green (highlight1) and the other (highlight2) appeared blue. The command to run this modified code is the same as before except the highlight part. For example:

manhattan(data, highlight1=set1, highlight2=set2)

Here is the modified code:

manhattan1<-function (x, chr = "CHR", bp = "BP", p = "P", snp = "SNP", col = c("gray10", "gray60"), chrlabs = NULL, suggestiveline = -log10(1e-05), genomewideline = -log10(5e-08), highlight1 = NULL, highlight2 = NULL, logp = TRUE, ...) { CHR = BP = P = index = NULL if (!(chr %in% names(x))) stop(paste("Column", chr, "not found!")) if (!(bp %in% names(x))) stop(paste("Column", bp, "not found!")) if (!(p %in% names(x))) stop(paste("Column", p, "not found!")) if (!(snp %in% names(x))) warning(paste("No SNP column found. OK unless you're trying to highlight.")) if (!is.numeric(x[[chr]])) stop(paste(chr, "column should be numeric. Do you have 'X', 'Y', 'MT', etc? If so change to numbers and try again.")) if (!is.numeric(x[[bp]])) stop(paste(bp, "column should be numeric.")) if (!is.numeric(x[[p]])) stop(paste(p, "column should be numeric.")) d = data.frame(CHR = x[[chr]], BP = x[[bp]], P = x[[p]]) if (!is.null(x[[snp]])) d = transform(d, SNP = x[[snp]]) d <- subset(d, (is.numeric(CHR) & is.numeric(BP) & is.numeric(P))) d <- d[order(d$CHR, d$BP), ] if (logp) { d$logp <- -log10(d$P) } else { d$logp <- d$P } d$pos = NA d$index = NA ind = 0 for (i in unique(d$CHR)) { ind = ind + 1 d[d$CHR == i, ]$index = ind } nchr = length(unique(d$CHR)) if (nchr == 1) { options(scipen = 999) d$pos = d$BP/1e+06 ticks = floor(length(d$pos))/2 + 1 xlabel = paste("Chromosome", unique(d$CHR), "position(Mb)") labs = ticks } else { lastbase = 0 ticks = NULL for (i in unique(d$index)) { if (i == 1) { d[d$index == i, ]$pos = d[d$index == i, ]$BP } else { lastbase = lastbase + tail(subset(d, index == i - 1)$BP, 1) d[d$index == i, ]$pos = d[d$index == i, ]$BP + lastbase } ticks = c(ticks, (min(d[d$CHR == i, ]$pos) + max(d[d$CHR == i, ]$pos))/2 + 1) } xlabel = "Chromosome" labs <- unique(d$CHR) } xmax = ceiling(max(d$pos) * 1.03) xmin = floor(max(d$pos) * -0.03) def_args <- list(xaxt = "n", bty = "n", xaxs = "i", yaxs = "i", las = 1, pch = 20, xlim = c(xmin, xmax), ylim = c(0, ceiling(max(d$logp))), xlab = xlabel, ylab = expression(-log10)) dotargs <- list(...) do.call("plot", c(NA, dotargs, def_args[!names(def_args) %in% names(dotargs)])) if (!is.null(chrlabs)) { if (is.character(chrlabs)) { if (length(chrlabs) == length(labs)) { labs <- chrlabs } else { warning("You're trying to specify chromosome labels but the number of labels != number of chromosomes.") } } else { warning("If you're trying to specify chromosome labels, chrlabs must be a character vector") } } if (nchr == 1) { axis(1, ...) } else { axis(1, at = ticks, labels = labs, ...) } col = rep(col, max(d$CHR)) if (nchr == 1) { with(d, points(pos, logp, pch = 20, col = col[1], ...)) } else { icol = 1 for (i in unique(d$index)) { with(d[d$index == unique(d$index)[i], ], points(pos, logp, col = col[icol], pch = 20, ...)) icol = icol + 1 } } if (suggestiveline) abline(h = suggestiveline, col = "blue") if (genomewideline) abline(h = genomewideline, col = "red") if (!is.null(highlight1)) { if (any(!(highlight1 %in% d$SNP))) warning("You're trying to highlight1 SNPs that don't exist in your results.") d.highlight1 = d[which(d$SNP %in% highlight1), ] with(d.highlight1, points(pos, logp, col = "green3", pch = 20, ...)) } if (!is.null(highlight2)) { if (any(!(highlight2 %in% d$SNP))) warning("You're trying to highlight2 SNPs that don't exist in your results.") d.highlight2 = d[which(d$SNP %in% highlight2), ] with(d.highlight2, points(pos, logp, col = "dodgerblue", pch = 20, ...)) } }

stephenturner commented 8 years ago

Thanks @evermane!

clagiamba commented 7 years ago

Hi,

I am trying to solve the same issue but with many different colors, I feel like the code is almost all there but I am missing something because legend and categories don't quite match...

This is the part I am adding:

##############

Highlight2: highlight snps from a second character vector

keyword2 = "group"

if (!is.null(highlight2)) { if (any(!(highlight2 %in% d$SNP))) warning("You're trying to highlight SNPs that don't exist in your results.") d.highlight2=d[which(d$SNP %in% highlight2), ] if (!is.null(x[[keyword2]])) d.highlight2=transform(d.highlight2, keyword2=x[[keyword2]])

jColors <- with(d.highlight2, data.frame(keyword2 = levels(as.factor(d.highlight2$keyword2)), highlight.col = rainbow(length(levels(as.factor(d.highlight2$keyword2)))), name = 'highlight2'))
d.highlight2$highlight.col = jColors[match(d.highlight2$keyword2, jColors$keyword2), "highlight.col"] d.highlight2$highlight.col = factor(d.highlight2$highlight.col)

with(d.highlight2, points(pos, logp, col=levels(d.highlight2$highlight.col), pch=20, ...)) 

# with(d.highlight2, points(pos, logp, col=rainbow(length(levels(as.factor(d.highlight2$keyword2)))), pch=20, ...)) 
# rainbow(length(levels(as.factor(d.highlight2$keyword2))))
legend = TRUE
# if (legend)  legend("topleft", legend=levels(as.factor(d.highlight2[,"keyword2"])), text.col=seq_along(levels(as.factor(d.highlight2[,"keyword2"]))), cex=0.5,ncol=1,horiz=F)
if (legend)  {
legend(x = 'topleft', 
   legend = levels(d.highlight2$keyword2),
   col = levels(d.highlight2$highlight.col), pch = par("pch"), bty = 'n', xjust = 1)
   }

} ##############